Instructions for creating a local MySQL version of NCBI's Entrez Gene database

These instructions assume you have access to a MySQL account and database with the proper create and insert permissions. If you are downloading MySQL onto your own local machine and need help with creating accounts and databases, please see MySQL's free online documentation.

Three scripts create the database's schema and populate it with data: a SQL script, a Perl script, and a shell script. Run them in the order given below.

First, run the SQL script.

On the command line type:

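mysql -h host -u username -p database < tables_for_entrezgene.sql

replacing host, username, and database with real values. MySQL will prompt for your password. Do not put a space between -p and the password: with a space, mysql reads the next word as the database name. To give the password on the command line, write it directly after the option (-ppassword). The database must already exist; this command creates a set of empty tables of the desired structure in it.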


Second, run the Perl script.

Before running the Perl script, use a text editor to change the following variables at the top of the script:

$downloadDir - location where files will be downloaded
$SQL - location of parsed data files, ready for database import
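
These instructions do not say whether the Perl script creates the two directories itself, so creating them beforehand is a harmless precaution. The following is a minimal sketch; the paths and the names DOWNLOAD_DIR and SQL_DIR are placeholders chosen for illustration and should match whatever you set for $downloadDir and $SQL.

```shell
# Placeholder paths (assumptions); use the same values you put in
# $downloadDir and $SQL at the top of the Perl script.
DOWNLOAD_DIR="${HOME:-/tmp}/entrezgene/download"
SQL_DIR="${HOME:-/tmp}/entrezgene/sql"

# Create both directories, including any missing parent directories.
mkdir -p "$DOWNLOAD_DIR" "$SQL_DIR"

# Report whether the current user can write to each directory.
for dir in "$DOWNLOAD_DIR" "$SQL_DIR"; do
    if [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "not writable: $dir" >&2
    fi
done
```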

To run the Perl script, on the command line type:

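./download_parse_entrezgene.pl

If the script is not marked as executable, run it through the Perl interpreter instead:

perl download_parse_entrezgene.pl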


Third, run the shell script.

This script loads the data into the database.

Using a text editor, change the following variables at the top of the script. For the connection settings, use the same values you supplied to the initial mysql command, and set $SQL to the same location you used in the Perl script:

$host - host machine name
$db - database name
$user - MySQL username
$pw - MySQL password
$SQL - location of parsed data files, ready for database import
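
The loading script itself is not reproduced in these instructions. As a rough sketch only, and assuming (this is not confirmed here) that $SQL holds one tab-delimited file per table, each named after its table with a .txt extension, a loader could build one LOAD DATA LOCAL INFILE statement per file. The table names, the demo path, and the function name print_load_commands are hypothetical. The sketch only prints the commands and never contacts MySQL.

```shell
# Hypothetical values; the real script defines its own $host, $db, $user and $SQL.
host="localhost"
db="entrezgene"
user="username"
SQL="${TMPDIR:-/tmp}/entrezgene_sql_demo"

# Demo setup only: create two empty placeholder files so the loop has input.
mkdir -p "$SQL"
: > "$SQL/example_table_a.txt"
: > "$SQL/example_table_b.txt"

# Print one load command per parsed file (dry run; nothing is executed).
print_load_commands() {
    for file in "$SQL"/*.txt; do
        # Assumption: the file name without .txt is the target table name.
        table=$(basename "$file" .txt)
        echo "mysql --local-infile=1 -h $host -u $user -p $db -e \"LOAD DATA LOCAL INFILE '$file' INTO TABLE $table\""
    done
}

print_load_commands
```

By default, LOAD DATA expects tab-separated fields and newline-terminated rows. Files in another layout would need explicit FIELDS and LINES clauses.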


If you would like to process the refSeqSummary data from UCSC and load it into the database, download EntrezGene_refSeqSummary.zip, which contains instructions and code for this step.

Please send email to wibr-bioinformatics --- AT --- wi.mit.edu for more information.