Entrez Gene Database: Instructions

Three scripts create the database schema and populate it with data: an SQL script, a Perl script, and a shell script. Run them in the order below.

First, run the SQL script.

On the command line type:

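mysql -h host -u username -ppassword database < tables_for_entrezgene.sql

replacing host, username, password, and database with real values. There is no space between -p and the password; with a space, mysql prompts for the password and takes the word after -p as the database name. You can also give -p on its own to be prompted for the password. The database must already exist (create it with CREATE DATABASE if necessary); the script then creates a set of empty tables with the required structure.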
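As a minimal sketch of this step, the shell function below creates the database if it does not exist and then loads tables_for_entrezgene.sql in a single mysql session, so the password is requested only once. DB_HOST, DB_USER and DB_NAME are hypothetical placeholders, not names used by the distributed scripts, and the sketch assumes the SQL file is in the current directory.

```shell
#!/bin/sh
# Hypothetical placeholder settings; replace them with your own values.
DB_HOST=${DB_HOST:-localhost}
DB_USER=${DB_USER:-entrez}
DB_NAME=${DB_NAME:-entrezgene}

create_schema() {
    # Stop early if the schema file is missing, so no empty database is created.
    if [ ! -r tables_for_entrezgene.sql ]; then
        echo "tables_for_entrezgene.sql not found in $(pwd)" >&2
        return 1
    fi
    # Prepend CREATE DATABASE and USE to the table definitions and send
    # everything through one connection; -p with no value prompts for the password.
    {
        printf 'CREATE DATABASE IF NOT EXISTS `%s`;\n' "$DB_NAME"
        printf 'USE `%s`;\n' "$DB_NAME"
        cat tables_for_entrezgene.sql
    } | mysql -h "$DB_HOST" -u "$DB_USER" -p
}
```

After setting the three variables, call create_schema from the directory that contains tables_for_entrezgene.sql. Running SHOW TABLES; in the mysql client afterwards should list the new, empty tables.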

Second, run the Perl script.

Before running the Perl script, use a text editor to change the following variables at the top of it:

$downloadDir - location where files will be downloaded
$SQL - location of parsed data files, ready for database import
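Because the Perl script writes into both of these directories, it can help to confirm that they exist and are writable first. The helper below is a hypothetical pre-flight check, not part of the distributed scripts; pass it the same paths you assigned to $downloadDir and $SQL.

```shell
#!/bin/sh
# Hypothetical pre-flight check: create each directory given as an argument
# if it is missing, and fail if any of them cannot be written to.
check_dirs() {
    for dir in "$@"; do
        # -p also creates missing parent directories.
        mkdir -p "$dir" || return 1
        if [ ! -w "$dir" ]; then
            echo "directory not writable: $dir" >&2
            return 1
        fi
    done
}
```

For example, check_dirs /data/entrezgene/download /data/entrezgene/sql (hypothetical paths) returns a non-zero status if either directory cannot be created or written to.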

To run the Perl script, on the command line type:

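./download_parse_entrezgene.pl

If the file is not executable, run chmod +x download_parse_entrezgene.pl first, or invoke it as perl download_parse_entrezgene.pl.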

Third, run the shell script.

This script loads the data into the database.

Using a text editor, set the following variables at the top of the script to the same values you used for the initial mysql command (and, for $SQL, the same directory you set in the Perl script):

$host - host machine name
$db - database name
$user - MySQL username
$pw - MySQL password
$SQL - location of parsed data files, ready for database import
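The loading script itself is not reproduced here, so the following is only a hypothetical sketch of what such a loader can look like, using the standard mysqlimport client and the same variable names ($host, $db, $user, $pw, $SQL). It assumes each parsed file in $SQL is tab-delimited, has a .txt extension, and is named after its target table; the real script may use different file names or LOAD DATA statements.

```shell
#!/bin/sh
# Hypothetical loader sketch; the distributed shell script may work differently.
host=localhost              # host machine name (placeholder)
db=entrezgene               # database name (placeholder)
user=entrez                 # MySQL username (placeholder)
pw=changeme                 # MySQL password (placeholder)
SQL=/path/to/parsed_files   # same directory as $SQL in the Perl script

load_parsed_files() {
    # Assumption: one tab-delimited .txt file per table, named after the table
    # (e.g. gene_info.txt is loaded into table gene_info).
    found=0
    for f in "$SQL"/*.txt; do
        [ -e "$f" ] || continue   # the glob matched no files
        found=1
        # mysqlimport strips the extension to get the table name; --local
        # sends the file from the client instead of reading it on the server.
        mysqlimport --local -h "$host" -u "$user" -p"$pw" "$db" "$f" || return 1
    done
    if [ "$found" -eq 0 ]; then
        echo "no .txt files found in $SQL" >&2
        return 1
    fi
}
```

With --local, the server must also permit local loading (the local_infile variable, which is disabled by default as of MySQL 8.0). Passing the password with -p on the command line, as here and with the original $pw variable, can expose it to other users of the machine.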

If you would like to process the refSeqSummary data from UCSC and load it into the database, download EntrezGene_refSeqSummary.zip, which contains instructions and code for this step.

Please send email to wibr-bioinformatics --- AT --- wi.mit.edu for more information.

Instructions for creating a local MySQL version of NCBI's Entrez Gene database