Creating a local MySQL version of NCBI's Entrez Gene database

Entrez Gene is NCBI's repository for gene-specific information. Accessing this information through the Entrez Gene website or as flat files from NCBI's FTP site can be time-consuming, and it limits both the number and the kinds of questions you can ask of the data. For intensive data mining, a better solution is to load the data into a relational database.

We offer our MySQL-based database and our data parsing and loading scripts as an easy-to-implement solution to this problem. The ER diagram describes the database we created, and we also provide the SQL statements for creating both the tables and the indexes. The scripts automatically download the Entrez Gene data files, parse them, and load them into a MySQL database.
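The parsing step can be illustrated with a short sketch. This is not the Perl script offered on this page; it is a minimal Python illustration that assumes the input is a tab-delimited file in the style of NCBI's gene_info, with a header line starting with "#" and "-" marking missing values. The function names parse_gene_info and to_load_line are hypothetical, and the sample is cut down to four columns for brevity.

```python
import csv
import io


def parse_gene_info(handle):
    """Yield one list of fields per data row, mapping '-' to None.

    Assumes tab-delimited input where comment/header lines start with '#'.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and the '#'-prefixed header line.
        if not row or row[0].startswith("#"):
            continue
        yield [None if field == "-" else field for field in row]


def to_load_line(row):
    """Format a parsed row as one tab-delimited line for bulk loading.

    Missing values are written as \\N, which MySQL's LOAD DATA reads as
    NULL under its default field-escaping settings.
    """
    return "\t".join("\\N" if field is None else field for field in row)


# Illustrative sample only: real gene_info files have more columns.
sample = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\n"
    "9606\t1\tA1BG\t-\n"
    "9606\t2\tA2M\t-\n"
)

rows = list(parse_gene_info(io.StringIO(sample)))
for row in rows:
    print(to_load_line(row))
```

Writing missing values as \N rather than "-" lets the loading step store them as NULL, so queries can use IS NULL instead of comparing against a placeholder string.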

Requirements
Unix: including wget, tar, and gzip.
Perl: basic installation; no special modules needed.
MySQL: free; see mysql.com to download.


Files to Download
ER diagram: simple diagram of the database. Note that mim2gene has been removed since it is now hosted by omim.org.
Instructions: how to use the different scripts for downloading, parsing, and loading.
Tables and indexes: SQL script to create the database and its tables and indexes.
Downloading and parsing script: Perl script to download, uncompress, and parse the data files. Enter file parameters at the top of the script.
Script to load parsed files: shell script to load new data into the database. Enter file parameters at the top of the script.
Entrez Gene sample queries: some example queries you can run against this database.


Descriptions of the Tables
For a detailed description of each table and the data within it, see NCBI's README file for Gene.