SEQUENCE ANALYSIS UNIX COMMANDS - Lecture 3

WIBR Sequence Analysis Course 2005

Getting genome sequence

  1. Your hebrides account can't hold this much data, so only do this if you want the genome on your own computer.
  2. FTP to a repository such as UCSC.
  3. The UCSC FTP site has the same directory hierarchy as the UCSC Genome Bioinformatics downloads pages.
  4. unzip chromFa.zip (or whatever file you want to unzip)
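
The download and unzip steps above could be scripted roughly as follows. This is only a sketch: the host name, the hg17 assembly directory, and the use of curl instead of an interactive FTP session are assumptions, not details given in the lecture, and fetch_genome is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed location of the archive; adjust the host and assembly to the genome you want.
GENOME_HOST="hgdownload.cse.ucsc.edu"   # assumption: UCSC download host
ASSEMBLY="hg17"                         # assumption: assembly directory
ARCHIVE="chromFa.zip"                   # file name from the lecture notes
GENOME_URL="ftp://${GENOME_HOST}/goldenPath/${ASSEMBLY}/bigZips/${ARCHIVE}"

# Download the archive into a directory and unpack it there.
# $1 = destination directory (created if missing)
fetch_genome() {
    dest_dir="$1"
    mkdir -p "$dest_dir" || return 1
    # curl -O saves the file under its remote name in the current directory,
    # so run the download and the unzip inside a subshell that has cd'ed there.
    ( cd "$dest_dir" && curl -O "$GENOME_URL" && unzip "$ARCHIVE" )
}
```

Calling fetch_genome "$HOME/genome" on your own computer would run the download; nothing is fetched until the function is called.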

Mapping and extracting genomic sequence

  1. First: format the genome using the faToNib command.
  2. To BLAT search the genome with a multiple sequence fasta file,
  3. Extract a region of a chromosome using the nibFrag command.
  4. See "Using BLAT on hebrides" for all the details of these commands.
  5. hebrides also has a command called 'blat', but it only maps one sequence against another, not against the entire genome.
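
As a rough illustration of the faToNib and nibFrag steps above, the sketch below wraps each command in a small shell function. The function names and directory layout are hypothetical, and the argument order follows the usual usage messages of these tools (faToNib in.fa out.nib; nibFrag file.nib start end strand out.fa), which may differ on hebrides.

```shell
#!/bin/sh
# Convert every chromosome FASTA file in one directory to .nib format.
# $1 = directory holding chr*.fa files, $2 = directory for the .nib output
format_genome() {
    fa_dir="$1"
    nib_dir="$2"
    mkdir -p "$nib_dir" || return 1
    for fa in "$fa_dir"/*.fa; do
        [ -e "$fa" ] || continue            # skip if the glob matched nothing
        name=$(basename "$fa" .fa)
        faToNib "$fa" "$nib_dir/$name.nib" || return 1
    done
}

# Extract one region of a chromosome into a FASTA file.
# $1 = .nib file, $2 = start, $3 = end, $4 = strand (+ or m), $5 = output file
extract_region() {
    nibFrag "$1" "$2" "$3" "$4" "$5"
}
```

For example, extract_region chr1.nib 1000 2000 + region.fa would write the plus-strand sequence for that interval of chr1 to region.fa; check the nibFrag usage message for the exact coordinate convention.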

Downloading the Ensembl annotation

  1. FTP to Ensembl
  2. gunzip hsapiens_ensemblgene_main.txt.table.gz (or whatever .gz file you want to gunzip)
  3. Since the file is tab-delimited, it can be parsed with Perl.
  4. The file can also be imported into a MySQL database.
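
The notes suggest Perl for parsing; the sketch below uses awk from the shell instead, simply to stay with Unix commands. Both helper names are hypothetical, and columns are addressed by number because the actual column layout of the Ensembl file is not listed here.

```shell
#!/bin/sh
# Print two columns (chosen by number) from a tab-delimited table.
# $1 = file, $2 = first column number, $3 = second column number
print_columns() {
    awk -F'\t' -v a="$2" -v b="$3" 'BEGIN { OFS = "\t" } { print $a, $b }' "$1"
}

# Count the rows in which a given column equals a given value.
# $1 = file, $2 = column number, $3 = value to match
count_matching() {
    awk -F'\t' -v col="$2" -v val="$3" '$col == val { n++ } END { print n + 0 }' "$1"
}
```

A call such as print_columns hsapiens_ensemblgene_main.txt.table 1 2 would print the first two columns of the unzipped table; which column numbers are useful depends on the table's layout.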

Querying the Ensembl database

  1. Connect: mysql -u anonymous -h kaka.sanger.ac.uk (to get a MySQL prompt)
  2. show databases;
  3. use ensembl_mart_19_1; (or the database you want)
  4. show tables;
  5. describe hsapiens_ensemblgene_main; (or the table you want)
  6. show tables;
  7. enter a query:
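
The same connection can also be made non-interactively from the shell. The sketch below is an illustration only: run_mart_query is a hypothetical helper, and it assumes the server, account, and database named above are reachable. It passes a query to the mysql client with the -e option instead of typing it at the MySQL prompt.

```shell
#!/bin/sh
# Connection details taken from the steps above.
MART_HOST="kaka.sanger.ac.uk"
MART_USER="anonymous"
MART_DB="ensembl_mart_19_1"

# Run one SQL statement against the mart database and print the result.
# $1 = SQL statement, quoted as a single argument
run_mart_query() {
    mysql -u "$MART_USER" -h "$MART_HOST" -D "$MART_DB" -e "$1"
}
```

For example, run_mart_query "SELECT * FROM hsapiens_ensemblgene_main LIMIT 5;" would print the first five rows of the table described above. This query is an illustration, not one taken from the lecture.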
