Commands
Before starting, index the genome by going to the directory of nib
files and issuing the gfServer command:
cd /path/to/nib/dir
gfServer start fladda.wi.mit.edu portNum *.nib
where portNum is some 4-5 digit number greater than 1024. The gf
(genomic finding) indexing performed with the gfServer command usually
takes on the order of 15 minutes. You can monitor progress through
chromosomes as tiles are counted and then added. When the process is
complete, you'll get the message
Done adding
Server ready for queries!
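Because indexing takes several minutes, a script that starts the server has to wait before it sends the first query. The following is a minimal sketch of one way to do that, not a tested recipe: the function name wait_for_gfserver and its retry settings are made up for illustration, and it assumes that "gfServer status host port" exits with a non-zero status until the server accepts connections, which you should verify on your own installation.

```shell
# Hypothetical helper: poll until the gfServer answers a status request.
# Assumption: `gfServer status HOST PORT` exits non-zero while the server
# is still indexing or cannot be reached.
wait_for_gfserver() {
    host=$1
    port=$2
    tries=${3:-60}   # number of polls before giving up
    delay=${4:-30}   # seconds to sleep between polls
    i=0
    while [ "$i" -lt "$tries" ]; do
        if gfServer status "$host" "$port" >/dev/null 2>&1; then
            return 0     # server answered, ready for queries
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1             # gave up waiting
}
```

One way to use it is to start gfServer in the background and call wait_for_gfserver with the same host and portNum before the first gfClient run.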
To run blat, use the gfClient command:
gfClient [-out=pslx, etc.] [-nohead] fladda.wi.mit.edu portNum /full/path/to/nib/dir seqFileToBlat outFile
where portNum is the same as you used with the gfServer command.
When you're finished with all of your BLATing, stop the gfServer:
gfServer stop fladda.wi.mit.edu portNum
where portNum is the same as you used to start the gfServer.
To analyze multiple sequences with BLAT, use a multiple-sequence file as
input; all of the hits are written to one big output file.
To get help with command syntax and options, run any of these commands
(gfServer, gfClient) with no arguments.
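The query and stop steps above can be wrapped so that the server is stopped even when the query fails. This is only a sketch under stated assumptions: run_cmd, blat_session and the DRY_RUN variable are hypothetical names, the server is assumed to be running already, and stopping it after a single query file only makes sense if that file holds all of the sequences you want to BLAT.

```shell
# Hypothetical wrapper for the query / stop steps described above.
# With DRY_RUN=1 the commands are printed instead of executed.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: blat_session host port /full/path/to/nib/dir seqFileToBlat outFile
blat_session() {
    host=$1
    port=$2
    nib_dir=$3
    query=$4
    out=$5
    status=0
    # Run the query; remember a failure instead of aborting here.
    run_cmd gfClient "$host" "$port" "$nib_dir" "$query" "$out" || status=$?
    # Always stop the server, whether or not the query succeeded.
    run_cmd gfServer stop "$host" "$port"
    return "$status"
}
```

Setting DRY_RUN=1 before calling blat_session prints the two commands, which is a cheap way to check the arguments before running against the real server.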
BLAT output
Sample BLAT output for three input sequences shows the default output format.
This is slightly different from the format of the web version of BLAT. One
key point is that command-line BLAT doesn't order hits from best to
worst. Web BLAT does this by sorting on "SCORE", which is calculated as
SCORE = matches - mismatches. In other words, you need to sort any
multiple-hit results to find the best one, which isn't necessarily the
first. Another key point is that there may be no obvious "best" hit:
several alignments may produce similar scores, and one needs to decide how
many of these hits (if any) are biologically meaningful. The output is
tab-delimited, which makes it easy to import into another application for
sorting. With the option '-nohead', as one might predict, the 5-line
header is not printed. Other output options (selected with -out=type):
pslx - Tab separated format with sequence
axt - blastz-associated axt format
maf - multiz-associated maf format
wublast - similar to wublast format
blast - similar to NCBI blast format
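Since command-line BLAT leaves the ranking to you, the sorting can be done with awk and sort. The sketch below assumes the default output format, where the first two columns are the match and mismatch counts and column 10 is the query name. It uses the simple SCORE = matches - mismatches described above, and the function name psl_sort_by_score is made up for illustration.

```shell
# Hypothetical helper: rank hits in a default-format BLAT output file.
# Prepends score = column 1 - column 2 to every hit, then groups the hits
# by query name and lists the best score first within each query.
# Header lines are skipped because their first field is not a number.
psl_sort_by_score() {
    awk -F '\t' -v OFS='\t' \
        '$1 ~ /^[0-9]+$/ && NF >= 21 { print $1 - $2, $0 }' "$1" |
        sort -t "$(printf '\t')" -k11,11 -k1,1nr
}
```

The score becomes the first column of the output, so the query name moves from column 10 to column 11. Hits with similar scores end up next to each other, which makes it easier to judge whether there is a clear "best" hit.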
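The fields for the default output are:
matches - number of bases that match that aren't repeats
misMatches - number of bases that don't match
repMatches - number of bases that match but are part of repeats
nCount - number of 'N' bases
qNumInsert - number of inserts in query
qBaseInsert - number of bases inserted in query
tNumInsert - number of inserts in target
tBaseInsert - number of bases inserted in target
strand - '+' or '-' for query strand
qName - query sequence name
qSize - query sequence size
qStart - alignment start position in query
qEnd - alignment end position in query
tName - target sequence name
tSize - target sequence size
tStart - alignment start position in target
tEnd - alignment end position in target
blockCount - number of blocks in the alignment
blockSizes - comma-separated list of sizes of each block
qStarts - comma-separated list of starting positions of each block in query
tStarts - comma-separated list of starting positions of each block in target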
Extracting genomic sequence
Once one has mapped a sequence to the genome, adjacent sequence can be
easily extracted using the nibFrag command:
nibFrag chrFile.nib startNT endNT strand outFile
Unlike BLAT, nibFrag doesn't require any indexes in memory, and it's
much faster than EMBOSS's extractseq. It only works, however, on
nib-formatted sequence files. For the time being, nibFrag is found in
/usr/people/gbell/bin.
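To pull out a mapped region together with some adjacent sequence, the start and end have to be widened and kept within the chromosome. The sketch below only builds and prints the nibFrag command so you can check it before running it. The function name nibfrag_with_flank is made up for illustration, and the start, end and chromosome size are assumed to come from the target start, end and size columns of the BLAT output. The coordinate convention (0-based start, end not included) is an assumption to check against nibFrag's usage message. That message gives the strand as + or m (for minus), so a '-' taken from BLAT output would need translating.

```shell
# Hypothetical helper: print a nibFrag command that extracts a mapped region
# plus `flank` bases on each side, clipped to the ends of the chromosome.
# Usage: nibfrag_with_flank chrFile.nib start end chromSize flank strand outFile
nibfrag_with_flank() {
    nib=$1
    start=$2
    end=$3
    chrom_size=$4
    flank=$5
    strand=$6
    out=$7
    s=$((start - flank))
    e=$((end + flank))
    # Clip the window so it never runs off either end of the chromosome.
    if [ "$s" -lt 0 ]; then
        s=0
    fi
    if [ "$e" -gt "$chrom_size" ]; then
        e=$chrom_size
    fi
    printf '%s\n' "nibFrag $nib $s $e $strand $out"
}
```

For example, a hit at 100 to 200 on a 1000-base sequence with 500 bases of flank is clipped at the left end and gives the window 0 to 700.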
Software credits
BLAT and the associated genomic finding software are courtesy of Jim Kent - see "Source Code" or
"Executables."