HOMEWORK 4

See the course instructions on how to set up your fladda account.
See the course instructions on how to use X Windows on fladda.
After logging in to your fladda account, answer the following:
- What's the full path of your home directory?
- What files are in your home directory?
- Are there any hidden files?
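On fladda these questions are answered with standard shell commands: pwd right after login (you start in your home directory), ls, and ls -a, which also lists names starting with a dot. As a sketch of the same idea, the Python below splits a directory listing into visible and hidden entries. The function name split_hidden is made up for illustration. Unlike ls -a, os.listdir never reports the . and .. entries.

```python
import os


def split_hidden(names):
    """Split directory entries into (visible, hidden); hidden names start with '.'."""
    visible = sorted(n for n in names if not n.startswith("."))
    hidden = sorted(n for n in names if n.startswith("."))
    return visible, hidden


if __name__ == "__main__":
    home = os.path.expanduser("~")  # full path of the home directory
    visible, hidden = split_hidden(os.listdir(home))
    print("Home directory:", home)
    print("Files:", visible)
    print("Hidden files:", hidden)
```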
In this question, you will make a database, BLAST two given sequences against this database, and finally retrieve conserved regions from the BLAST hits.
- Get a list of sequences in FASTA format. Copy the list of GI numbers provided for this assignment and save it as a text file on your computer. On the NCBI Batch Entrez website, browse to select that file on your computer, set the database to Protein, and press Retrieve; you will see a list of document summaries. From the Display line, select FASTA and click Text to get all the sequences in FASTA format. Copy and save the sequences in a text file on your computer, then move the file to your home directory on fladda. You can transfer the file with an FTP program (see the FTP instructions).
- Format the sequence file above to create BLAST-searchable database files. In your home directory, type the command formatdb -p T -o T -i SeqFileName, where SeqFileName is the name of the sequence file you saved on fladda. You should see 8 new files appear in your home directory, including formatdb.log. Type formatdb alone at the fladda prompt to list its options and work out why each option in the command above is used.
- The file BlastIN contains two sequences: a Drosophila olfactory receptor and a mouse olfactory receptor. Your task is to find similar proteins in the database above for both sequences. The full path of BlastIN is /usr/people/yuan/wi_homework/hw4/BlastIN. Copy this file to your home directory.
- Run a BLAST search with the blastall command. Typing blastall alone at the command line prints instructions on how to use it. Use the blastp program, and report only hits with an E-value < 0.01.
  - Run it once to create an output file in text format (2seqs.blast.txt)
  - Run it once to create an output file in HTML format (2seqs.blast.html)
- Check the output files with more. View 2seqs.blast.html in a web browser (either Netscape on fladda using X Windows, or download the file to your desktop and open it in your favorite browser). How many sequences are similar to your first query sequence, and which species are they from? What about the second query sequence?
- Extract the target protein sequences hit by the first query with the fastacmd command. You need two arguments: -d SeqFileName (make sure you give the correct path to SeqFileName) and -s accession_number. Save each output to a file, then combine the files into one multiple-sequence file with the cat command.
- Retrieve specific sequence regions from the multiple-sequence file above. Olfactory receptors have several transmembrane domains. If you make a multiple alignment of all the sequences in the multiple-sequence file, you will find conserved domain sequences in the alignment; you will learn how to make multiple alignments later in the course. One of the transmembrane domains has the conserved sequence pattern TIFVQF. Use pico (see the pico instructions) or your favorite Unix editor to save the pattern in a file. You can retrieve the conserved sequences from your multiple-sequence file with the fuzzpro command, one of the EMBOSS programs installed on fladda. Type fuzzpro at the command line and follow the prompts. Permit one mismatch. What is your result?
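The FASTA steps above (saving sequences, pulling records out with fastacmd -s, and joining them with cat) can be mimicked offline as a rough sketch. The Python below parses FASTA text into (header, sequence) pairs, selects records whose header contains a requested accession, and writes the selection back out as one multiple-sequence FASTA text. It does not reproduce how formatdb or fastacmd actually index and look up sequences; the function names and the header-matching rule (splitting the header on whitespace and '|') are assumptions for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped; join them
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def select_by_accession(records, accessions):
    """Return the first record whose header contains each accession (hypothetical rule)."""
    wanted = []
    for acc in accessions:
        for header, seq in records:
            # Split on whitespace and '|' so IDs inside "gi|...|ref|...|" headers match.
            tokens = header.replace("|", " ").split()
            if acc in tokens:
                wanted.append((header, seq))
                break
    return wanted


def to_fasta(records, width=60):
    """Write records as one multiple-sequence FASTA text, wrapping at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

In practice you would read your sequence file with open(...).read() and pass in the accessions taken from the BLAST hits.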
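To cross-check the hit counts asked for above, a small script can count the distinct database hits per query. This sketch assumes you also ran blastall with -m 8 (tabular output), which the assignment does not require, and that the output uses the 12-column layout with the E-value in the 11th column. The function names hits_per_query and parse_evalue are hypothetical.

```python
from collections import defaultdict

# Assumed layout of blastall -m 8 output: 12 tab-separated columns
# (query, subject, %identity, length, mismatches, gap opens,
#  q.start, q.end, s.start, s.end, e-value, bit score).
EVALUE_COLUMN = 10  # 0-based index of the e-value column


def parse_evalue(text):
    """Convert an e-value string to float, tolerating forms like 'e-123'."""
    text = text.strip()
    if text.startswith("e"):
        text = "1" + text  # guard for values printed without the leading 1
    return float(text)


def hits_per_query(tabular_text, max_evalue=0.01):
    """Map each query ID to its distinct subject IDs with e-value < max_evalue.

    A query whose hits all fail the cutoff still appears, with an empty list.
    Several HSPs against the same subject count as one hit.
    """
    hits = defaultdict(list)
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -m 9 output)
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = parse_evalue(fields[EVALUE_COLUMN])
        matched = hits[query]  # creates an empty list the first time a query is seen
        if evalue < max_evalue and subject not in matched:
            matched.append(subject)
    return dict(hits)
```

The species for each hit still has to be read from the sequence descriptions, for example in the HTML report.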
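The one-mismatch search described above can be sketched as a sliding window that counts mismatched residues against the pattern. This is a simplified stand-in, not fuzzpro's implementation: it handles a plain residue string such as TIFVQF (no PROSITE-style pattern syntax, which fuzzpro does accept), counts substitutions only, and reports 1-based start positions. The function name find_pattern is hypothetical.

```python
def find_pattern(sequence, pattern, max_mismatches=1):
    """Return (start, matched_text, mismatches) for every window within the limit.

    Positions are 1-based. Only substitutions are counted; insertions and
    deletions are not considered.
    """
    hits = []
    k = len(pattern)
    seq = sequence.upper()
    pat = pattern.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(1 for a, b in zip(window, pat) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits
```

Running it over each sequence in your multiple-sequence file gives a quick check on the windows fuzzpro should report with one mismatch allowed.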
Create a directory called bioinfo_course, and inside it create another directory called homework_4. What commands do you need to issue? Move all the files you created previously into the homework_4 directory. What command could you use to do this?
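As a sketch of the same directory work in Python (on fladda you would use the shell commands the question asks about), the code below creates bioinfo_course/homework_4 under a given home directory and moves a list of files into it. The function names organize and demo and the sample filenames are hypothetical; demo runs everything inside a temporary directory so it never touches real files.

```python
import os
import shutil
import tempfile


def organize(home, filenames, course="bioinfo_course", homework="homework_4"):
    """Create course/homework under home and move the given files into it."""
    target = os.path.join(home, course, homework)
    # makedirs creates the intermediate directory too, like `mkdir -p`.
    os.makedirs(target, exist_ok=True)
    moved = []
    for name in filenames:
        src = os.path.join(home, name)
        if os.path.isfile(src):  # skip names that are missing or not regular files
            moved.append(shutil.move(src, os.path.join(target, name)))
    return target, moved


def demo():
    """Run organize() in a throwaway temporary directory."""
    with tempfile.TemporaryDirectory() as home:
        for name in ("seqs.fasta", "formatdb.log"):
            open(os.path.join(home, name), "w").close()
        target, moved = organize(home, ["seqs.fasta", "formatdb.log", "missing.txt"])
        return sorted(os.listdir(target)), len(moved)


if __name__ == "__main__":
    print(demo())
```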