a. Use GenomeScan to search for coding regions in your masked genomic sequence. GenomeScan requires protein sequences, which you can find by running BLAST and/or Genscan. Directions for finding protein sequences are in the fourth paragraph of the GenomeScan website. For the BLAST search, you can search against the "swissprot" database. From the BLAST results, choose the human hits with an E-value of 0.
b. Use MZEF to search for coding regions in your masked genomic sequence. Compare the locations of the exons predicted by MZEF with those predicted by GenomeScan.
c. To check the performance of GenomeScan, compare the sequence it predicts with the experimentally determined sequence. You can do the pairwise alignment with the NCBI BLAST 2 program.
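The exon comparison in step b can also be done programmatically once the coordinates are copied out of each tool's report. The following is a minimal Python sketch, not part of either tool: the coordinate lists are hypothetical placeholders, and the function names `overlap_length` and `compare_exons` are made up for illustration. It assumes 1-based, inclusive start and end positions on the same strand.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), assumed 1-based and inclusive


def overlap_length(a: Exon, b: Exon) -> int:
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compare_exons(first: List[Exon], second: List[Exon]):
    """Split the exons in `first` into those that overlap any exon in
    `second` and those that overlap none."""
    shared, unique = [], []
    for exon in first:
        if any(overlap_length(exon, other) > 0 for other in second):
            shared.append(exon)
        else:
            unique.append(exon)
    return shared, unique


# Hypothetical coordinates; replace with the values from your own output.
genomescan_exons = [(120, 310), (1450, 1602), (2900, 3055)]
mzef_exons = [(1448, 1602), (2210, 2330)]

shared, unique = compare_exons(genomescan_exons, mzef_exons)
print("GenomeScan exons also predicted by MZEF:", shared)
print("GenomeScan exons not predicted by MZEF:", unique)
```

Swapping the two arguments reports the MZEF exons that GenomeScan missed. Note that an overlap does not mean the exon boundaries agree, so the exact start and end positions are still worth checking by eye.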
a. Run PSI-BLAST with the Drosophila olfactory receptor 85e (accession number NP_524283). You can put the accession number into the "search" field and restrict the search to the odorant receptors in "Drosophila melanogaster". How many hits with significant alignments are in your results? Which matrix is used by default?
b. Run PSI-BLAST with the default matrix until it converges, limiting the search to Drosophila melanogaster. In each iteration, include only the sequences that belong to "odorant receptor". Because olfactory receptors share very low similarity (how can you prove this statement?), we need to include the odorant receptors with E-values WORSE than the threshold. After it converges, how many odorant receptors do you get? How many iterations did it take before no new hits were found?
c. Extract multiple sequences from the BLAST results. Click the "Select all" button in the alignment field, then press "Get selected sequences". On the next page, choose the genes of interest (or all the odorant receptor sequences). Replace "Summary" with "FASTA" and click the "Send to" button.
d. Repeat the above search with PAM30 and BLOSUM80. How many hits with significant alignments are there? Is the number different from the one obtained with the default matrix? Why?
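If you save the FASTA output from step c for each matrix, the hit lists in step d can be compared by sequence identifier instead of by eye. The sketch below is one possible way to do that in Python. The FASTA text and the identifiers `seqA` and `seqB` are made-up placeholders, not real results, and `fasta_ids` is a hypothetical helper that assumes the identifier is the first whitespace-delimited token of each header line.

```python
def fasta_ids(text: str) -> set:
    """Return the first token of every non-empty FASTA header line."""
    return {
        line[1:].split()[0]
        for line in text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


# Placeholder FASTA text; in practice, read the files you saved in step c.
default_fasta = ">seqA example record\nMKT\n>seqB example record\nMLS\n"
pam30_fasta = ">seqA example record\nMKT\n"

default_hits = fasta_ids(default_fasta)
pam30_hits = fasta_ids(pam30_fasta)

print("Hits with default matrix:", len(default_hits))
print("Hits with PAM30:", len(pam30_hits))
print("Found only with default matrix:", sorted(default_hits - pam30_hits))
print("Found only with PAM30:", sorted(pam30_hits - default_hits))
```

To use saved files, replace the placeholder strings with something like `open("default.fasta").read()`, where the file name is whatever you chose when saving.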
a. cp /home/lewitter/msh2.fa .
b. cp /home/lewitter/pat1.dat .
c. cp /home/lewitter/pat2.dat .
d. cp /home/lewitter/pat3.dat .
e. scan_for_matches pat1.dat < msh2.fa > out1
f. scan_for_matches pat2.dat < msh2.fa > out2
g. scan_for_matches pat3.dat < msh2.fa > out3
h. more out1
i. more out2
j. more out3
What do your results look like? What happens when you allow mismatches, etc.?
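To build intuition for the last question, the following Python sketch shows how the number of hits can grow when mismatches are tolerated. It is only an illustration and is not how scan_for_matches is implemented: it handles substitutions only (no insertions or deletions), and the sequence, pattern, and function name `find_with_mismatches` are invented for the example.

```python
def find_with_mismatches(sequence: str, pattern: str, max_mismatches: int = 0):
    """Return (1-based start, matched text, mismatch count) for every
    window of the sequence within the allowed number of substitutions."""
    hits = []
    seq, pat = sequence.upper(), pattern.upper()
    for start in range(len(seq) - len(pat) + 1):
        window = seq[start:start + len(pat)]
        # Count positions where the window and the pattern differ.
        mismatches = sum(1 for a, b in zip(window, pat) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start + 1, window, mismatches))
    return hits


# Toy data, unrelated to msh2.fa or the pat*.dat files.
toy_sequence = "ACGTTGCAACGTAGCA"
toy_pattern = "ACGTT"

print(find_with_mismatches(toy_sequence, toy_pattern, 0))
# [(1, 'ACGTT', 0)]
print(find_with_mismatches(toy_sequence, toy_pattern, 1))
# [(1, 'ACGTT', 0), (9, 'ACGTA', 1)]
```

With no mismatches allowed, only the exact occurrence is reported. Allowing one mismatch adds a second, near-identical site, which is the kind of change to look for when comparing out1, out2, and out3.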