Homework 8

The purpose of this assignment is to familiarize you with techniques used to identify patterns and profiles, as well as how to use patterns and profiles to search databases.

1. Build a pattern and search a sequence database.
Perform a multiple sequence alignment on the file sequences.fasta using clustalx (or your favorite MSA application) and save it as sequences.aln. Build a pattern for the first 30 positions of the alignment using a sequence-driven method, as shown on slide 9 of lecture 8: for each column, list the commonly occurring amino acids (those that appear 3 or more times in the column), then convert this list to patscan syntax (hints: slide 10 of lecture 8 and http://web.wi.mit.edu/bio/pub/patscan.html). Here is an example: pattern.gif. Once you have created the pattern, put it into a file named pattern_file in your directory on fladda.wi.mit.edu. Then issue the following command:
```
scan_for_matches -p pattern_file < /usr/people/latek/smalldb.fasta > pattern.out
```
Can you categorize the results of your pattern search? What biological properties do the hits have in common? (You can look up descriptions of the hits in NCBI Entrez.)
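The column-by-column procedure for building the pattern can also be sketched in code. The following Python sketch is only an illustration under stated assumptions: the function name `build_patscan_pattern` and the toy alignment are hypothetical, the aligned rows are passed in as plain strings rather than parsed from a Clustal file, and the use of `any(...)` for a residue set and `1...1` for an unconstrained position should be checked against the patscan documentation linked above.

```python
from collections import Counter


def build_patscan_pattern(aligned_seqs, n_columns=30, min_count=3):
    """Return a patscan-style pattern for the first n_columns alignment columns.

    aligned_seqs: equal-length aligned rows ('-' or '.' marks a gap).
    A residue is kept for a column if it occurs at least min_count times.
    """
    width = min(n_columns, min(len(s) for s in aligned_seqs))
    units = []
    for col in range(width):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(
            s[col].upper() for s in aligned_seqs if s[col] not in "-."
        )
        common = sorted(aa for aa, n in counts.items() if n >= min_count)
        if common:
            # Assumed patscan syntax: any(XYZ) matches one listed residue.
            units.append("any(" + "".join(common) + ")")
        else:
            # Assumed patscan syntax: 1...1 matches exactly one residue of any kind.
            units.append("1...1")
    return " ".join(units)


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from sequences.fasta.
    toy = ["MKA-", "MKAL", "MRAV", "MKGI", "MRGL", "MKGF"]
    print(build_patscan_pattern(toy))
```

The printed string could be saved as pattern_file. Comparing it with a pattern written by hand is a useful check on both.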
2. Build a profile and use it to search a sequence database.
Build a profile of the alignment from problem 1. Here is the command to use on fladda:
```
hmmbuild sequences.prf sequences.aln
```
This will build a profile (sequences.prf) from the sequences aligned in sequences.aln. Remember to calibrate your profile with the command:
```
hmmcalibrate sequences.prf
```
Finally, search a small database for sequences that match your profile, reporting only hits with E-values below 1:
```
hmmsearch -E 1 sequences.prf /usr/people/latek/smalldb.fasta
```
How are the results of your profile search related? How do they compare to your patscan results from problem #1?
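One way to approach the comparison is to reduce each search to a set of sequence identifiers and look at the overlap. The sketch below is a minimal illustration, not part of the assignment tools: the function names are hypothetical, the identifiers in the usage example are made up, and it assumes that each hit in pattern.out has a header line of the form `>ID:[start,end]` (verify this against your own output). The profile hit identifiers are assumed to have been copied by hand from the hmmsearch report.

```python
import re


def pattern_hit_ids(lines):
    """Collect sequence IDs from scan_for_matches-style header lines.

    Assumes headers look like '>ID:[start,end]'; other lines are ignored.
    """
    ids = set()
    for line in lines:
        match = re.match(r">(\S+?):\[\d+,\d+\]", line)
        if match:
            ids.add(match.group(1))
    return ids


def compare_hits(pattern_ids, profile_ids):
    """Split two collections of hit IDs into shared and method-specific hits."""
    pattern_ids, profile_ids = set(pattern_ids), set(profile_ids)
    return {
        "both": sorted(pattern_ids & profile_ids),
        "pattern_only": sorted(pattern_ids - profile_ids),
        "profile_only": sorted(profile_ids - pattern_ids),
    }


if __name__ == "__main__":
    # Hypothetical example data; replace with the contents of pattern.out
    # and the IDs listed in your hmmsearch report.
    example_lines = [">seqA:[5,34]", "MKT", ">seqB:[10,39]", "MRS"]
    print(compare_hits(pattern_hit_ids(example_lines), ["seqB", "seqC"]))
```

Hits found by only one method are usually the most informative ones to look up when answering the questions above.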