Getting To Know Your Protein

Exercise II – Answers

Bioinformatics for Biologists 2005

In this exercise, you will identify protein domains within an unknown sequence, then use a protein domain pattern and a profile to search a database for related sequences. By the end of the exercise, you should be comfortable browsing and searching domain databases. Follow the steps below in order, using either the applications installed on your computer or their web-based equivalents. If you have difficulty with any step, please ask for assistance.

(For these exercises, use each application's default settings.)

Step 1 – Identify protein domains

II. What are these domains? WD40 domain, HEAT repeats.

Can you identify other proteins that contain this domain? http://pfam.wustl.edu/cgi-bin/getdesc?name=WD40

What is interesting about the domain architecture for these domains? The sequence contains multiple WD40 and HEAT repeats.

Step 2 – Create a pattern (consensus) for the domain in Step 1

I. This time, search ProSite for domains in the sequence from Step 1. Write down the ProSite identifier number (PSxxxxx) for this domain. PS50294

II. Use the search box at the top of the ProSite page to find information about the domain identified in the previous question. Locate the consensus pattern representing this domain and copy it to a text file.

[LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]
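
To see what this consensus actually matches, it can help to translate it into a regular expression. The following is a minimal sketch in Python, not part of the original exercise. The function name prosite_to_regex and the test sequence are made up for illustration, and the sketch handles only single residues, x, [..], {..} and repeat counts such as (2) or (2,4). The ProSite anchors < and > are not supported.

```python
import re

# One pattern element: a core (x, a residue, [..] or {..}) plus an optional (n) or (n,m).
ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple ProSite pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        core, low, high = ELEMENT.fullmatch(element.strip()).groups()
        if core == "x":
            core = "."                       # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} means none of these residues
        # [..] and single residues are already valid regex syntax.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)

# Consensus pattern copied from the answer above.
WD40_PATTERN = (
    "[LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-"
    "[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]"
)

wd40_regex = prosite_to_regex(WD40_PATTERN)

# Hypothetical toy sequence, constructed by hand so that it contains one match.
toy_sequence = "GGGLLLLAADAALALWDLGGG"
hit = re.search(wd40_regex, toy_sequence)
print(wd40_regex)
print(hit.start(), hit.group())
```

The match covers 15 residues, one per pattern position once x(2) is counted as two residues.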

Step 3 – Search a database with your sequence pattern

I. Convert the ScanProsite pattern to PatScan syntax and save it as a text file. (For simplicity, you don't have to convert the whole pattern.)

any(LIVMSTAC) any(LIVMFYWSTAGC) any(LIMSTAG) any(LIVMSTAGC) 2...2 any(DN) 2...2 any(LIVMWSTAC) 1...1 any(LIVMFSTAG) W any(DEN) any(LIVMFSTAGCN)
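
The conversion can also be done mechanically. The sketch below is an illustration only, not a tool used in the course: the function name prosite_to_patscan is hypothetical, and it assumes the mapping shown in the answer ([..] becomes any(..), and x(n) becomes n...n). It also assumes that {..} maps to PatScan's notany(..). Variable repeats on anything other than x are rejected rather than guessed at.

```python
import re

# One pattern element: a core (x, a residue, [..] or {..}) plus an optional (n) or (n,m).
ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_patscan(pattern: str) -> str:
    """Rewrite a simple ProSite pattern as a space-separated PatScan pattern."""
    units = []
    for element in pattern.strip().rstrip(".").split("-"):
        core, low, high = ELEMENT.fullmatch(element.strip()).groups()
        low = int(low) if low else 1
        high = int(high) if high else low
        if core == "x":
            units.append(f"{low}...{high}")      # spacer of low to high residues
            continue
        if low != high:
            raise ValueError(f"variable repeat not supported here: {element}")
        if core.startswith("["):
            unit = f"any({core[1:-1]})"          # any one of the listed residues
        elif core.startswith("{"):
            unit = f"notany({core[1:-1]})"       # assumed PatScan form for exclusions
        else:
            unit = core                          # literal residue such as W
        units.extend([unit] * low)               # expand fixed repeats like [ST](2)
    return " ".join(units)

# Consensus pattern from Step 2.
WD40_PATTERN = (
    "[LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-"
    "[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]"
)

patscan_pattern = prosite_to_patscan(WD40_PATTERN)
print(patscan_pattern)
```

For the Step 2 consensus, the printed pattern is the same as the hand-converted answer above.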

Step 4 – Use a profile to search a database

II. What types of proteins do you find? Re-run the search for 4 iterations, including only sequences with E-values less than 0.0001.

• Now what kinds of sequences do you retrieve? Metabolism and germination sequences.