Getting To Know Your Protein

Exercise I

Bioinformatics for Biologists 2005

In this exercise, you will use an unknown sequence to prepare multiple sequence alignment and phylogenetic tree figures. By the end of the exercise, you should be able to search for homologous sequences, align them, build a phylogenetic tree, and produce manuscript-quality figures of your results. Follow the steps below in order, using either the applications installed on your computer or their web-based equivalents. If you have difficulty with any step, please ask for assistance.

Step 1 – Find homologous sequences

I. BLAST the following sequence against the non-redundant protein database using the blastp program at:

http://www.ncbi.nlm.nih.gov/BLAST/

MESEMLQSPLLGLGEEDEADLTDWNLPLAFMKKRHCEKIEGSKSLAQSWRMKDRMKTVSVALVLCLNVGVDP

PDVVKTTPCARLECWIDPLSMGPQKALETIGANLQKQYENWQPRARYKQSLDPTVDEVKKLCTSLRRNAKEE

RVLFHYNGHGVPRPTVNGEVWVFNKNYTQYIPLSIYDLQTWMGSPSIFVYDCSNAGLIVKSFKQFALQREQE

LEVAAINPNHPLAQMPLPPSMKNCIQLAACEATELLPMIPDLPADLFTSCLTTPIKIALRWFCMQKCVSLVP

GVTLDLIEKIPGRLNDRRTPLGELNWIFTAITDTIAWNVLPRDLFQKLFRQDLLVASLFRNFLLAERIMRSY

NCTPVSSPRLPPTYMHAMWQAWDLAVDICLSQLPTIIEEGTAFRHSPFFAEQLTAFQVWLTMGVENRNPPEQ

LPIVLQVLLSQVHRLRALDLLGRFLDLGPWAVSLALSVGIFPYVLKLLQSSARELRPLLVFIWAKILAVDSS

CQADLVKDNGHKYFLSVLADPYMPAEHRTMTAFILAVIVNSYHTGQEACLQGNLIAICLEQLNDPHPLLRQW

VAICLGRIWQNFDSARWCGVRDSAHEKLYSLLSDPIPEVRCAAVFALGTFVGNSAERTDHSTTIDHNVAMML

AQLVSDGSPMVRKELVVALSHLVVQYESNFCTVALQFIEEEKNYALPSPATTEGGSLTPVRDSPCTPRLRSV

SSYGNIRAVATARSLNKSLQNLSLTEESGGAVAFSPGNLSTSSSASSTLGSPENEEHILSFETIDKMRRASS

YSSLNSLIGVSFNSVYTQIWRVLLHLAADPYPEVSDVAMKVLNSIAYKATVNARPQRVLDTSSLTQSAPASP

TNKGVHIHQAGGSPPASSTSSSSLTNDVAKQPVSRDLPSGRPGTTGPAGAQYTPHSHQFPRTRKMFDKGPEQ

TADDADDAAGHKSFISATVQTGFCDWSARYFAQPVMKIPEEHDLESQIRKEREWRFLRNSRVRRQAQQVIQK

GITRLDDQIFLNRNPGVPSVVKFHPFTPCIAVADKDSICFWDWEKGEKLDYFHNGNPRYTRVTAMEYLNGQD

CSLLLTATDDGAIRVWKNFADLEKNPEMVTAWQGLSDMLPTTRGAGMVVDWEQETGLLMSSGDVRIVRIWDT

DREMKVQDIPTGADSCVTSLSCDSHRSLIVAGLGDGSIRVYDRRMALSECRVMTYREHTAWVVKASLQKRPD

GHIVSVSVNGDVRIFDPRMPESVNVLQIVKGLTALDIHPQADLIACGSVNQFTAIYNSSGELINNIKYYDGF

MGQRVGAISCLAFHPHWPHLAVGSNDYYISVYSVEKRVR

II. What is this sequence? Does it have any characteristic domains (hint: look in the GenBank report)?
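The query above is printed in separate chunks, so it can help to join it into a single FASTA record before pasting it into the blastp form. The following is a minimal sketch, not part of the original exercise: the function name `to_fasta` and the header `unknown_protein` are hypothetical, and the check assumes the sequence uses only the 20 standard amino acid letters.

```python
# Sketch: join pasted sequence chunks into one FASTA record.
# The names to_fasta and "unknown_protein" are illustrative assumptions.

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def to_fasta(chunks, header="query", width=60):
    """Strip whitespace from each chunk, join them, and wrap at `width`."""
    sequence = "".join("".join(chunk.split()) for chunk in chunks).upper()
    unexpected = sorted(set(sequence) - VALID_RESIDUES)
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {unexpected}")
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# The first two printed lines of the exercise sequence, as an example.
chunks = [
    "MESEMLQSPLLGLGEEDEADLTDWNLPLAFMKKRHCEKIEGSKSLAQSWRMKDRMKTVSVALVLCLNVGVDP",
    "PDVVKTTPCARLECWIDPLSMGPQKALETIGANLQKQYENWQPRARYKQSLDPTVDEVKKLCTSLRRNAKEE",
]
print(to_fasta(chunks, header="unknown_protein"))
```

Pass all of the printed lines, in order, to obtain the full query.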

Step 2 – Create a FASTA file containing homologous sequences

I. Compile the accession numbers of the homologous sequences from the following organisms into one text file, one accession number per line: Drosophila, mouse, rat, human.

II. Use Batch ENTREZ at NCBI to retrieve all of the sequences corresponding to the accession numbers as a FASTA-formatted file. Save this file to your computer.

http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi
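The two steps above can be sketched in a few lines of Python: one helper writes the accession list with one entry per line, and another reads the FASTA file you saved so you can confirm that every sequence was retrieved. This is only an illustration. The function names, file names, and the `ACCESSION_...` strings are placeholders (assumptions), and must be replaced with the accession numbers you found in your own BLAST results.

```python
import tempfile
from pathlib import Path


def write_id_list(accessions, path):
    """Write one accession number per line; blank entries are skipped."""
    cleaned = [a.strip() for a in accessions if a.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned


def read_fasta(path):
    """Return {header: sequence} for every record in a FASTA file."""
    records = {}
    header = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder IDs (hypothetical): substitute your real accession numbers.
    ids = ["ACCESSION_FLY", "ACCESSION_MOUSE", "ACCESSION_RAT", "ACCESSION_HUMAN"]
    write_id_list(ids, Path(tmp) / "ids.txt")

    # Stand-in for the file saved from Batch Entrez (made-up content).
    fasta_path = Path(tmp) / "homologs.fasta"
    fasta_path.write_text(">seq1 example\nMESEM\nLQSPL\n>seq2 example\nMKT\n")
    records = read_fasta(fasta_path)
    print(len(records), "records found")
```

If the number of records is smaller than the number of accession numbers you submitted, check the ID file for typos before moving on to the alignment.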

Step 3 – Align Sequences with ClustalX

I. Start the ClustalX application, then FILE->LOAD SEQUENCES.

II. ALIGNMENT->DO COMPLETE ALIGNMENT.

III. FILE->WRITE ALIGNMENT AS POSTSCRIPT.

[ Alternatively, you can create an alignment with the web tool at: http://www.ebi.ac.uk/clustalw/ ]
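A quick sanity check on a finished alignment is the percent identity between pairs of aligned sequences. The sketch below is an assumption-laden illustration rather than anything ClustalX itself reports: it takes two rows of an alignment as equal-length strings with `-` for gaps, and it counts only columns where neither sequence has a gap. Other conventions for the denominator exist and give different numbers.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns where the residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned fragments, for illustration only.
print(percent_identity("MKT-LV", "MKSALV"))  # 4 of 5 gap-free columns match: 80.0
```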

Step 4 – Create Phylogenetic Tree

I. In ClustalX, TREES->DRAW N-J TREE. This writes a .ph tree file.

II. In TreeView, OPEN your .ph file.

• Notice the options for drawing the tree in different shapes.

III. PRINT->SAVE AS PDF

[ Alternatively, you can build trees with your alignment at http://www.ebi.ac.uk/clustalw/ ]
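Before opening the tree in a viewer, you can confirm that every sequence made it into the tree file. The sketch below assumes the .ph file is a plain-text tree in Newick (PHYLIP) format and pulls out the leaf names with a regular expression. The function name `leaf_names` and the example tree are hypothetical, and the pattern assumes simple unquoted labels without spaces.

```python
import re


def leaf_names(newick_text):
    """Return leaf labels from a Newick tree, in the order they appear.

    A leaf label directly follows "(" or "," (possibly after whitespace);
    labels after ")" belong to internal nodes and are ignored.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick_text)


# Made-up tree with line breaks, similar in layout to a multi-line tree file.
example = """(
(
Human:0.01,
Mouse:0.02)
:0.10,
Rat:0.03,
Drosophila:0.40);"""
print(leaf_names(example))  # ['Human', 'Mouse', 'Rat', 'Drosophila']
```

To check your own tree, read the file contents into a string and compare the returned names with the headers in your FASTA file.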

[ NOTE: From here to the end, we assume you are using your desktop applications. ]

Step 5 – Manage PostScript Files

I. OPEN the alignment PostScript file with Acrobat Distiller.

• Note that you can view each page individually.

II. Extract each page and save it separately under a new name.

• For each page, enter the page number to extract and save it with a unique name.

Step 6 – Annotate Figure

I. OPEN the TIFF or PDF in Adobe Illustrator.

II. FILE->EXPORT, save as a TIFF.

Step 7 – Create PowerPoint Presentation

I. NEW PRESENTATION

II. Choose blank slide.

III. INSERT->PICTURE->FROM FILE, select your TIFF or PDF file.

IV. Rotate the image 90° counterclockwise.

V. Label your slide as you see fit.