Getting To Know Your Protein

Exercise I Answers

 

Bioinformatics for Biologists 2005

 

 

In this exercise, you will use an unknown sequence to prepare multiple sequence alignment and phylogenetic tree figures. By the end of the exercise, you should be able to search for homologous sequences, align them, build a phylogenetic tree, and produce manuscript-quality figures of your results. Follow the steps below in order, using either the applications installed on your computer or their web-based equivalents. If you have difficulty with any step, please ask for assistance.

 

Step 1 – Find homologous sequences

 

I. BLAST the following sequence against the non-redundant protein database using the blastp program at:

 

http://www.ncbi.nlm.nih.gov/BLAST/

Selected BLAST results:

 

gi|21979456|gb|AAM09075.1| raptor [Homo sapiens] >gi|220949... 2614 0.0

gi|30061325|ref|NP_083174.1| raptor [Mus musculus] >gi|4657... 2542 0.0

gi|54035208|gb|AAH84088.1| LOC495002 protein [Xenopus laevis] 2412 0.0

gi|7242961|dbj|BAA92541.1| KIAA1303 protein [Homo sapiens] 2193 0.0

gi|34875607|ref|XP_213539.2| similar to p150 target of rapa... 1557 0.0

gi|50745382|ref|XP_426232.1| PREDICTED: similar to p150 tar... 1332 0.0

gi|12855312|dbj|BAB30288.1| unnamed protein product [Mus mu... 1237 0.0

gi|24640048|ref|NP_572294.1| CG4320-PA [Drosophila melanoga... 1023 0.0

gi|31711792|gb|AAP68252.1| At3g08850 [Arabidopsis thaliana]... 932 0.0

gi|47214942|emb|CAG10764.1| unnamed protein product [Tetrao... 924 0.0

gi|55236253|gb|EAL39258.1| ENSANGP00000026347 [Anopheles ga... 923 0.0

gi|6403497|gb|AAF07837.1| unknown protein [Arabidopsis thal... 920 0.0
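
If you want to pull the accession numbers out of a hit list like the one above without retyping them, a short script can do it. The following is a minimal sketch, assuming each summary line follows the "gi|number|db|accession|" layout shown above and ends with a bit score and an E-value. The name parse_hit_line is hypothetical and is not part of BLAST.

```python
import re

# Assumption: every hit line uses the legacy "gi|number|db|accession|" layout
# shown in the results above, followed by a description, a bit score, and an
# E-value. parse_hit_line is a hypothetical helper, not part of any BLAST tool.
HIT_RE = re.compile(
    r"^gi\|(?P<gi>\d+)\|(?P<db>\w+)\|(?P<acc>[\w.]+)\|\s*"
    r"(?P<desc>.*?)\s+(?P<score>\d+)\s+(?P<evalue>\S+)$"
)


def parse_hit_line(line):
    """Return a dict describing one summary line, or None if it does not match."""
    match = HIT_RE.match(line.strip())
    if match is None:
        return None
    hit = match.groupdict()
    hit["score"] = int(hit["score"])
    # The E-value is kept as text because BLAST may print forms such as "e-100".
    return hit


if __name__ == "__main__":
    lines = [
        "gi|21979456|gb|AAM09075.1| raptor [Homo sapiens] >gi|220949... 2614 0.0",
        "gi|30061325|ref|NP_083174.1| raptor [Mus musculus] >gi|4657... 2542 0.0",
        "gi|24640048|ref|NP_572294.1| CG4320-PA [Drosophila melanoga... 1023 0.0",
    ]
    for line in lines:
        hit = parse_hit_line(line)
        print(hit["acc"], hit["score"], hit["evalue"])
```

The accession field is the part you will need in Step 2.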

 

II. What is this sequence? RAPTOR

Does it have any characteristic domains? WD40 Repeats

 

Step 2 – Create a FASTA file containing homologous sequences

 

I. Compile the accession numbers for the following sequences into one text file, with one accession number per line: Drosophila, Mouse, Rat, Human.

 

AAM09075.1

NP_083174.1

XP_213539.2

NP_572294.1

 

 

II. Use Batch ENTREZ at NCBI to retrieve all of the sequences corresponding to the accession numbers in a FASTA formatted file. Save this file to your computer.

 

sequences.fasta
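
The retrieval in this step can also be scripted. The following is a minimal sketch, assuming the NCBI E-utilities EFetch service as a stand-in for the Batch ENTREZ web form; it is not the method used in the exercise. The helper names efetch_url, fetch_fasta, and count_records are hypothetical, and records this old may have been revised or withdrawn since the exercise was written.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: NCBI E-utilities EFetch is used instead of the Batch ENTREZ form.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession numbers compiled in part I of this step.
ACCESSIONS = ["AAM09075.1", "NP_083174.1", "XP_213539.2", "NP_572294.1"]


def efetch_url(accessions):
    """Build an EFetch URL that requests protein records as FASTA text."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accessions, out_path="sequences.fasta"):
    """Download the records and save them to out_path (needs network access)."""
    with urlopen(efetch_url(accessions)) as response:
        text = response.read().decode("utf-8")
    with open(out_path, "w") as handle:
        handle.write(text)
    return text


def count_records(fasta_text):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


if __name__ == "__main__":
    # Only the URL is printed here; call fetch_fasta(ACCESSIONS) to download.
    print(efetch_url(ACCESSIONS))
```

After downloading, count_records gives a quick check that the file holds one record per accession number.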

 

Step 3 – Align Sequences with ClustalX

 

I. Start the ClustalX application, then FILE->LOAD SEQUENCES.

 

II. ALIGNMENT->DO COMPLETE ALIGNMENT.

sequences.aln

 

III. FILE->WRITE ALIGNMENT AS POSTSCRIPT.

 

[ Alternatively, you can create an alignment with the web tool at: http://www.ebi.ac.uk/clustalw/ ]
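
To check the alignment numerically before making a figure, you can read the .aln file with a short script. The following is a minimal sketch, assuming the standard Clustal layout: a header line starting with "CLUSTAL", then blocks of "name sequence" lines, with conservation lines that begin with whitespace. The helper names parse_clustal and percent_identity are hypothetical, and the demo alignment is made up for illustration.

```python
def parse_clustal(text):
    """Return {name: aligned sequence} from Clustal-formatted alignment text."""
    sequences = {}
    for line in text.splitlines():
        # Skip the header, blank lines, and conservation lines (leading space).
        if not line.strip() or line[0].isspace() or line.startswith("CLUSTAL"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, chunk = parts[0], parts[1]
        # Each block adds the next chunk of the same sequence.
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


def percent_identity(seq_a, seq_b):
    """Percent of identical residues over columns where neither has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return 100.0 * same / len(pairs)


if __name__ == "__main__":
    # Made-up toy alignment, not the output of this exercise.
    demo = "\n".join([
        "CLUSTAL W (1.83) multiple sequence alignment",
        "",
        "seqA            MKT-LV",
        "seqB            MKTALI",
        "                *** * ",
        "",
        "seqA            GG",
        "seqB            GA",
        "                * ",
    ])
    aligned = parse_clustal(demo)
    print(aligned)
    print(round(percent_identity(aligned["seqA"], aligned["seqB"]), 2))
```

To use it on your own result, read sequences.aln into a string and pass it to parse_clustal.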

Step 4 – Create Phylogenetic Tree

 

I. In ClustalX, TREES->DRAW N-J TREE

sequences.ph

 

II. In TreeView, OPEN your .ph file

• Notice the options for drawing the tree in different shapes.

 

III. PRINT->SAVE AS PDF

sequences.pdf

[ Alternatively, you can build trees with your alignment at http://www.ebi.ac.uk/clustalw/ ]
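
The .ph file is plain text in Newick format, so you can confirm that every sequence made it into the tree without opening a viewer. The following is a minimal sketch, assuming every leaf carries a branch length written as "name:length", as in a ClustalX N-J tree. The helper name leaf_names is hypothetical, and the demo tree is made up for illustration.

```python
import re

# Assumption: each leaf is written as "name:length" and directly follows "("
# or ",". Internal nodes follow ")" and are therefore not matched.
LEAF_RE = re.compile(r"[(,]\s*([^\s():,;]+)\s*:")


def leaf_names(newick):
    """Return the leaf labels of a Newick tree in the order they appear."""
    return LEAF_RE.findall(newick)


if __name__ == "__main__":
    # Made-up toy tree, not the output of this exercise.
    demo = "(\nhuman:0.01,\nmouse:0.02,\n(rat:0.03,\nfly:0.40)\n:0.05);"
    print(leaf_names(demo))
```

To use it on your own result, read sequences.ph into a string and pass it to leaf_names.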

 

[ NOTE: From here to the end, we assume you are using your desktop applications. ]

Step 5 – Manage PostScript Files

 

I. OPEN the alignment PostScript file with Acrobat Distiller, which converts it to PDF.

• Note that you can view each page individually.

 

II. Extract and save each page separately, with new names

• For each page, enter the page number to extract and save with a unique name.

 

Step 6 – Annotate Figure

 

I. OPEN PDF in Adobe Illustrator

 

II. FILE->EXPORT, save as a TIFF

sequences.tif

 

Step 7 – Create PowerPoint Presentation

 

I. NEW PRESENTATION

 

II. Choose a blank slide.

 

III. INSERT->PICTURE->FROM FILE, select your TIFF or PDF file

 

IV. Rotate the image 90 degrees counterclockwise (CCW).

 

V. Label your slide as you see fit.

sequences.ppt