Getting To Know Your Protein
Exercise I
Bioinformatics for Biologists 2005

In this exercise, you will use an unknown protein sequence to prepare multiple sequence alignment and phylogenetic tree figures. When you have completed it, you should be able to search for homologous sequences, align them, build a phylogenetic tree, and produce manuscript-quality figures of your results. Follow the steps below in order, using either the applications installed on your computer or their web-based equivalents. If you have difficulty with any step, please ask for assistance.
Step 1 – Find homologous sequences
I. BLAST the following sequence against the non-redundant protein database using the blastp program at:
http://www.ncbi.nlm.nih.gov/BLAST/

MESEMLQSPLLGLGEEDEADLTDWNLPLAFMKKRHCEKIEGSKSLAQSWRMKDRMKTVSVALVLCLNVGVDP
PDVVKTTPCARLECWIDPLSMGPQKALETIGANLQKQYENWQPRARYKQSLDPTVDEVKKLCTSLRRNAKEE
RVLFHYNGHGVPRPTVNGEVWVFNKNYTQYIPLSIYDLQTWMGSPSIFVYDCSNAGLIVKSFKQFALQREQE
LEVAAINPNHPLAQMPLPPSMKNCIQLAACEATELLPMIPDLPADLFTSCLTTPIKIALRWFCMQKCVSLVP
GVTLDLIEKIPGRLNDRRTPLGELNWIFTAITDTIAWNVLPRDLFQKLFRQDLLVASLFRNFLLAERIMRSY
NCTPVSSPRLPPTYMHAMWQAWDLAVDICLSQLPTIIEEGTAFRHSPFFAEQLTAFQVWLTMGVENRNPPEQ
LPIVLQVLLSQVHRLRALDLLGRFLDLGPWAVSLALSVGIFPYVLKLLQSSARELRPLLVFIWAKILAVDSS
CQADLVKDNGHKYFLSVLADPYMPAEHRTMTAFILAVIVNSYHTGQEACLQGNLIAICLEQLNDPHPLLRQW
VAICLGRIWQNFDSARWCGVRDSAHEKLYSLLSDPIPEVRCAAVFALGTFVGNSAERTDHSTTIDHNVAMML
AQLVSDGSPMVRKELVVALSHLVVQYESNFCTVALQFIEEEKNYALPSPATTEGGSLTPVRDSPCTPRLRSV
SSYGNIRAVATARSLNKSLQNLSLTEESGGAVAFSPGNLSTSSSASSTLGSPENEEHILSFETIDKMRRASS
YSSLNSLIGVSFNSVYTQIWRVLLHLAADPYPEVSDVAMKVLNSIAYKATVNARPQRVLDTSSLTQSAPASP
TNKGVHIHQAGGSPPASSTSSSSLTNDVAKQPVSRDLPSGRPGTTGPAGAQYTPHSHQFPRTRKMFDKGPEQ
TADDADDAAGHKSFISATVQTGFCDWSARYFAQPVMKIPEEHDLESQIRKEREWRFLRNSRVRRQAQQVIQK
GITRLDDQIFLNRNPGVPSVVKFHPFTPCIAVADKDSICFWDWEKGEKLDYFHNGNPRYTRVTAMEYLNGQD
CSLLLTATDDGAIRVWKNFADLEKNPEMVTAWQGLSDMLPTTRGAGMVVDWEQETGLLMSSGDVRIVRIWDT
DREMKVQDIPTGADSCVTSLSCDSHRSLIVAGLGDGSIRVYDRRMALSECRVMTYREHTAWVVKASLQKRPD
GHIVSVSVNGDVRIFDPRMPESVNVLQIVKGLTALDIHPQADLIACGSVNQFTAIYNSSGELINNIKYYDGF
MGQRVGAISCLAFHPHWPHLAVGSNDYYISVYSVEKRVR
II. What is this sequence? Does it have any characteristic domains (hint: look in the GenBank report)?
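For reference, the same blastp search can also be submitted from a script. The sketch below is an illustration, not part of the original exercise: it assumes NCBI's BLAST Common URL API at https://blast.ncbi.nlm.nih.gov/Blast.cgi (the current endpoint, used here in place of the 2005 address above) and uses only the Python standard library. It merely builds the submission request; the helper name `blastp_put_request` is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: NCBI's current BLAST URL API endpoint, used in place of the 2005 page above.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blastp_put_request(sequence, database="nr"):
    """Build (but do not send) a BLAST URL API request that submits a blastp search."""
    params = {
        "CMD": "Put",          # submit a new search; NCBI answers with a request ID (RID)
        "PROGRAM": "blastp",   # protein query against a protein database
        "DATABASE": database,  # "nr" = non-redundant protein sequences
        "QUERY": "".join(sequence.split()),  # drop spaces and line breaks from the pasted text
    }
    body = urllib.parse.urlencode(params).encode("ascii")
    # Supplying data= makes this a POST request, which suits long query sequences.
    return urllib.request.Request(BLAST_URL, data=body)


# Only the first line of the exercise sequence is shown; paste the full sequence here.
query = "MESEMLQSPLLGLGEEDEADLTDWNLPLAFMKKRHCEKIEGSKSLAQSWRMKDRMKTVSVALVLCLNVGVDP"
req = blastp_put_request(query)
print(req.get_method(), req.full_url)

# To actually submit (requires network access):
#     with urllib.request.urlopen(req) as resp:
#         reply = resp.read().decode()
# then look for the RID in the reply and poll with CMD=Get until the results are ready.
```

Submitting returns a request ID (RID) rather than the hits themselves, so results are fetched in a second step; please keep NCBI's guidance on request frequency in mind when polling.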
Step 2 – Create a FASTA file containing homologous sequences
I. Compile the accession numbers of the homologous sequences from the following organisms into one text file, one accession number per line: Drosophila, mouse, rat, and human.
II. Use Batch Entrez at NCBI to retrieve all of the sequences corresponding to these accession numbers as a single FASTA-formatted file. Save this file to your computer.
http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi
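If you prefer scripting to the Batch Entrez web form, a roughly equivalent request can be made through NCBI's E-utilities `efetch` service. The sketch below assumes that service (with `db=protein` and `rettype=fasta`) rather than Batch Entrez itself, and the accession strings in it are hypothetical placeholders, since you find the real ones in Step 1. It also includes a small FASTA parser you could use to check that the saved file contains one record per accession.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(text):
    """Return accession IDs from 'one id per line' text, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def efetch_fasta_url(accessions):
    """Build an efetch URL that returns all accessions as one FASTA text."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),  # efetch accepts a comma-separated ID list
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Map each record's first header word to its concatenated sequence."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            header = words[0] if words else ""
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical placeholders: substitute the accessions you compiled in item I.
id_file = "ACC_DROSOPHILA\nACC_MOUSE\n\nACC_RAT\nACC_HUMAN\n"
ids = read_accessions(id_file)
print(efetch_fasta_url(ids))
```

After saving the downloaded text, `len(parse_fasta(text))` should equal the number of accessions you listed; a mismatch usually means a mistyped ID.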
Step 3 – Align Sequences with ClustalX
I. Start the ClustalX application, then FILE->LOAD SEQUENCES and select your FASTA file.
II. ALIGNMENT->DO COMPLETE ALIGNMENT.
III. FILE->WRITE ALIGNMENT AS POSTSCRIPT.
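ClustalX's complete alignment is progressive: it first aligns every pair of sequences, uses those pairwise results to build a guide tree, and then adds sequences to the multiple alignment in guide-tree order. To illustrate only the first stage, here is a minimal Needleman-Wunsch global pairwise alignment. It is a simplified sketch: it assumes a flat match/mismatch score and a linear gap penalty, whereas ClustalX uses protein weight matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

With these toy scores, `needleman_wunsch("ACGT", "AGT")` returns `(2, 'ACGT', 'A-GT')`: three matches and one gap.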
[ Alternatively, you can create an alignment with the web tool at: http://www.ebi.ac.uk/clustalw/ ]
Step 4 – Create Phylogenetic Tree
I. In ClustalX, TREES->DRAW N-J TREE
II. In TreeView, OPEN your .ph file.
• Notice the options for drawing the tree in different shapes.
III. PRINT->SAVE AS PDF
[ Alternatively, you can build trees from your alignment at http://www.ebi.ac.uk/clustalw/ ]
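An N-J tree is built with Saitou and Nei's neighbor-joining method from a matrix of pairwise distances computed on the alignment. The sketch below applies that method to uncorrected p-distances (the fraction of differing residues). Two points are assumptions of this sketch rather than ClustalX behavior: distances are counted only over columns where neither sequence has a gap, and no correction for multiple substitutions is applied (ClustalX offers one as an option). The output is a Newick string, the same kind of text stored in a .ph file.

```python
import itertools


def p_distance(s, t):
    """Uncorrected p-distance between two aligned sequences.

    Assumption: only columns where neither sequence has a gap ('-') are counted."""
    pairs = [(x, y) for x, y in zip(s, t) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_table(aligned):
    """Pairwise p-distances for a {name: aligned_sequence} mapping."""
    return {(a, b): p_distance(aligned[a], aligned[b])
            for a, b in itertools.combinations(aligned, 2)}


def neighbor_joining(names, dist):
    """Neighbor joining; dist maps (a, b) pairs to distances.

    Returns an unrooted Newick tree with a three-way split at the top."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = {frozenset(pair): value for pair, value in dist.items()}

    def D(a, b):
        return d[frozenset((a, b))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the Q criterion (n - 2) * D(i, j) - r_i - r_j.
        i, j = min(itertools.combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        li = D(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"  # new internal node, named by its subtree
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (D(i, k) + D(j, k) - D(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b, c = nodes
    la = (D(a, b) + D(a, c) - D(b, c)) / 2
    return f"({a}:{la:.4f},{b}:{D(a, b) - la:.4f},{c}:{D(a, c) - la:.4f});"


# Toy distances for five hypothetical taxa a-e (not real data).
toy = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
       ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
print(neighbor_joining("abcde", toy))
```

For the toy matrix this prints `(d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);`. With real data you would pass `distance_table(aligned)` built from the aligned sequences.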
[ NOTE: From here to the end, we assume you are using your desktop applications. ]
Step 5 – Manage PostScript Files
I. OPEN the alignment PostScript file with Acrobat Distiller.
• Note that you can view each page individually.
II. Extract each page and save it separately under a new name.
• For each page, enter the page number to extract and save it with a unique name.
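Acrobat handles page extraction interactively. If you would rather split the alignment file before distilling it, the sketch below relies on Adobe's Document Structuring Conventions (DSC), under which each page begins with a `%%Page:` comment and shared setup code precedes the first page. It assumes the PostScript written by ClustalX follows those conventions and contains no embedded documents, which you should check in your own file. Each output page keeps the original prolog and trailer, with the `%%Pages:` count set to 1; the function and file names are hypothetical.

```python
import re


def split_ps_pages(ps_text):
    """Split a DSC-conforming PostScript document into one-page documents.

    Assumes pages start with '%%Page:' lines and that a '%%Trailer' or
    '%%EOF' line follows the last page."""
    lines = ps_text.splitlines(keepends=True)
    starts = [i for i, line in enumerate(lines) if line.startswith("%%Page:")]
    if not starts:
        return [ps_text]  # no page markers: leave the document unchanged
    end = next((i for i in range(starts[-1], len(lines))
                if lines[i].startswith(("%%Trailer", "%%EOF"))), len(lines))

    def one_page_count(s):
        # Each output document holds a single page, so rewrite the page count.
        return re.sub(r"^%%Pages:.*$", "%%Pages: 1", s, flags=re.M)

    prolog = one_page_count("".join(lines[:starts[0]]))  # setup shared by all pages
    trailer = one_page_count("".join(lines[end:]))       # closing comments
    bounds = starts + [end]
    return [prolog + "".join(lines[a:b]) + trailer for a, b in zip(bounds, bounds[1:])]


# Usage sketch (hypothetical file names):
# with open("alignment.ps") as f:
#     for n, page in enumerate(split_ps_pages(f.read()), start=1):
#         with open(f"alignment_page{n}.ps", "w") as out:
#             out.write(page)
```

Each resulting single-page file can then be opened in Acrobat Distiller as described above.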
Step 6 – Annotate Figure
I. OPEN the TIFF or PDF file in Adobe Illustrator.
II. FILE->EXPORT, and save as a TIFF.
Step 7 – Create PowerPoint Presentation
I. NEW PRESENTATION
II. Choose a blank slide.
III. INSERT->PICTURE->FROM FILE, and select your TIFF or PDF file.
IV. Rotate the image 90° counterclockwise.
V. Label your slide as you see fit.