HOMEWORK 7
The purpose of this homework is to practice drawing phylogenetic trees from
multiple sequence alignments with ClustalX and PAUP, and finding blocks within an alignment using Jalview.
- Retrieve target sequences from blast output, and do a multiple sequence alignment with them.
- Search the E2F1 protein sequence against the non-redundant database with the blastp program at NCBI. From the BLAST output, retrieve the top 10 database sequences in FASTA format and save them in one file. Alternatively, download the script blast_get_target_seq.pl from last week's homework and run it on tak.
- On tak, open clustalx and load the file containing the top 10 database sequences with File->Load Sequence. Look at the alignment in the window; one sequence is very different from the others and should be discarded. To delete it, highlight the sequence and click Cut Sequence in the Edit menu. Set the output format to MSF with Alignment->Output Format Option, then run the complete alignment with Alignment->Do Complete Alignment. The newly produced alignment file ends with .msf. For example, if the file of top 10 database sequences is named e2f1_out, the alignment file will be e2f1_out.msf. Exit clustalx with File->Quit.
- Draw the corresponding tree with paupsearch and display it with paupdisplay.
- Draw the tree with paup. On tak, type gcg. At the next prompt, type paupsearch. When asked "What aligned sequences to analyze", type YourAlignedSequenceName{*}, where "*" means all the aligned sequences. For example, if the alignment file is e2f1_out.msf, type e2f1_out.msf{*}. Several types of tree search are offered. For this question, choose to reconstruct a neighbor-joining tree. Then name your tree.
- Display the tree with paupdisplay. Type paupdisplay on tak and follow the prompts. Choose to display "Description and plot of trees", and choose parsimony as the optimality criterion for the tree description. For plot output options, choose Plot on LASERWRITER attached to HomeDir:graphics.ps.
- Find the blocks within the multiple sequence alignment with jalview on tak.
- To open the jalview window, type java jalview.AlignFrame YourAlignedSequenceName File MSF at the tak command line. YourAlignedSequenceName is the name of the multiple sequence alignment file you produced with clustalx in the first part of this homework. The command is case sensitive. Because this file is in MSF format, the word after File must be MSF.
- A block is a region of the alignment that contains no gaps, so selecting one means trimming away the gapped columns around it. First, find the largest portion of the alignment without a gap; this will be your block. To trim one end, highlight the position you would like to delete up to, then click on Edit. You can remove everything to the left of that position with Remove sequence<-left of selected column. Remove the columns to the right of the block in the same way, using the corresponding Edit option. Save the block as a new file in MSF format. Open YourBlockFile.msf in ClustalX and then save it as a PostScript file with File->Write Alignment to Postscript File. Ignore the error messages on the screen and click CLOSE. You will find a new .ps file in your directory. You can then transfer this file to your desktop with ftp and display it with Adobe Photoshop or Adobe Illustrator.
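
The block-finding step above can be double-checked outside jalview. The following is only a minimal Python sketch of the idea (find the longest run of alignment columns in which no sequence has a gap); it is not what jalview does internally. The function names, the toy sequences, and the set of gap characters are all assumptions made for illustration, and the sketch works on alignment rows already held in memory rather than reading an MSF file.

```python
# Assumption: these are the gap symbols used in the alignment rows.
GAP_CHARS = set("-.~")


def longest_gap_free_block(aligned):
    """Return (start, end) of the longest run of columns with no gap in any row.

    `aligned` maps sequence names to equal-length aligned strings.
    The range is 0-based and half-open; (0, 0) means no gap-free column exists.
    If several runs share the maximum length, the leftmost one is returned.
    """
    rows = list(aligned.values())
    if not rows:
        return (0, 0)
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")

    best = (0, 0)
    run_start = None
    # Iterate one position past the end so a run touching the last column is closed.
    for col in range(length + 1):
        gap_free = col < length and all(row[col] not in GAP_CHARS for row in rows)
        if gap_free:
            if run_start is None:
                run_start = col
        else:
            if run_start is not None and col - run_start > best[1] - best[0]:
                best = (run_start, col)
            run_start = None
    return best


def trim_to_block(aligned, start, end):
    """Keep only columns start..end-1 of every sequence."""
    return {name: seq[start:end] for name, seq in aligned.items()}


if __name__ == "__main__":
    # Hypothetical toy alignment, not real E2F1 data.
    toy = {
        "seqA": "MK-LVTREQAAL.G",
        "seqB": "MKALVTKEQ-ALSG",
        "seqC": "M-ALVSREQAALSG",
    }
    start, end = longest_gap_free_block(toy)
    print(start, end)
    print(trim_to_block(toy, start, end))
```

The returned positions are 0-based and half-open, so a result of (3, 9) corresponds to alignment columns 4 through 9 when counting from 1, which is the range to keep when trimming the two ends by hand.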