HOMEWORK 1

  1. Perform a self-comparison of the human haptoglobin sequence with dottup, an EMBOSS interface at Institut Pasteur. Open the above link, then choose 'png' next to 'graph [devise to be display on]' and next to xygraph (-xygraph). If your computer can't open the png file automatically, you can open it with Adobe Illustrator or Photoshop.
    >Human haptoglobin alpha(2FS)-beta protein
    MSALGAVIALLLWGQLFAVDSGNDVTDIADDGCPKPPEIAHGYVEHSVRYQ
    CKNYYKLRTEGDGVYTLNDKKQWINKAVGDKLPECEADDGCPKPPEIAHGY
    VEHSVRYQCKNYYKLRTEGDGVYTLNNEKQWINKAVGDKLPECEAVCGKPK
    NPANPVQRILGGHLDAKGSFPWQAKMVSHHNLTTGATLINEQWLLTTAKNL
    FLNHSENATAKDIAPTLTLYVGKKQLVEIEKVVLHPNYSQVDIGLIKLKQK
    VSVNERVMPICLPSKDYAEVGRVGYVSGWGRNANFKFTDHLKYVMLPVADQ
    DQCIRHYEGSTVPEKKTPKSPVGVQPILNEHTFCAGMSKYQEDTCYGDAGS
    AFAVHDLEEDTWYATGILSFDKSCAVAEYGVYVKVTSIQDWVQKTIAEN
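
To see what a self-comparison dot plot encodes, the following is a minimal Python sketch of a word-match dot plot. It is not dottup's own code. The function names, the toy sequence, and the word size of 4 are all assumptions made for illustration. A dot at (i, j) means the word starting at position i of the first sequence is identical to the word starting at position j of the second. In a self-comparison the main diagonal is always filled, so the off-diagonal dots are the ones that point to internal repeats.

```python
from collections import defaultdict


def word_match_dots(seq_a, seq_b, word_size):
    """Return sorted (i, j) pairs where the words of length word_size
    starting at seq_a[i] and seq_b[j] are identical."""
    if word_size < 1:
        raise ValueError("word_size must be at least 1")
    # Index every word of seq_b by its start position.
    positions = defaultdict(list)
    for j in range(len(seq_b) - word_size + 1):
        positions[seq_b[j:j + word_size]].append(j)
    # Look up every word of seq_a in that index.
    dots = []
    for i in range(len(seq_a) - word_size + 1):
        for j in positions.get(seq_a[i:i + word_size], []):
            dots.append((i, j))
    return sorted(dots)


def off_diagonal(dots):
    """Drop the trivial i == j matches of a self-comparison."""
    return [(i, j) for i, j in dots if i != j]


# Toy sequence (hypothetical): the word ACDE occurs at positions 0 and 6.
toy = "ACDEKLACDE"
dots = word_match_dots(toy, toy, word_size=4)
print(off_diagonal(dots))  # [(0, 6), (6, 0)]
```

Passing the haptoglobin sequence above as both arguments gives the same kind of coordinate list, which can then be compared with the png produced by dottup.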
    
  2. Compare the alignment scores obtained with small and large gap penalties in the following example.
    >Drosophila melanogaster Odorant receptor 85e (Or85e)   
    MASLQFHGNVDADIRYDISLDPARESNLFRLLMGLQLANGTKPSPRLPKW
    WPKRLEMIGKVLPKAYCSMVIFTSLHLGVLFTKTTLDVLPTGELQAITDA
    LTMTIIYFFTGYGTIYWCLRSRRLLAYMEHMNREYRHHSLAGVTFVSSHA
    AFRMSRNFTVVWIMSCLLGVISWGVSPLMLGIRMLPLQCWYPFDALGPGT
    YTAVYATQLFGQIMVGMTFGFGGSLFVTLSLLLLGQFDVLYCSLKNLDAH
    TKLLGGESVNGLSSLQEELLLGDSKRELNQYVLLQEHPTDLLRLSAGRKC
    PDQGNAFHNALVECIRLHRFILHCSQELENLFSPYCLVKSLQITFQLCLL
    VFVGVSGTREVLRIVNQLQYLGLTIFELLMFTYCGELLSRHSIRSGDAFW
    RGAWWKHAHFIRQDILIFLVNSRRAVHVTAGKFYVMDVNRLRSVITQAFS
    FLTLLQKLAAKKTESEL
    
    >Drosophila melanogaster Odorant receptor 23a (Or23a)
    MKLSETLKIDYFRVQLNAWRICGALDLSEGRYWSWSMLLCILVYLPTPMLL
    RGVYSFEDPVENNFSLSLTVTSLSNLMKFCMYVAQLTKMVEVQSLIGQLDA
    RVSGESQSERHRNMTEHLLRMSKLFQITYAVVFIIAAVPFVFETELSLPMP
    MWFPFDWKNSMVAYIGALVFQEIGYVFQIMQCFAADSFPPLVLYLISEQCQ
    LLILRISEIGYGYKTLEENEQDLVNCIRDQNALYRLLDVTKSLVSYPMMVQ
    FMVIGINIAITLFVLIFYVETLYDRIYYLCFLLGITVQTYPLCYYGTMVQE
    SFAELHYAVFCSNWVDQSASYRGHMLILAERTKRMQLLLAGNLVPIHLSTY
    VACWKGAYSFFTLMADRDGLGS

    For this question, use the program LALIGN, which is based on William Pearson's lalign program.
    A. Use LALIGN to align the two sequences above (copy and paste each sequence without its first line, the protein description line). Note the length of the alignment, the % identity, and the score of the alignment.
    B. Repeat the alignment with gap penalties of -5 and -1 and note the features of the alignment.
    C. Describe what happened when the gap penalties were reduced. Which of these alignments looks like a local alignment, and which looks like a global alignment?
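
When recording the alignment length and % identity for parts A and B, it can help to recompute them from the two aligned rows. The Python sketch below is one way to do that. The function name is hypothetical, the gap character is assumed to be '-', and % identity is assumed to be identical columns divided by total alignment columns (gap columns included). LALIGN may define its reported values differently, so treat this as a cross-check, not a reproduction of its output.

```python
def alignment_stats(row_a, row_b, gap="-"):
    """Summarize a pairwise alignment given as two equal-length gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    if not row_a:
        raise ValueError("alignment is empty")
    columns = list(zip(row_a, row_b))
    # A column counts as identical only if both residues match and neither is a gap.
    identical = sum(1 for x, y in columns if x == y and x != gap)
    gaps = sum(1 for x, y in columns if x == gap or y == gap)
    return {
        "length": len(columns),
        "identical": identical,
        "gaps": gaps,
        # Assumption: denominator is the full alignment length, gaps included.
        "percent_identity": 100.0 * identical / len(columns),
    }


# Toy aligned rows (hypothetical), not taken from the LALIGN output.
print(alignment_stats("ACGT", "A-GT"))
# {'length': 4, 'identical': 3, 'gaps': 1, 'percent_identity': 75.0}
```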

  3. Find the optimal global alignment and the resulting score between the sequences GAGC and CCG by completing the entire scoring matrix with the following scoring system: +1 for a match, -1 for a mismatch, and -2 for a gap. Pages 69-72 of Bioinformatics: Sequence and Genome Analysis by David W. Mount give a detailed description of global alignment. There are copies of the book in the library. After you trace back through the matrix and obtain the alignment, add up the individual column scores taken directly from the above scoring system, and compare the sum with the score from your matrix. Are they the same?
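
A hand-filled matrix can be checked against a short program. The following Python sketch fills a global alignment (Needleman-Wunsch style) matrix with a linear gap penalty, using the +1/-1/-2 scoring system from the exercise as defaults, and traces back one optimal alignment. The function names, the tie-breaking order in the traceback (diagonal, then up, then left), and the toy sequences are assumptions for illustration. Other optimal alignments with the same score may exist.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (matrix, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # a[i-1] against a gap
                          F[i][j - 1] + gap)    # b[j-1] against a gap
    # Trace back from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F, "".join(reversed(top)), "".join(reversed(bottom))


def column_score_sum(top, bottom, match=1, mismatch=-1, gap=-2):
    """Re-score an alignment column by column, directly from the scoring system."""
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


# Toy sequences (hypothetical), chosen so the result is easy to verify by hand.
F, top, bottom = needleman_wunsch("ACGT", "AGT")
print(top)                            # ACGT
print(bottom)                         # A-GT
print(F[-1][-1])                      # 1
print(column_score_sum(top, bottom))  # 1
```

The last two printed values illustrate the comparison the exercise asks for: the bottom-right matrix entry and the column-by-column sum of the traced-back alignment.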

  4. The BLASTP algorithm, which you may be familiar with, performs a local alignment between a query sequence and a matching database sequence. Align the sequences MDPW and MEDPW using the Smith-Waterman algorithm described on pages 72-73 of Bioinformatics: Sequence and Genome Analysis by David W. Mount. Use the dynamic programming algorithm with the BLOSUM62 scoring matrix downloaded from NCBI and a gap penalty of -5.
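
The mechanics of local alignment can be sketched in a few lines of Python. The version below uses a linear gap penalty of -5, as in the exercise, but takes the substitution score as a function argument. The example call uses a toy scheme (+2 for a match, -1 for a mismatch), which is an assumption made to keep the sketch self-contained. It is not BLOSUM62, so the scores it prints are not the answer to the exercise. The function names are hypothetical.

```python
def smith_waterman(a, b, score, gap=-5):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            # Unlike global alignment, a cell never drops below zero.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def toy_score(x, y):
    """Toy substitution score (assumption): +2 match, -1 mismatch. Not BLOSUM62."""
    return 2 if x == y else -1


print(smith_waterman("MDPW", "MEDPW", toy_score, gap=-5))
# (6, 'DPW', 'DPW')
```

To work the exercise itself, replace toy_score with a function that looks up each residue pair in the BLOSUM62 table downloaded from NCBI. The matrix-filling and traceback steps stay the same.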