High School Program 2010 - Exploring molecular mechanisms of childhood diseases




Welcome to a Bioinformatics workshop! Today you'll be exploring molecular mechanisms of childhood diseases, in particular sickle cell anemia and maple syrup urine disease.

For this workshop you will be using different bioinformatics web resources to learn how these diseases are inherited and how they change normal physiology by mutating protein structures.

These are the areas we'll be investigating:

  • What gene(s) cause these diseases? Where are these genes found in the human genome? What is the function of the protein(s) they encode?
  • What specific mutation(s) cause these diseases?
  • What is the normal structure of the proteins involved in these diseases? How are they changed in someone who has the disease?

    These two diseases are among the thirty included in routine newborn screening in Massachusetts.

    Green indicates words you type into a form.

    Red indicates questions to answer.


    Sickle cell anemia

    Section 1: Introducing the disease.

      MedlinePlus is a U.S. government health information resource.

      1. Search MedlinePlus for sickle cell anemia.
      2. The first result is a page of links to lots of different information.
      3. The second result may be more helpful, as it lists causes and symptoms.
      4. Why does the name of the disease refer to a "sickle cell"? ANSWER
      5. Do you usually get sickle cell anemia from just one parent? ANSWER
      6. What is anemia? You may find a link to more information on the sickle cell anemia page, or you can go to the MedlinePlus Encyclopedia anemia page. ANSWER
      7. What abnormal protein causes sickle cell anemia? ANSWER
    Section 2: Finding and analyzing hemoglobin B in the human genome.

      A. Hemoglobin is made of four protein chains, together with a non-protein heme group that binds iron. One of the hemoglobin proteins, hemoglobin B ("HBB"), is involved in sickle cell anemia. Hemoglobin is needed to carry oxygen in the blood.

      B. To find where the gene encoding hemoglobin B (HBB) is located in the human genome, you can go to the UCSC Genome Browser.

      1. In the "position or search term" box, enter HBB. Under "UCSC Genes", click on the HBB link (the first link).
      2. Look at the text or figure above the main figure. On what chromosome is hemoglobin B? ANSWER
      3. Look at the dark blue part of the diagram under "UCSC genes". Note that the blocks are exons, connected by horizontal lines representing introns. The 5' and 3' untranslated regions (that do NOT code for protein) are displayed as thinner parts of the blocks at the beginning and end of the gene.
      4. How many exons does hemoglobin B have? ANSWER
      5. Still looking at the same part of the figure, note the "hash marks" along the thin introns. They indicate the direction of the gene on this chromosome.
      6. Is the direction of hemoglobin B from left to right or right to left? ANSWER
      7. To flip the direction so we can see the gene from left to right, click on the "reverse" button under the figure.
      8. Go to the beginning of the part of the hemoglobin B gene that encodes the protein. There are navigation buttons at the top, but it's easiest to enter chr11:5,248,207-5,248,271 in the position/search box and click on "jump" near the top of the page.
      9. To see the actual genome sequence (under the numbers, near the top of the figure), you may need to click on the "base" button near the top of the page.
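
      The position typed in step 8 and the "reverse" button from step 7 can both be illustrated with a short Python sketch. It assumes the position string follows the chromosome:start-end pattern shown above, with both ends included in the range. The function names and the short DNA string are made up for illustration.

```python
def parse_position(position):
    """Split a browser-style position such as 'chr11:5,248,207-5,248,271'."""
    chrom, span = position.split(":")
    start_text, end_text = span.replace(",", "").split("-")
    start, end = int(start_text), int(end_text)
    # Assumption: both ends are included, so the length is end - start + 1.
    return chrom, start, end, end - start + 1


def reverse_complement(dna):
    """Return the opposite DNA strand, read in its own direction."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(dna.upper()))


print(parse_position("chr11:5,248,207-5,248,271"))
print(reverse_complement("AACGT"))  # made-up sequence, not from the genome
```

      A gene that runs from right to left is read from the opposite strand, which is why flipping the view also changes the letters that are displayed.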
    Section 3: Conservation of hemoglobin B protein sequence across different species

      We'll continue where we left off in the last section, looking at the beginning of the protein-coding part of hemoglobin B in the human genome.

      1. To see the actual genome sequence [if you haven't done so already], click on the "base" button at the top of the page.
      2. Note the DNA sequence at the top of the figure. Look above the green box in the gene figure with the "M".
      3. What are the first six DNA bases (letters) starting with this position? ANSWER
      4. The green "M" and the blue letters after it represent one-letter amino acid abbreviations, each encoded by 3 DNA letters (codons) above.
      5. What eight amino acids does the hemoglobin B protein chain begin with? ANSWER
      6. Scroll down the page to a blue "Comparative Genomics" bar and find "Conservation".
      7. Change the pull-down menu below "Conservation" to "full" (if it's not set to that already).
      8. Click on "Conservation" to go to the Track Settings page.
      9. Next to "Species selection", near the top of the yellow area, click the "+" and then click on the Submit button.
      10. Scroll down the genome browser to the protein sequences under the "Multiz Alignments of 46 Vertebrates".
      11. Which of the first eight amino acids are the same in most displayed animals (and even fish: tetraodon, fugu, medaka, zebrafish)? ANSWER
      12. We can see that hemoglobin B is present in all of these animals and its protein sequence is very similar in many of them.
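
      The two ideas in this section, reading codons as amino acids and comparing the same protein across species, can be sketched in Python. This is a minimal illustration that assumes the standard genetic code. The function names, the DNA string, and the three "species" sequences are made up and are not the real hemoglobin B data.

```python
from itertools import product

# Standard genetic code, listed in the usual T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA three letters at a time, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def conserved_positions(aligned_proteins):
    """Return 1-based positions where every aligned sequence has the same letter."""
    columns = zip(*aligned_proteins.values())
    return [i for i, column in enumerate(columns, start=1) if len(set(column)) == 1]


# Made-up inputs for illustration only (not the real HBB sequences).
print(translate("ATGGCTAAATGA"))
toy_alignment = {"species_a": "MAKLT", "species_b": "MAKIT", "species_c": "MGKLT"}
print(conserved_positions(toy_alignment))
```

      The genome browser does the same kind of column-by-column comparison, only with many more species and the real sequences.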
    Section 4: Gene/protein mutations in sickle cell anemia

      A. An excellent source of information about genetic diseases is Online Mendelian Inheritance in Man (OMIM).

      1. Go to OMIM, search for sickle cell anemia, and click on the first hit ("#603903").
      2. Among other details, OMIM describes medical observations about specific patients with sickle cell anemia.
      3. You could right-click (to open a new window) on the link after the statement, "The most common cause of sickle cell anemia is HbS". OR ... to put the new page on the other side of this page, go directly to the OMIM section on HbS
      4. This new OMIM page describes all sorts of mutations in hemoglobin B, including this one, which commonly causes sickle cell anemia.
      5. Next to the OMIM identifier of this mutation is the expression "GLU6VAL".
      6. Given the genetic code, what is the meaning of GLU and VAL? ANSWER
      7. Then what is the meaning of "HBB, GLU6VAL"? ANSWER
      8. Now we need to figure out what DNA change is responsible for this protein change.
      9. Go back to the genome browser showing the beginning of the HBB gene.
      10. Note that the sixth amino acid position after M (methionine) is E (GLU; glutamic acid), which is the site of the mutation we've been examining.
      11. What is the codon (3 DNA letters) that encodes E (GLU) that is mutated in sickle cell anemia? ANSWER
      12. You could go back to the OMIM page for hemoglobin B and right-click (to open a new window) on the blue dbSNP box. OR ... to put the dbSNP page on the other side of this page, go directly to dbSNP for rs334

      B. dbSNP is a database of single nucleotide polymorphisms (SNPs, pronounced "snips"), which are single positions in the genome that can differ from person to person.

      1. We are all different in part because each one of us has different DNA nucleotides at these SNP positions. (We're also different because we may have additional little bits of DNA that others don't have, or vice versa.)
      2. On our dbSNP page, scroll down to "GeneView" and look at the table with "Function" in the first column.
      3. Looking at the last row of this table, note the allele change "GAG => GTG".
      4. What is the corresponding residue (amino acid) change? ANSWER
      5. So to summarize, if one specific DNA letter is changed on the copies of chromosome 11 that you inherited from both your mother and your father, you'll make hemoglobin B protein with one wrong amino acid, and you'll have sickle cell anemia.
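
      The allele change from step 3 can be checked with a few lines of Python. This sketch uses only the two codons quoted above and the amino acids named in "GLU6VAL". The function name is made up for illustration.

```python
# Only the two codons mentioned in the text are needed here.
CODON_TO_AMINO_ACID = {"GAG": "Glu", "GTG": "Val"}


def base_changes(normal_codon, mutant_codon):
    """List (position, normal base, mutant base) for each differing letter."""
    return [
        (position, normal, mutant)
        for position, (normal, mutant) in enumerate(zip(normal_codon, mutant_codon), start=1)
        if normal != mutant
    ]


normal_codon, mutant_codon = "GAG", "GTG"
print(base_changes(normal_codon, mutant_codon))
print(CODON_TO_AMINO_ACID[normal_codon], "->", CODON_TO_AMINO_ACID[mutant_codon])
```

      Only the middle letter of the codon differs, so a single DNA letter accounts for the whole amino acid change.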
    Section 5: Structure of the hemoglobin B protein

      A. To get a 3-dimensional view of hemoglobin B, we can use the Protein Data Bank, a database of protein structures that have been determined experimentally by methods such as X-ray crystallography.

      1. Go to the Protein Data Bank and search for 1A3N, one of the many structures of hemoglobin and its variants.
      2. Below the figure at right, click on "Protein Workshop".
      3. Click on "Launch RCSB - Protein Workshop" to open the structure visualization program. If your browser has a pop-up blocker, you may have to tell it to allow the pop-up.
      4. Say OK when the browser asks about opening something like "RCSB-ProteinWorkshop.jnlp".
      5. Click "run" to start the Protein Workshop software.

      B. This is the structure of the hemoglobin complex. It is presented as a backbone cartoon, showing the secondary structure (helices and sheets) of the protein, rather than all the atoms.

      1. Look at the right white panel. How many protein chains are listed? ANSWER
      2. Click on the structure and rotate around by moving the mouse up, down, right and left. You can also zoom in by holding down the Shift key and dragging the mouse.
      3. Note that the "ball-and-stick" structures are heme, the non-protein part of the complex with an iron atom (the large gray sphere) in the middle.
      4. How many heme molecules does the protein complex have? ANSWER
      5. For steps 6 to 8, use the Tools tab...
      6. Under the "Choose items from the tree or 3d viewer" menu, click on the "+" on Chain B to show all residues on that chain, then click on residue 6 GLU to highlight it.
      7. Click on the "Styles" button and change "Radius of Atoms" to CPK.
      8. Click on Chain B residue 6 GLU again to change the display of that residue.
      9. You have now highlighted the residue that is mutated in sickle cell anemia.
      10. Repeat steps 6 to 8 for the other HBB chain (chain D).
      11. Are the highlighted amino acids (mutated in the sickle cell protein) in the middle or on the outside of the protein complex? ANSWER
      12. In people with sickle cell anemia, the valine residues introduced by the mutation cause hemoglobin chains to stick to each other and form abnormally large structures like those shown here (with the mutated amino acids shown in green).
      13. What is the shape of these abnormal structures? ANSWER
      14. These abnormal monster structures deform the normal red blood cell into the sickle shape. Optional: more information about this, from the Protein Data Bank, is here
    Section 6: Summary

    Maple syrup urine disease

    Section 1: Introducing MSUD.

      A. Using MedlinePlus to get introductory information

      1. Search MedlinePlus for maple syrup urine disease and click on the third link ("Maple syrup urine disease").
      2. What is the fundamental problem in people who have this disease? ANSWER
      3. Does the disease make your urine taste like maple syrup? ANSWER

      B. The genetics of MSUD

      1. Go to the Online Mendelian Inheritance in Man (OMIM) and search for maple syrup urine disease.
      2. Click on the first hit ("#248600").
      3. Is this disease caused by one gene, like sickle cell anemia? ANSWER
      4. Maple syrup urine disease is a classic "inborn error of metabolism", where a certain metabolic pathway has problems because an enzyme doesn't work correctly.
      5. Check out NCBI's GeneReviews page for MSUD.
      6. Under "Genetic counseling", inheritance is described as "autosomal recessive". What does this mean? ANSWER
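
      The inheritance pattern behind the last question can be explored with a small Python sketch. It assumes each parent passes on one of their two gene copies at random. The letters are hypothetical labels ("A" for a working copy, "a" for a non-working copy), and the function name is made up for illustration.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return the fraction of each possible child genotype for two parents."""
    # Each child gets one copy from each parent; sorting makes "aA" and "Aa" the same.
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: count / total for genotype, count in counts.items()}


# Two parents who each carry one working and one non-working copy.
print(offspring_genotypes("Aa", "Aa"))
```

      Trying other parent combinations, such as "AA" with "Aa", shows when a child can end up with two non-working copies.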
    Section 2: Normal metabolism of enzymes involved in MSUD

      A. A good metabolic pathway resource is the Kyoto Encyclopedia of Genes and Genomes (KEGG)

      1. Go to KEGG, search for maple syrup urine disease, and click on Maple syrup urine disease ("H00172").
      2. On the KEGG page for MSUD (DISEASE: H00172), find "Pathway" and click on "hsa00280".
      3. You may want to click on the Help button (top right of page) to get a key to the symbols in the diagram.
      4. Note that some of the boxes are highlighted in red -- these are some of the enzymes that can cause MSUD when they don't work correctly:

        Gene symbol   Protein name                                                    Enzyme ID
        BCKDHA        branched chain keto acid dehydrogenase E1, alpha polypeptide    1.2.4.4
        BCKDHB        branched chain keto acid dehydrogenase E1, beta polypeptide     1.2.4.4
        DLD           dihydrolipoamide dehydrogenase                                  1.8.1.4
        DBT           dihydrolipoamide branched chain transacylase E2                 2.3.1.168

      5. Why might BCKDHA and BCKDHB have the same enzyme ID? ANSWER
      6. According to this pathway diagram, can valine, leucine, or isoleucine degradation bypass any of these enzymes? ANSWER
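
      The table in step 4 can also be treated as data. This Python sketch groups the four gene symbols by their enzyme ID, using only the values listed in the table. The variable and function names are made up for illustration.

```python
from collections import defaultdict

# Gene symbol -> enzyme ID, copied from the table in step 4.
ENZYME_IDS = {
    "BCKDHA": "1.2.4.4",
    "BCKDHB": "1.2.4.4",
    "DLD": "1.8.1.4",
    "DBT": "2.3.1.168",
}


def genes_by_enzyme_id(table):
    """Group gene symbols that share the same enzyme ID."""
    groups = defaultdict(list)
    for gene, enzyme_id in table.items():
        groups[enzyme_id].append(gene)
    return dict(groups)


for enzyme_id, genes in genes_by_enzyme_id(ENZYME_IDS).items():
    print(enzyme_id, genes)
```

      Any enzyme ID that lists more than one gene is a hint worth thinking about for question 5.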

      B. OPTIONAL: another pathway resource is the Small Molecule Pathway Database (SMPDB)

      1. Go to SMPDB, search for maple syrup urine disease, and click on the first PATHWAY button ("SMP0019").
      2. Zoom in to better read the molecule names.
      3. This figure shows how the first step(s) of degradation works fine but that byproducts accumulate when a subsequent step is blocked.

      C. OPTIONAL: If you're curious about other inborn errors of metabolism, a big metabolic chart shows many of them together (listed on the sides of the figure). MSUD is under "Amino Acid Metabolism" in the middle of the right side.

    Section 3: Summary

    EXTRA CREDIT Part 1: Finding and analyzing BCKDHA, BCKDHB, DBT, or DLD in the human genome.

      To find where the genes encoding these proteins are located in the human genome, you can go to the UCSC Genome Browser.

      1. In the "position or search term" box, enter BCKDHA, BCKDHB, DBT, or DLD. Under "UCSC Genes", click on the first link for that gene.
      2. As before, look at the dark blue part of the diagram under "UCSC genes". The blocks are exons, connected by horizontal lines representing introns. The 5' and 3' untranslated regions (that do NOT code for protein) are displayed as thinner parts of the blocks at the beginning and end of the gene.
      3. You may notice that your gene has multiple transcripts (gene models in different rows in the figure).
      4. How many transcripts does your gene have (according to the UCSC Genes track)? ANSWER
      5. What range of exons does your gene have? ANSWER
    EXTRA CREDIT Part 2: Gene/protein mutations in MSUD

      We can check out the Online Mendelian Inheritance in Man (OMIM) to see what mutations are involved in MSUD.

      1. Go to OMIM, search for BCKDHA, BCKDHB, DBT, or DLD and click on the first hit.
      2. On the OMIM page for the gene you selected, do a browser search (Edit >> Find) for, or scroll down to, ".0001 MAPLE SYRUP URINE DISEASE".
      3. ".0001" lists the first mutation in this protein. How many different mutations in your gene can cause maple syrup urine disease? ANSWER
      4. Note that for each of these mutations, OMIM uses the notation like "TYR393ASN", to indicate the amino acid position (393 in this example), the normal amino acid (TYR), and the mutated amino acid (ASN). You can go back to the genetic code page to get the 1-letter amino acid abbreviation, the full name, the structure, and the 3-letter codons that encode this amino acid.
      5. Why do you think that there are so many mutations that can cause MSUD, whereas only one specific amino acid mutation causes sickle cell disease? ANSWER
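
      The notation described in step 4 can be taken apart with a short Python sketch. It assumes the pattern shown above: three letters, a position number, then three letters. The function names are made up for illustration, and entries that use other codes are not handled.

```python
import re

# Standard 3-letter to 1-letter amino acid abbreviations.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def parse_mutation(notation):
    """Split notation like 'TYR393ASN' into (normal, position, mutated)."""
    match = re.fullmatch(r"([A-Z]{3})(\d+)([A-Z]{3})", notation.upper())
    if match is None:
        raise ValueError(f"unexpected notation: {notation!r}")
    normal, position, mutated = match.groups()
    return normal, int(position), mutated


def short_form(notation):
    """Rewrite the notation with 1-letter abbreviations, e.g. 'Y393N'."""
    normal, position, mutated = parse_mutation(notation)
    return f"{THREE_TO_ONE[normal]}{position}{THREE_TO_ONE[mutated]}"


print(parse_mutation("TYR393ASN"))
print(short_form("TYR393ASN"))
print(short_form("GLU6VAL"))
```

      The same function works for the sickle cell mutation from earlier in the workshop, since OMIM writes it in the same style.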
    EXTRA CREDIT Part 3: Structure of mutant MSUD enzymes

      A. We can again use the Protein Data Bank, a database of protein structures that have been determined experimentally by methods such as X-ray crystallography.

      1. Go to the Protein Data Bank and search for 1X7Y (branched-chain alpha-ketoacid dehydrogenase, made up of chains of BCKDHA and BCKDHB).
      2. Below the figure at right, click on "Protein Workshop".
      3. Click on "Launch RCSB - Protein Workshop" to open the structure visualization program.
      4. Say OK when the browser asks about opening something like "RCSB-ProteinWorkshop.jnlp".
      5. Click "run" to start the Protein Workshop software.
      6. This is a structure of an enzyme complex of BCKDHA (Chain A, but missing the first 45 amino acids) and BCKDHB (Chain B, but missing the first 50 amino acids).

      B. If you have time, you can also check out Protein Data Bank structures of DLD (1ZMD) or DBT (2II5).


    Whitehead Institute
    Bioinformatics and Research Computing