BOA Home Page

Welcome to Biology on the Web! Here we'll explore the world of Biocomputing to get a feel for the kind of biological information available on the web.
In today's lab, you will explore information about one type of human colon cancer - hereditary non-polyposis colon cancer (HNPCC) - and the mismatch repair gene. This is one of the "spellchecker" genes for DNA replication. You will learn its relevance to yeast and bacteria, and see how tools available on the web can help keep researchers and the public informed.

To start you will search the Online Mendelian Inheritance in Man (OMIM) database. This database is a catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere. You will then follow some links to explore other relevant information available to you. Finally, you will see how similar the gene responsible for HNPCC is in a variety of organisms.

What the colors mean:

blue/purple and underlined is a link
red indicates words you type into a form
green indicates output from a web page
magenta indicates that a definition follows

First let's search OMIM. Here you will enter the words human colon cancer in the keywords text box. Then click the Submit Search Button.
You should see a page of results with many links. Click on the first link:

*120435 COLON CANCER....

If you have time later, you can take a look at some of the other links.
This entry contains a list of links to other information, some of which we will look at in a few minutes. First, read the first paragraph under the subheading "Text" to get an introduction to the genetics of this disease. That paragraph describes some of the issues in trying to determine whether a disease is inherited or caused by the environment. It also mentions that the gene responsible for this form of colon cancer is found on Chromosome 2. (A gene is the fundamental physical and functional unit of heredity: an ordered sequence of nucleotides, located in a particular position on a particular chromosome, that encodes a specific functional product such as a protein. Humans have 23 pairs of chromosomes.) A little later, we'll get more information on the nature of the gene itself.
Scroll down the page and read additional information if desired. When you are through, click here to get back to the top of the OMIM article. Now click on the section heading "Creation Date" in the Table of Contents. When was this entry last updated?
Click here to get back to the top of the OMIM article. Click on the LocusLink button. (LocusLink is a database of genes and has links to many resources.) Click on the P under Links. This will lead you to the PubMed database of literature curated by the National Center for Biotechnology Information at the National Institutes of Health. Read the article titled Mutations predisposing to hereditary nonpolyposis colorectal cancer. If you'd like to read more abstracts, click on the Related Articles Button. When you have finished reading articles, continue to the next item.
Go back to the LocusLink page. Now click on the U under Links. This will take you to a database called Unigene that is a catalog of unique genes in humans. It is estimated that there may be as many as 150,000 genes in humans. This database currently lists more than 80,000. This page also lists other organisms having a similar gene. Notice that in mouse (M. musculus) there is 92% similarity in the gene, whereas in the bacterium E. coli, the similarity is only 33%. We can see what this looks like by looking at an alignment of the protein sequences from these different organisms.
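If you are curious where a number like 92% could come from, here is a minimal Python sketch that counts identical positions between two sequences that have already been aligned. The function name `percent_identity` and the two short toy sequences are made up for illustration, and Unigene may calculate its similarity figures differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared columns in which both sequences have the same residue.

    Both inputs must already be aligned (same length, '-' marks a gap).
    Columns where either sequence has a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap column: nothing to compare
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy fragments, not real database entries.
fragment_one = "MAVQPKETLQ"
fragment_two = "MAVQPKEALQ"  # differs at one position
print(percent_identity(fragment_one, fragment_two))  # 90.0
```

Nine of the ten positions match, so the toy pair is 90% identical.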
First let's look at the alignment for human, mouse and rat copies of the gene. The amino acids (identified by their single letter code) that are colored are identical in human and at least one other species. Notice how similar the sequences from these related organisms are. Now take a look at the alignment for all species known to have this gene. Notice that now the only amino acids colored are those that are common to at least 5 organisms. Scroll through the alignment to find an area of the gene that is very similar in all organisms. This so-called "conserved" region may be important in the three-dimensional structure of the protein.
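The coloring rule described above (color a position only when enough organisms share the same amino acid) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `conserved_columns` and the four short aligned fragments are invented, and the alignment page may use a different rule internally.

```python
from collections import Counter


def conserved_columns(alignment, min_count):
    """Return (column index, residue) pairs for columns in which the most
    common residue appears in at least `min_count` of the aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        # Collect the residues in this column, ignoring gaps.
        column = [seq[i] for seq in alignment if seq[i] != "-"]
        if not column:
            continue
        residue, count = Counter(column).most_common(1)[0]
        if count >= min_count:
            conserved.append((i, residue))
    return conserved


# Hypothetical aligned fragments from four imaginary organisms.
toy_alignment = [
    "GPNMGGKST",
    "GPNMGGKST",
    "GPNMSGKSS",
    "GANMAGKSV",
]

# Columns where all four sequences agree.
print(conserved_columns(toy_alignment, min_count=4))
```

Lowering `min_count` to 3 also reports the second column, where three of the four toy sequences share a P. This mirrors how the colored positions change when the page requires agreement among more or fewer organisms.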
Now let's go back to the OMIM entry for HNPCC and follow the link to HGMD, the Human Gene Mutation Database (a mutation is any heritable change in a DNA sequence). Here you will find a catalog of all of the changes to the normal sequence of the HNPCC gene. If you scroll down the page, you will see that there are 100 listed mutations in this gene, most of which are responsible for HNPCC. Notice that the mutations can be insertions, deletions, missense or nonsense. (Insertions add extra DNA to a gene; deletions remove DNA; missense means that a change in the DNA causes the wrong amino acid to be included in the protein; and nonsense means that a change in the DNA creates a premature stop signal, which shortens the protein and impairs its ability to perform properly.) Take a look at a map of the location of the many mutations in this gene. Notice how these mutations are spread out over nearly the full length of the protein (all 935 amino acids) but mutations have not been observed at all positions. What do you think this means?
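To see what the four kinds of mutation do to a protein, here is a small Python sketch that translates a short stretch of DNA before and after each kind of change. The 21-letter DNA string is a made-up example, not the real HNPCC gene, and `translate` is a hypothetical helper that uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA three letters at a time, stopping at the first stop codon.
    Leftover letters that do not fill a whole codon are ignored."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop signal
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up toy gene: ATG GCT GTT CAA CCT AAA TAA
normal = "ATGGCTGTTCAACCTAAATAA"
missense = normal[:9] + "CAT" + normal[12:]   # CAA -> CAT changes one amino acid
nonsense = normal[:9] + "TAA" + normal[12:]   # CAA -> TAA is an early stop
insertion = normal[:3] + "A" + normal[3:]     # one extra letter shifts the reading frame
deletion = normal[:3] + normal[4:]            # one missing letter shifts the reading frame

print("normal:   ", translate(normal))     # MAVQPK
print("missense: ", translate(missense))   # MAVHPK
print("nonsense: ", translate(nonsense))   # MAV
print("insertion:", translate(insertion))  # MSCST
print("deletion: ", translate(deletion))   # MLFNLN
```

In this toy case the missense change swaps a single amino acid, the nonsense change cuts the protein short, and the one-letter insertion and deletion scramble everything after the change because the three-letter reading frame is shifted.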
There's one more step before completing this lab. You will use a tool that scientists use on a daily basis - BLAST. This is a database search tool that takes a DNA or protein sequence as input and searches it against a database of known sequences. You will search against the E. coli database to see which sequences in this bacterium are similar to the human sequence. The protein sequence encoded by the human HNPCC gene is listed below:
```
MAVQPKETLQLESAAEVGFVRFFQGMPEKPTTTVR
LFDRGDFYTAHGEDALLAAREVFKTQGVIKYMGPA
GAKNLQSVVLSKMNFESFVKDLLLVRQYRVEVYKN
RAGNKASKENDWYLAYKASPGNLSQFEDILFGNND
MSASIGVVGVKMSAVDGQRQVGVGYVDSIQRKLGL
CEFPDNDQFSNLEALLIQIGPKECVLPGGETAGDM
GKLRQIIQRGGILITERKKADFSTKDIYQDLNRLL
KGKKGEQMNSAVLPEMENQVAVSSLSAVIKFLELL
SDDSNFGQFELTTFDFSQYMKLDIAAVRALNLFQG
SVEDTTGSQSLAALLNKCKTPQGQRLVNQWIKQPL
MDKNRIEERLNLVEAFVEDAELRQTLQEDLLRRFP
DLNRLAKKFQRQAANLQDCYRLYQGINQLPNVIQA
LEKHEGKHQKLLLAVFVTPLTDLRSDFSKFQEMIE
TTLDMDQVENHEFLVKPSFDPNLSELREIMNDLEK
KMQSTLISAARDLGLDPGKQIKLDSSAQFGYYFRV
TCKEEKVLRNNKNFSTVDIQKNGVKFTNSKLTSLN
EEYTKNKTEYEEAQDAIVKEIVNISSGYVEPMQTL
NDVLAQLDAVVSFAHVSNGAPVPYVRPAILEKGQG
RIILKASRHACVEVQDEIAFIPNDVYFEKDKQMFH
IITGPNMGGKSTYIRQTGVIVLMAQIGCFVPCESA
EVSIVDCILARVGAGDSQLKGVSTFMAEMLETASI
LRSATKDSLIIIDELGRGTSTYDGFGLAWAISEYI
ATKIGAFCMFATHFHELTALANQIPTVNNLHVTAL
TTEETLTMLYQVKKGVCDQSFGIHVAELANFPKHV
IECAKQKALELEEFQYIGESQGYDIMEPAAKKCYL
EREQGEKIIQEFLSKVKQMPFTEMSEENITIKLKQ
LKAEVIAKNNSFVNEIISRIKVTT
```
To use BLAST, copy this sequence and paste it into the big text box on the BLAST page. Select BLASTP from the pull-down Program menu and select E. coli from the pull-down Database menu. Then click on Search. On the next page, click on Format Results to see the results of your database search. Sometimes there is a wait to see results. If this is the case, you should see a message on your screen about this. After searching more than 4000 sequences, BLAST finds 3 hits worth reporting. Scroll down to the heading Distribution of 3 Blast Hits on the Query Sequence and the figure below it. Click on the thin red line. This takes you to the alignment of the human gene and the E. coli gene that is most similar. Notice that this is the DNA mismatch repair gene. Also notice how many identical amino acids there are between the proteins from these two distantly related species. Which region is the most similar?
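If you ever want to tidy a pasted sequence before searching, the following Python sketch joins the wrapped lines into one string, checks that only the 20 standard amino acid letters are present, and writes the result in FASTA format. The helper names `clean_sequence` and `to_fasta` and the record name `human_HNPCC_fragment` are assumptions made for this example, and only the first two lines of the sequence are used to keep it short. The BLAST form may well accept the wrapped text as-is, so this step is optional.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(pasted: str) -> str:
    """Remove line breaks and spaces, convert to upper case, and check letters."""
    sequence = "".join(pasted.split()).upper()
    unexpected = sorted(set(sequence) - VALID_AMINO_ACIDS)
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {unexpected}")
    return sequence


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with lines of `width` letters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# First two lines of the human sequence from this lab, pasted with line breaks.
pasted_text = """
MAVQPKETLQLESAAEVGFVRFFQGMPEKPTTTVR
LFDRGDFYTAHGEDALLAAREVFKTQGVIKYMGPA
"""

fragment = clean_sequence(pasted_text)
print(len(fragment))  # 70
print(to_fasta("human_HNPCC_fragment", fragment))
```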

A Brief Summary of what we have done investigating the genetics of a disease:

We searched the OMIM database for the disease "human colon cancer."
We read information about the genetics of this disease and saw how frequently the information is updated. (It's important to visit these sites more than once to make sure that you're getting the most up-to-date information.)
We followed some links from the OMIM article to the scientific literature and to a mutation database.
We looked at alignments of sequences in a variety of organisms.
We looked at mutations in the mismatch repair gene.
We did a database search to find sequences in E. coli that are similar to the sequence in humans.
