Congratulations, your rich uncle has left you $2 billion to start a new biotech company. As CEO of the company you are solely responsible for the direction of research. After hearing about the recent cloning of Leptin, a protein secreted by fat cells which causes mice to feel satiated, you decide to identify and clone novel proteins in human serum. You have your best biochemists working on the project night and day. They identify a protein by 2-dimensional electrophoresis which is found in mouse serum after feeding. They purify the protein, digest it with proteases, and send some of the peptides off for sequencing. Unfortunately, only one of the peptides provides any sort of reasonable sequence. You have a meeting with your major shareholders tomorrow (you took the company public soon after starting it) and need to report something interesting or the stock is likely to plummet.
Let's see where twelve amino acids can take us....
The sequence from the peptide is: NGLSPETRRLVR
(Anything in blue/purple and underlined is "hotlinked": click on it and go.)
After pressing Search, you will get a confirmation that your search has been submitted. Click on the "Format results" button to view the results of the BLAST search. When doing a database query, results are presented as sequence alignments between the input sequence and the database sequences. The strength of each alignment, measured by the alignment score, indicates which database sequences are most closely related to your query. What did you get? Look at the sequence alignment to see which sequence is identical to your query sequence.
On the result page, click on the links to get additional information about this sequence. Clicking on the entry name will link you to the Entrez page. From here you will see some links to gi entries. These links are cross-references to the GenBank database, a database of nucleotide sequences. Click on the first gi link to get the nucleotide sequence. This sequence should be 606 bp long. You will use this nucleotide sequence to search for homologs of this gene in other species. Scroll to the bottom of the page and select and copy the nucleotide sequence. (Make sure that you don't select the protein sequence. The numbers will be ignored when you do your search.)
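If you want to reuse the copied sequence outside the BLAST form, the position numbers and spaces have to be removed yourself. The following is a minimal Python sketch of that cleanup. The function name `clean_sequence` and the short fragment are made up for illustration; the fragment is not the real 606 bp mouse entry.

```python
import re

def clean_sequence(raw: str) -> str:
    """Drop position numbers, spaces and line breaks; keep only the letters."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()

# Made-up fragment laid out the way GenBank prints a sequence
# (a position number followed by blocks of ten bases).
pasted = """
        1 atggctagct agctaggatc
       21 ttgacc
"""

cleaned = clean_sequence(pasted)
print(cleaned)
print(len(cleaned), "bases")
```

Checking the length of the cleaned string against the length stated on the GenBank page is a quick way to confirm that nothing was lost in the copy.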
Note: There is lots of additional information in this page. You can look at other views of the protein (e.g. Graphics) and there are links to the literature that you may want to explore.
What did you find? The graphic at the top of the page summarizes your alignments. First you'll see a color key for the alignment scores and then you'll see the alignments. The ones in red are the strongest; the black ones are the weakest. The alignment score is based on the similarity at each position in the two sequences. The higher the score, the more similar the sequences. Mouse over the graphic to show the definition line and scores. Click to show the alignments.
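To make the idea of a position-by-position score concrete, here is a deliberately simplified Python sketch. It assumes an ungapped alignment of two equal-length sequences and a flat +1/-1 scheme; both are assumptions chosen for illustration. BLAST itself scores protein alignments with a substitution matrix (BLOSUM62 by default) and allows gaps, so the numbers below are not BLAST scores. The `variant` peptide is hypothetical.

```python
def score_ungapped(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Sum a per-position score over two sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("this toy scorer only handles equal-length sequences")
    return sum(match if x == y else mismatch for x, y in zip(a, b))

query = "NGLSPETRRLVR"    # the peptide from the exercise
variant = "NGLSPETKRLVR"  # hypothetical peptide with one residue changed

print(score_ungapped(query, query))    # 12: every position matches
print(score_ungapped(query, variant))  # 10: eleven matches, one mismatch
```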
Scroll further down in the output. Here you'll see output giving entry names, alignment scores and probabilities. Those entries with small probabilities (at the top of the list) are the most likely to be related to your sequence. The probabilities indicate the chance of seeing a given score purely by chance. Select the human homolog (it will have "Homo sapiens" somewhere in the description). Scroll down the page and select and copy the translated protein sequence to use later.
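The number BLAST reports next to each score is usually called an E-value: the number of hits with at least that score you would expect to see by chance in a database of that size. For ungapped alignments it is commonly written as E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length and S is the raw score. The sketch below evaluates that expression in Python. The values of `lam` and `k` and the database size are illustrative assumptions, not the parameters of your search.

```python
import math

def expected_hits(score: float, query_len: int, db_len: int,
                  lam: float = 0.318, k: float = 0.134) -> float:
    """Expected number of chance hits scoring at least `score` (ungapped model).

    lam and k are assumed, illustrative values; real searches report their own.
    """
    return k * query_len * db_len * math.exp(-lam * score)

db_len = 100_000_000  # hypothetical database size in residues
for s in (30, 60, 120):
    print(s, expected_hits(s, query_len=12, db_len=db_len))
```

The expected count falls off exponentially as the score rises, which is why the entries at the top of the hit list are the ones least likely to be chance matches.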
Notice how close mouse and rat are. What is the human protein most similar to?
Scroll through the hits a bit. What did you find?
Why did you miss all of these proteins the first time around?
What happened?
To find out what you have, select "PDOC00187"; it should take you to a largely text page. After you've looked at that for a bit, select "PS00213" at the top of the page. This page contains hot links to all of the protein members of the family.
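Prosite entries such as PS00213 describe a family with a short sequence pattern. As a rough sketch of how such a pattern can be used, the Python below converts the basic Prosite pattern syntax into a regular expression and scans a sequence with it. The pattern and the test sequence are made up for illustration and are not the actual PS00213 signature; the function name `prosite_to_regex` is also hypothetical, and the converter only handles the common syntax elements.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate basic Prosite pattern syntax into a Python regular expression."""
    pattern = pattern.rstrip(".")  # Prosite patterns end with a period
    parts = []
    for element in pattern.split("-"):  # elements are separated by hyphens
        element = element.replace("{", "[^").replace("}", "]")  # {P}: anything but P
        element = element.replace("(", "{").replace(")", "}")   # x(2): repeat count
        element = element.replace("x", ".")                      # x: any residue
        element = element.replace("<", "^").replace(">", "$")    # sequence ends
        parts.append(element)
    return "".join(parts)

toy_pattern = "[DE]-x(2)-{P}-G-W."  # made-up pattern, not PS00213
regex = prosite_to_regex(toy_pattern)
print(regex)

hit = re.search(regex, "AAEKLAGWTT")  # made-up sequence
print(hit.group() if hit else "no match")
```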
What do you see? If you take a moment to look back at the family tree, you notice that the two proteins are evolutionarily very far apart. What do you think now that you see the 3-D structure?
So, it is late and you decide that you can now go home. You have some very interesting data to show the shareholders and the future of your company is safe for now. On your way home you start to think about how much information you were able to obtain with 12 amino acids and your computer.
Let's review where we started and where we went.
1. Twelve amino acids.
2. Finding the mouse gene.
3. Using the mouse nucleotide sequence to pull out the same gene from other species.
4. Using the human protein sequence to pull out other related proteins.
5. Utilizing the Prosite database to find out that these proteins were all from the same family.
6. Comparison of the 3D structures of two distantly related family members.
Given that the structures of ApoD and RBP look very similar, what does this tell you about how much information can be obtained from the nucleotide or amino acid sequence of a protein? How important is structure?
If you have some time you may want to:
play with the RBP structures interactively
(Note: To look at the structures, click on Quick PDB (or Ribbons or Cylinders for static views). If you install the publicly available Rasmol 2.6 software, you will have more options to view the structures interactively. Select the Rasmol option from the PDB results page. This will download a structure file to view in Rasmol. Try using the ribbons, spacefill, and backbone options under "display." Also try the structure option under "colour.")
look at some other pre-selected structures (after selecting each, scroll down to molecule visualization and select "asymmetric unit").
1. more lipocalin family members
Mouse urinary protein: functions as a pheromone in mice. Notice the structural similarity to the other family members.
Bilin binding protein: this is a butterfly protein which is a member of the lipocalin family. Select chain under the color option to see that this is a tetramer.
2. other genes of interest (these will open up directly into the structure; if you don't see anything, go to the "window" option on the menu bar and select "main")
p53 complexed with DNA: try using the chain mode with the space filling option under "display" to see the DNA in the groove.
Ras oncogene: this gene is mutated in a great many human cancers and was cloned by Bob Weinberg's group here at the Whitehead.
Human Class I histocompatibility antigen complexed with a nonameric peptide from HIV-1 GP 120 envelope: try using the chain mode with the space filling molecule view to view the peptide. A similar structure with the Class II molecule answered some basic questions in immunology about how immune cells tell self from non-self.
Influenza Virus Hemagglutinin: this molecule shows the pH-induced conformational change which is probably responsible for the fusogenic nature of the protein. Peter Kim here at the Whitehead is an expert in this field. His lab recently crystallized an HIV protein which may use a similar mechanism.
3. find your own genes
pulling up new structures: this is the home page for looking up 3D structures. Enter the names of the proteins you are interested in and go from there.