WI-Engineering Home Page

Congratulations, your rich uncle has left you $2 billion to start a new biotech company. As CEO of the company you are solely responsible for the direction of research. After hearing about the recent cloning of Leptin, a protein secreted by fat cells which causes mice to feel satiated, you decide to identify and clone novel proteins in human serum. You have your best biochemists working on the project night and day. Using 2-dimensional electrophoresis, they identify a protein that is found in mouse serum after feeding. They purify the protein, digest it with proteases, and send some of the peptides off for sequencing. Unfortunately, only one of the peptides provides any sort of reasonable sequence. You have a meeting with your major shareholders tomorrow (you took the company public soon after starting it) and need to report something interesting or the stock is likely to plummet.

Let's see where twelve amino acids can take us....

The sequence from the peptide is: NGLSPETRRLVR

(anything in blue/purple and underlined is "hotlinked"-click on it and go)

Take this sequence and search the databases against known protein sequences to see if you can identify the protein that this sequence is from. Do this by using the Blast algorithm developed at the National Center for Biotechnology Information. Select BLASTP from the pull down Program menu and swissprot from the pull down Database menu. (SwissProt is a database of protein sequences.) Copy the sequence above and paste it into the big text box on the BLAST page. Just below where you entered the sequence is a box for "View results in a separate window". Click on the box to remove the check. Then click on Search.
After pressing Search, you will get a confirmation that your search is submitted. Click on the "Format results" button to view the results of the BLAST search. When doing a database query, results are presented in terms of sequence alignments between the input sequence and the database sequence. The strength of the alignment, measured by the alignment score, indicates which sequences are most closely related. What did you get? Look at the sequence alignment to see which database sequence is identical to your query sequence.

On the result page, click on the links to get additional information about this sequence. Clicking on the entry name will link you to the Entrez page. From here you will see some links to gi entries. These links are cross-references to the GenBank database, a database of nucleotide sequences. Click on the first gi link to get the nucleotide sequence. This sequence should be 606 bp long. You will use this nucleotide sequence to search for homologs of this gene in other species. Scroll to the bottom of the page and select and copy the nucleotide sequence. (Make sure that you don't select the protein sequence. The numbers will be ignored when you do your search.)
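BLAST ignores the position numbers for you, but if you ever want to reuse a copied GenBank sequence elsewhere, it helps to clean it up first. The following is a minimal sketch, assuming the sequence was copied in the usual GenBank flat-file layout (an ORIGIN line, numbered rows, and a closing //). The function name and the short sequence are made up for illustration and are not part of the mouse entry.

```python
import re


def clean_genbank_sequence(raw: str) -> str:
    """Return only the sequence letters from a GenBank-style block.

    Drops the ORIGIN header, the closing // marker, the position
    numbers at the start of each row, and all whitespace.
    """
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Skip the markers that frame the sequence block.
        if stripped.upper().startswith("ORIGIN") or stripped == "//":
            continue
        kept.append(stripped)
    # Anything that is not a letter (digits, spaces) is removed.
    return re.sub(r"[^A-Za-z]", "", "".join(kept)).upper()


# Hypothetical 30-base example, not the real 606 bp sequence.
example = """ORIGIN
        1 atggcgaccc tgaaagcttt
       21 ccgtagctaa
//"""

if __name__ == "__main__":
    print(clean_genbank_sequence(example))
```

The cleaned string can then be pasted into any tool that expects a bare sequence.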

Note: There is lots of additional information in this page. You can look at other views of the protein (e.g. Graphics) and there are links to the literature that you may want to explore.
Now, let's search for homologs of this gene. Once you've copied the nucleotide sequence, you can now do another Blast search. This time do a BLASTN search (for nucleotide sequences) against the NR (non-redundant) database. Paste your sequence (click on the query box and paste). Just below where you entered the sequence is a box for "View results in a separate window". Click on the box to remove the check and then click on search to submit your query.
What did you find? The graphic at the top of the page summarizes your alignments. First you'll see a color key for the alignment scores and then you'll see the alignments. The ones in red are the strongest; the black ones are the weakest. The alignment score is based on the similarity at each position in the two sequences. The higher the score, the more similar the sequences. Mouse-over the graphic to show the definition line and scores. Click to show the alignments.
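To make the idea of a position-by-position score concrete, here is a small sketch that scores an ungapped alignment and slides a query along a longer sequence to find the best-scoring position. It is not the BLAST algorithm: the +1/-1 match and mismatch values are assumptions chosen for simplicity, gaps are not handled, and the subject sequence below is a made-up string that happens to contain the peptide.

```python
def ungapped_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Sum a per-position score over two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def best_ungapped_hit(query: str, subject: str) -> tuple[int, int]:
    """Slide the query along the subject; return (best score, offset).

    Ties are resolved in favour of the leftmost offset.
    """
    if len(subject) < len(query):
        raise ValueError("subject is shorter than query")
    best_score, best_offset = None, None
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        score = ungapped_score(query, window)
        if best_score is None or score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


if __name__ == "__main__":
    peptide = "NGLSPETRRLVR"
    toy_subject = "MKAALNGLSPETRRLVRQW"  # hypothetical sequence
    print(best_ungapped_hit(peptide, toy_subject))
```

A perfect match of the twelve-residue peptide scores 12 under these assumed values; every mismatch lowers the score by 2 relative to that maximum.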

Scroll further down in the output. Here you'll see output giving entry names, alignment scores and probabilities. Those entries with small probabilities (at the top of the list) are the most likely to be related to your sequence. The probabilities indicate the chance of seeing a given score purely by chance. Select the human homolog (it will have "Homo sapiens" somewhere in the description). Scroll down the page and select and copy the translated protein sequence to use later.
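The chance-based value in the BLAST output follows the Karlin-Altschul relation, E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m is the query length, and n is the database length. The sketch below simply evaluates that formula. The values of K and lambda depend on the scoring system, and the defaults used here are illustrative placeholders, not the ones your search used.

```python
import math


def expected_hits(raw_score: float, query_len: int, db_len: int,
                  k: float = 0.134, lam: float = 0.318) -> float:
    """Expected number of chance alignments scoring at least raw_score.

    k and lam are placeholder parameters (assumptions); real values
    are reported in the statistics section of a BLAST result.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Same query and database size, increasing scores.
    for score in (20, 40, 60):
        print(score, expected_hits(score, query_len=12, db_len=1_000_000))
```

Two things are worth noticing: the expected number of chance hits drops exponentially as the score rises, and it grows in proportion to the size of the database, so the same score is less convincing against a larger database.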
Look at some of the other sequences and the information in those links. While you were doing that, your assistant copied the sequences from GenBank and constructed an alignment of the proteins. You can view that alignment in MegAlign (red boxes are identical amino acids with your protein and blue boxes indicate conservative changes; click on the upper right corner to enlarge the figure and click back here to continue with the program). The program has also constructed a phylogenetic tree which shows how the proteins are related evolutionarily. Branches signify points of evolutionary divergence, i.e. a sequence (protein or nucleotide) which branches off from another sequence very early (farther to the left) is more distant evolutionarily than one with a much more recent branch.
Notice how close mouse and rat are. What is the human protein most similar to?
Although it is running late and you're late for dinner, you decide to do one last search. This time you blast the human protein sequence against the databases. Use the Blast search program again. If the text block still contains data, click on the Clear Input button on the top left of the page. This time you will do a BLASTP search against the NR database. Paste your sequence. Just below where you entered the sequence is a box for "View results in a separate window". Click on the box to remove the check. Then click on Search.
Scroll through the hits a bit. What did you find?

Why did you miss all of these proteins the first time around?
Your efficient assistant has taken some of the sequences and assembled an alignment in MegAlign (blue boxes are identical amino acids with your protein, green boxes indicate very conservative changes, while red boxes are less conservative changes). Your soon-to-be-promoted assistant has also compiled the family tree using the same program. How conserved are these sequences at the amino acid level?
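One simple way to put a number on conservation is percent identity between two rows of an alignment. The sketch below assumes both rows are already aligned to the same length, uses "-" as the gap character, and skips any column where either row has a gap. Other definitions use a different denominator, so treat this as one convention among several. The second sequence is a made-up variant of the peptide, not a real homolog.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns with identical residues."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical aligned pair: 11 gap-free columns, 9 identical.
    print(round(percent_identity("NGLSPETRRLVR", "NGLTPE-RRLIR"), 1))
```

Closely related proteins such as the mouse and rat sequences would give values near 100, while distant family members can fall much lower and still be recognizably related.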
You are very excited now that you have some interesting data to show the shareholders. But something in the back of your head asks if there is some reason why these very divergent proteins are related. You decide to do a search of the Prosite pattern database. Paste the protein sequence (the human RBP, which should still be in the computer's cut/paste buffer) into the form, select the "exclude patterns..." option, and click on Starts the Scan.
What happened?

To find out what you have, select "PDOC00187"; it should take you to a largely text page. After you've looked at that for a bit, select "PS00213" at the top of the page. This page contains hot links to all of the protein members of the family.
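A Prosite pattern is a compact description of the residues allowed at each position of a conserved region, and scanning a sequence amounts to a pattern match. The sketch below translates the basic Prosite pattern syntax into a Python regular expression. It covers only the common elements (x, [..], {..}, repeat counts, and the < and > anchors), and the patterns used are the generic examples from the Prosite syntax description, not the actual PS00213 pattern. The test sequence is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic Prosite pattern into a regular expression."""
    pattern = pattern.strip().rstrip(".")  # patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue part from an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("cannot parse element: " + repr(element))
        core, low, high = m.groups()
        if core == "x":                                  # any residue
            core = "."
        elif re.fullmatch(r"\{[A-Z]+\}", core):          # excluded residues
            core = "[^" + core[1:-1] + "]"
        elif not re.fullmatch(r"\[[A-Z]+\]|[A-Z]", core):
            raise ValueError("unsupported element: " + repr(element))
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
    print(regex)
    print(re.search(regex, "MKAGVLLLLKQ"))  # hypothetical sequence
```

A real scan does the same thing for every pattern in the database, which is why a single short signature can tie together proteins whose overall sequences have drifted far apart.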
You are extremely fortunate in that 3D structures for some of the family members have been solved. Your new vice-president of research (formerly your trusty assistant) has taken two of the structures and downloaded the graphical representations.
What do you see? If you take a moment to look back at the family tree, you notice that the two proteins are evolutionarily very far apart. What do you think now that you see the 3-D structure?

So, it is late and you decide that you can now go home. You have some very interesting data to show the shareholders and the future of your company is safe for now. On your way home you start to think about how much information you were able to obtain with 12 amino acids and your computer.

Let's review where we started and where we went.

1. Twelve amino acids.

2. Finding the mouse gene.

3. Using the mouse nucleotide sequence to pull out the same gene from other species.

4. Using the human protein sequence to pull out other related proteins.

5. Utilizing the Prosite database to find out that these proteins were all from the same family.

6. Comparison of the 3D structures of two distantly related family members.

Given that the structures of ApoD and RBP look very similar, what does this tell you about how much information can be obtained from the nucleotide or amino acid sequence of a protein? How important is structure?

If you have some time you may want to:

play with the RBP structures interactively

(Note: To look at the structures, click on Quick PDB (or Ribbons or Cylinders for static views). If you install the publicly available Rasmol 2.6 software, you will have more options to view the structures interactively. Select the Rasmol option from the PDB results page. This will download a structure file to view in Rasmol. Try using the ribbons, spacefill, and backbone options under "display." Also try the structure option under "colour.")

look at some other pre-selected structures (after selecting each, scroll down to molecule visualization and select "asymmetric unit").

1. more lipocalin family members

Mouse urinary protein: functions as a pheromone in mice. Notice the structural similarity to the other family members.

Bilin binding protein: this is a butterfly protein which is a member of the lipocalin family. Select chain under the color option to see that this is a tetramer.

2. other genes of interest (these will open up directly into the structure; if you don't see anything, go to the "window" option on the menu bar and select "main")

p53 complexed with DNA: try using the chain mode with the space filling option under "display" to see the DNA in the groove.

Ras oncogene: this gene is mutated in a great many human cancers and was cloned by Bob Weinberg's group here at the Whitehead.

Human Class I histocompatibility antigen complexed with a nonameric peptide from HIV-1 GP 120 envelope: try using the chain mode with the space filling molecule view to view the peptide. A similar structure with the Class II molecule answered some basic questions in immunology about how immune cells tell self from non-self.

Influenza Virus Hemagglutinin: this molecule shows the pH induced conformational change which is probably responsible for the fusogenic nature of the protein. Peter Kim here at the Whitehead is an expert in this field. His lab recently crystallized an HIV protein which may use a similar mechanism.

3. find your own genes

pulling up new structures: this is the home page for looking up 3D structures. Enter the names of the proteins you are interested in and go from there.