Microarray analysis exercises 3
Class 3 exercises
Part VI. Visualizing all the data
- You may try any combination of these graphing techniques. Global visualization of an experiment can be helpful for designing subsequent analysis and for quality control.
- Intensity scatterplot (sample)
- Go to the "means" sheet and select two columns for either brain or liver.
- Click on the Chart Wizard and for Chart Type, select "XY(Scatter)".
- Click on "Next >" twice and label the chart and axes.
- Click on "Finish".
- To make the axes logarithmic, click on an axis to get the Format Axis box.
- Select the Scale tab and check the "Logarithmic scale" box.
- Repeat for the other axis.
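Excel applies the log transform when you switch the axes to a logarithmic scale. To cross-check that outside Excel, here is a minimal Python sketch. The helper name and the numbers are hypothetical, not from the course files. It turns two columns of intensities into log10 coordinates and skips any pair containing zero or a negative value, because a logarithmic axis cannot display those.

```python
import math

def log_scatter_points(x_values, y_values):
    """Return (log10 x, log10 y) pairs for a log-log intensity scatterplot.

    Pairs where either value is zero or negative are skipped, because
    they have no logarithm and cannot appear on a logarithmic axis.
    """
    points = []
    for x, y in zip(x_values, y_values):
        if x > 0 and y > 0:
            points.append((math.log10(x), math.log10(y)))
    return points

# Hypothetical means for two brain columns (illustration only).
col_1 = [100.0, -15.0, 1000.0]
col_2 = [10.0, 40.0, 1000.0]
print(log_scatter_points(col_1, col_2))
```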
- R-I (M-A) plot (sample) (other samples)
- A "ratio-intensity plot" looks like a scatterplot that has been rotated 45 degrees.
- It's the most common plot associated with lowess normalization.
- After choosing the two columns of data you wish to compare, calculate two new values for each gene (row) using the Excel "LOG" function, e.g. =LOG(B2*C2,2), where the last argument is the base:
- I = log2(experiment 1 * experiment 2)
- R = log2(experiment 1 / experiment 2)
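- Plot the two new columns as for a scatterplot (above), with I on the x-axis and R on the y-axis, but keep the axes linear: the values are already logarithms, and R is negative wherever experiment 2 is higher.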
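The two Excel formulas can be mirrored in a few lines of Python, which makes it easy to spot-check a handful of rows. This is a sketch with made-up values. Many texts plot A = 0.5 * log2(exp1 * exp2) instead of I; that only rescales the x-axis by a factor of two.

```python
import math

def ratio_intensity(exp1, exp2):
    """Return (I, R) pairs for an R-I (M-A) plot.

    I = log2(exp1 * exp2), like Excel =LOG(B2*C2,2)
    R = log2(exp1 / exp2), like Excel =LOG(B2/C2,2)
    Pairs containing a zero or negative value are skipped (no logarithm).
    """
    points = []
    for a, b in zip(exp1, exp2):
        if a > 0 and b > 0:
            points.append((math.log2(a * b), math.log2(a / b)))
    return points

print(ratio_intensity([8.0, 2.0, 0.0], [2.0, 2.0, 5.0]))  # [(4.0, 2.0), (2.0, 0.0)]
```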
- volcano plot (sample)
- These plots will help compare two methods for determination of differential expression: fold changes and t tests.
- The x-axis is the (log2) ratio between the two tissues, and the y-axis is the p-value from a t test comparing the same two tissues.
- Start by choosing a tissue and copying the log2 ratio data and t test data into two adjacent columns.
- Replace any non-numeric entries (if present) with 1.
- Click on the Chart Wizard and for Chart Type, select "XY(Scatter)", and go through the wizard as before.
- Once you have a graph, click on the y-axis to get the Format Axis box.
- Select the Scale tab and check the "Logarithmic scale" and the "Values in reverse order" boxes.
- Click on the x-axis to get the Format Axis box.
- Select the Patterns tab and select "High" for tick mark labels.
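Reversing a logarithmic y-axis, as above, is equivalent to plotting -log10(p) on an ordinary axis, so the most significant genes end up at the top. A minimal sketch of that transform, following the handout's rule of replacing non-numeric p-values with 1 (the function name and inputs are hypothetical):

```python
import math

def volcano_points(log2_ratios, p_values):
    """Return (log2 ratio, -log10 p) pairs for a volcano plot.

    As in the handout, a non-numeric p-value is replaced with 1, which
    lands at 0 on the -log10 scale. A p-value of exactly 0 raises
    ValueError, since its logarithm is undefined.
    """
    points = []
    for ratio, p in zip(log2_ratios, p_values):
        try:
            p = float(p)
        except (TypeError, ValueError):
            p = 1.0
        points.append((ratio, -math.log10(p)))
    return points

# Made-up values: the second p-value is an Excel error string.
print(volcano_points([1.5, -2.0], ["0.01", "#DIV/0!"]))
```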
Part VII. Functional analysis
- Annotation (Excel and web tools)
- Download the Excel annotation tool for the Affymetrix U95 chip used in this experiment.
- This Excel file was created using an annotation file from Affymetrix and a simple use of the "VLOOKUP" function.
- Go to the "list" sheet and paste in a list of Affymetrix IDs showing, for example, your list of genes with differential expression in the fetal vs. adult brain.
- Press F9 (if needed) to recalculate the VLOOKUP formulas.
- Optional: browse the list of genes to find any of your favorites.
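The annotation workbook is essentially an exact-match lookup. The same idea in Python is a dictionary lookup, sketched below with hypothetical probe IDs and symbols; unmatched IDs get "#N/A", as an exact-match VLOOKUP would return.

```python
def annotate(probe_ids, annotation):
    """Annotate pasted probe IDs, like =VLOOKUP(id, table, col, FALSE).

    `annotation` maps probe ID -> gene symbol (or any annotation field).
    Stray whitespace around pasted IDs is removed before lookup.
    """
    results = []
    for pid in probe_ids:
        pid = pid.strip()
        results.append((pid, annotation.get(pid, "#N/A")))
    return results

# Hypothetical annotation rows (not real U95 content).
annotation = {"probe_1_at": "GENE_A", "probe_2_at": "GENE_B"}
print(annotate(["probe_1_at", " probe_2_at ", "probe_9_at"], annotation))
```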
- Comparing two lists
- Are there any genes which are differentially expressed in both brain and liver?
- Looking at your selected set of genes (those differentially expressed in brain and/or liver), compare
- genes that are expressed at a different level in the fetal liver and the adult liver
- genes that are expressed at a different level in the fetal brain and the adult brain
- Go to the Compare two lists tool and paste in both lists.
- What genes are in the intersection of both lists?
- Record the following three numbers:
- number of genes differentially expressed only in the fetal liver
- number of genes differentially expressed only in the fetal brain
- number of genes differentially expressed in both the fetal liver and brain (the intersection of the two original lists)
- Use the Venn diagram generator to draw a figure of these data (or use {55, 74, 98}).
- If your browser can't display SVG graphics, save the SVG file as text with a file name ending in .svg.
- If you want to save the image, right click on it and select "Save SVG As".
- The .svg file can then be opened and edited in Illustrator.
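The three numbers for the Venn diagram are just two set differences and an intersection. A minimal sketch with hypothetical probe IDs:

```python
def venn_counts(liver_list, brain_list):
    """Return the three Venn-diagram counts and the shared IDs.

    Duplicate IDs within a list are counted once.
    """
    liver, brain = set(liver_list), set(brain_list)
    shared = sorted(liver & brain)
    counts = {
        "liver_only": len(liver - brain),
        "brain_only": len(brain - liver),
        "both": len(shared),
    }
    return counts, shared

liver = ["p1_at", "p2_at", "p3_at"]
brain = ["p3_at", "p4_at", "p4_at"]
print(venn_counts(liver, brain))
```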
- Genome mapping of one gene or a set of genes
- Go to the annotation file from part VII.1 and select the symbol of an interesting gene from the list.
- Go to the UCSC Human Genome Browser Gateway, enter the gene symbol in the "position" field, and click "submit."
- On the new page, follow a link to a RefSeq Gene (if there is one) or a Known Gene. Multiple hits at this stage can result from multiple transcripts of the same gene.
- The browser can show lots of different types of data and get you to genomic sequence. Ask if you'd like to know more.
- To map a set of genes, go to the annotation file from part VII.1 and select the entries in the GeneSymbol column.
- Go to a tool like the WIBR human genome mapper and paste in your gene symbols (you can input up to three gene sets at once).
- Click on MAP to see if any of your genes map to your favorite parts of the genome or if they appear to be clustered at any particular loci.
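The mapper shows clustering visually. If you have chromosome coordinates for your genes (for example, copied from the genome browser), a simple numeric check is to sort the genes along each chromosome and group neighbors that lie closer than some window. This is a hypothetical helper with made-up coordinates, not what the WIBR mapper does internally, and the window size is an arbitrary choice.

```python
from collections import defaultdict

def find_clusters(gene_positions, window):
    """Group genes on the same chromosome that lie within `window` bp of
    the previous gene along that chromosome.

    gene_positions: dict of gene symbol -> (chromosome, position in bp).
    Returns a list of clusters (lists of symbols) with two or more genes.
    """
    by_chrom = defaultdict(list)
    for gene, (chrom, pos) in gene_positions.items():
        by_chrom[chrom].append((pos, gene))

    clusters = []
    for chrom in sorted(by_chrom):
        genes = sorted(by_chrom[chrom])
        current = [genes[0]]
        for pos, gene in genes[1:]:
            if pos - current[-1][0] <= window:
                current.append((pos, gene))
            else:
                if len(current) >= 2:
                    clusters.append([g for _, g in current])
                current = [(pos, gene)]
        if len(current) >= 2:
            clusters.append([g for _, g in current])
    return clusters

# Made-up coordinates for illustration.
positions = {"G1": ("chr1", 1000), "G2": ("chr1", 5000),
             "G3": ("chr1", 900000), "G4": ("chr2", 100)}
print(find_clusters(positions, window=10000))
```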
- Promoter extraction
- Since gene expression is regulated in part via the binding of transcription factors to gene promoters, you may want to get some promoter sequence.
- Different tools can be used to extract promoter sequence (in addition to using a genome browser from the last step).
- Go to the annotation file from part VII.1 and select the RefSeq Transcript ID for some interesting genes.
- You may need to use search and replace to remove any text that isn't a RefSeq ID.
- Go to the RefSeq promoter extractor and paste a list of RefSeq IDs (NM_...).
- Select the length of sequence to define your "promoters" (or use the defaults) and get the genomic sequence.
- Save the promoter sequences as a text file (or copy into a text editor) to use for subsequent analyses.
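Two parts of this step are easy to get wrong by hand: cleaning the pasted IDs, and working out which genomic interval counts as "upstream" on the minus strand. A sketch of both, under stated assumptions: IDs are NM_ accessions (version suffixes such as ".2" are dropped), coordinates are 0-based and half-open, and `tss` is the position of the first transcribed base. The promoter extractor may use different conventions, and the IDs below are made up.

```python
import re

NM_ACCESSION = re.compile(r"NM_\d+")
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def clean_refseq_ids(text):
    """Extract NM_ accessions from pasted text, dropping version suffixes,
    separators and duplicates while preserving order."""
    return list(dict.fromkeys(NM_ACCESSION.findall(text)))

def promoter_interval(tss, strand, upstream=1000, downstream=0):
    """Return (start, end) of a promoter window, 0-based and half-open.

    On the + strand, upstream means lower coordinates; on the - strand
    the gene is read right to left, so upstream means higher coordinates.
    """
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    if strand == "-":
        return max(0, tss + 1 - downstream), tss + 1 + upstream
    raise ValueError("strand must be '+' or '-'")

def reverse_complement(seq):
    """Minus-strand promoters must be reverse-complemented to read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

print(clean_refseq_ids("NM_123456.1, NM_654321; NM_123456 (made-up IDs)"))
print(promoter_interval(5000, "+"), promoter_interval(5000, "-"))
```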
- Identifying potential transcription factor binding sites with TRANSFAC
- Go to TRANSFAC and log in with the BaRC username and password for your tak account.
- To search for potential transcription factor binding sites in any promoters, click on MATCH in the left column.
- Paste your promoter(s) into the big box.
- Select "vertebrates" under "Group of matrices".
- Note the "Cut-off selection for matrix group" that can be adjusted for the rate of false positives and false negatives.
- Click on "Submit the form".
- Look at the output: does it make sense?
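MATCH scores sites against TRANSFAC's curated matrices using its own core- and matrix-similarity measures. To show only the underlying idea, the sketch below swaps in a plain log-odds position weight matrix scan. The count matrix, pseudocount, uniform background and threshold are all assumptions, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=0.5):
    """Turn per-position base counts into log2(p(base) / 0.25) scores,
    assuming a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / 0.25)
            for b in BASES
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (position, site, score) for windows scoring >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width]
        if any(b not in BASES for b in site):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[j][b] for j, b in enumerate(site))
        if score >= threshold:
            hits.append((i, site, score))
    return hits

# Hypothetical TATA-like counts from 10 aligned sites.
tata = log_odds_matrix([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(scan("ggtataggNNtata", tata, threshold=4.0))
```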
- Gene Ontology analysis
- Given a list of genes, how can we figure out what functions the genes have in common, or more precisely, what functions are over-represented in the gene set?
- Gene Ontology (GO) annotation provides information to effectively answer this question using one of many available tools.
- Go to DAVID and click on "Upload New List" under DAVID tools.
- Paste one of your gene lists, select AFFYID, and "Submit Text".
- On the next page, click on GOcharts.
- Select one (or more) of the three ontologies under CLASSIFICATION TYPE and click on "Chart Values!"
- Note that some terms are too general and some too specific to be informative, but those in between may tell you what is special about the genes in your list (if anything is).
- To test whether any of these GO terms occurs more often in your list than you would expect in a random subset of the genes on the Affymetrix U95 chip, select EASEonline under DAVID tools.
- Under SELECT BACKGROUND LIST, choose U95A and click on Submit.
- On the next page, 4 numbers are associated with each Category:
- LH (list hits): number of genes with this GO term in your gene list
- LT (list total): number of genes in your gene list mapped to any term in this ontology ("system")
- PH (population hits): number of genes with this GO term on the background list (the whole chip)
- PT (population total): number of genes on the background list (the whole chip) mapped to any term in this ontology ("system")
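- The EASE Score is a conservative version of the Fisher exact test p-value for over-representation of this term in your gene list: the smaller the score, the stronger the evidence that the term is over-represented.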
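The four numbers are what a one-sided hypergeometric (Fisher exact) test needs. The sketch below treats your list as LT genes drawn at random from the PT background genes and asks how likely it is to see LH or more of the PH genes that carry the term. With `ease=True` it imitates the EASE jackknife by removing one hit from the list (from both LH and LT) before testing. DAVID's exact table layout may differ, so treat this as an approximation; the example numbers are made up.

```python
from math import comb

def overrep_p(lh, lt, ph, pt, ease=False):
    """P(at least LH term genes in a random list of LT genes), given PH
    term genes among PT background genes (hypergeometric upper tail)."""
    if ease and lh > 0:
        lh, lt = lh - 1, lt - 1  # jackknife: drop one hit from the list
    upper = min(ph, lt)
    tail = sum(comb(ph, i) * comb(pt - ph, lt - i) for i in range(lh, upper + 1))
    return tail / comb(pt, lt)

# Made-up example: 8 of 40 list genes vs. 200 of 8000 background genes.
print(overrep_p(8, 40, 200, 8000), overrep_p(8, 40, 200, 8000, ease=True))
```

The EASE-style value is always the larger (less significant) of the two, which is the point of the jackknife: a term supported by only one or two genes is penalized most.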
- Pathway analysis (KEGG)
- We can try to map a gene set to known pathways using a database such as KEGG.
- While in DAVID (from the last step), select KEGGCharts under DAVID Tools and click on "Chart Pathways!"
- Following any of the pathway links takes you to a KEGG pathway, with the proteins from your genes colored red.
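Behind the pathway chart is a simple membership check: which genes on your list belong to each pathway. A minimal sketch, with a hypothetical pathway-to-gene mapping standing in for the KEGG data:

```python
def pathway_hits(gene_list, pathways):
    """Return (pathway, [list genes in it]) pairs, most hits first.

    `pathways` maps a pathway name to its member gene symbols.
    Pathways with no hits are left out.
    """
    genes = set(gene_list)
    hits = []
    for name, members in pathways.items():
        found = sorted(genes & set(members))
        if found:
            hits.append((name, found))
    hits.sort(key=lambda item: (-len(item[1]), item[0]))
    return hits

# Hypothetical pathways and genes (placeholders, not KEGG content).
pathways = {
    "pathway_X": ["GENE_A", "GENE_B", "GENE_C"],
    "pathway_Y": ["GENE_C", "GENE_D"],
    "pathway_Z": ["GENE_E"],
}
print(pathway_hits(["GENE_A", "GENE_C", "GENE_B"], pathways))
```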
- Motif finding (Meme)
- Since we can get promoters from a set of co-regulated genes, we can try to identify over-represented motifs that may act as transcription factor binding sites.
- Go to Meme, a popular tool for this type of analysis, and paste in the promoters of several co-expressed genes. Meme requires a lot of computing power, so you may have problems with more than 10 sequences at once.
- Enter your email address and click on Start Search.
- A response can take hours or days, so don't hold your breath.
- See the sample Meme output to get an idea of what to expect. Note that only some "motifs" may be biologically meaningful.
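Meme fits probabilistic motif models by expectation maximization, which is part of why it needs so much computing time. The idea of an "over-represented" motif can be illustrated far more cheaply by counting exact words. The sketch below is a swapped-in technique, not Meme's algorithm: it ranks k-mers by how much more frequent they are in your promoters than in a background set. The word length, smoothing and sequences are assumptions.

```python
from collections import Counter

def kmer_counts(sequences, k):
    """Count every length-k word made only of A, C, G, T."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts

def enriched_kmers(promoters, background, k=6, top=5):
    """Rank k-mers by frequency in `promoters` relative to `background`.

    Background frequencies use add-one smoothing so that words absent
    from the background do not cause a division by zero.
    """
    fg = kmer_counts(promoters, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values()) or 1
    bg_total = sum(bg.values())
    scores = {}
    for word, n in fg.items():
        bg_freq = (bg[word] + 1) / (bg_total + 4 ** k)
        scores[word] = (n / fg_total) / bg_freq
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]

# Made-up sequences; a real background would be many unrelated promoters.
promoters = ["CACGTGAAAA", "TTCACGTGTT"]
background = ["AAAAAAAAAA", "TTTTTTTTTT"]
print(enriched_kmers(promoters, background, k=6, top=1))
```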
- Comparisons to other expression data
- How does this experiment compare to other experiments? What type of expression patterns do your interesting genes show in other experiments?
- Go to a repository like GEO.
- To look at one gene in detail, enter the gene name or Affymetrix ID next to "Gene profiles" and click on GO.
- To look for a type of experiment, enter a term (ex: embryo) next to DataSets and click on GO.
- Click on a GDS (GEO DataSet) to go to a new page with a link to download the dataset.
- Click on a GSM (GEO Sample) to get more information and expression data from a specific chip/hybridization.
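Once you've downloaded a GDS dataset, you can pull out the profile of one gene yourself. The sketch below assumes the SOFT layout in which the data table sits between `!dataset_table_begin` and `!dataset_table_end` lines, with ID_REF and IDENTIFIER as the first two tab-separated columns followed by one column per GSM sample. Check the header of your own file before relying on it; the sample lines are made up.

```python
def read_gds_table(lines):
    """Return (header, rows) from the data table of a GDS SOFT file."""
    header, rows, inside = None, [], False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("!dataset_table_begin"):
            inside = True
        elif line.startswith("!dataset_table_end"):
            break
        elif inside and line:
            fields = line.split("\t")
            if header is None:
                header = fields
            else:
                rows.append(fields)
    return header, rows

def gene_profile(header, rows, query):
    """Map each sample column to its value for the first row whose ID_REF
    or IDENTIFIER equals `query`; return None if there is no match."""
    for row in rows:
        if query in row[:2]:
            return dict(zip(header[2:], row[2:]))
    return None

soft = [
    "^DATASET = GDS0000 (made-up)",
    "!dataset_table_begin",
    "ID_REF\tIDENTIFIER\tGSM1\tGSM2",
    "probe_1_at\tGENE_A\t120.5\t98.1",
    "!dataset_table_end",
]
header, rows = read_gds_table(soft)
print(gene_profile(header, rows, "GENE_A"))
```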
WIBR Microarray Analysis Course 2004