Microarray analysis exercises 2 - with R

WIBR Microarray Analysis Course - 2007

Starting Data (all probeset data, means, and ratios)
Processed Data (with p-values reflecting differential expression)

Class 2 exercises

Part IV. Identifying differentially expressed genes

  1. Differentially expressed genes can be identified naively by fold change, but they are identified more effectively with a statistic such as the t test.
  2. We'll compare the results of these two methods later in Part VI.
  3. Read an expression matrix file that we calculated before (or download it to the working directory):
    exprSet = read.delim("Su_mas5_matrix.txt")
    # Check how the chips are named
    colnames(exprSet)
  4. Use the t test for one gene to determine if the data on fetal and adult expression are different in the brain and/or liver.
  5. Use the t test to test for a difference in means across all genes. Note that the second argument to apply() is "1", which means we want to run the t-test command across rows (genes/probesets).
  6. Calculate the p-values for brain and for liver:
    brain.p.value.all.genes = apply(exprSet, 1, function(x) { t.test(x[1:2], x[3:4])$p.value } )
    liver.p.value.all.genes = apply(exprSet, 1, function(x) { t.test(x[5:6], x[7:8])$p.value } )
    # Check the first few brain ones to make sure the first one agrees with our single-gene command
    brain.p.value.all.genes[1:5]
  7. Use the "Absent/Present" calls from the Affymetrix algorithm to flag genes with questionable expression levels.
  8. Sort data and remove non-expressed probesets.
  9. Correct the t-test p-values for multiple hypothesis testing by calculating the False Discovery Rate (FDR).
  10. List all the gene IDs for those that meet your significance threshold (such as raw p < 0.01) and are present in at least one sample.
  11. Use the "Compare two lists" tool to get the non-redundant union of the brain and liver gene lists.
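
The Part IV steps above can be sketched end to end in R. This is only an illustration under stated assumptions: it runs on simulated data rather than "Su_mas5_matrix.txt", the column names and the demo.* objects are hypothetical, and the Absent/Present calls are random stand-ins for real Affymetrix calls. The column layout (1:2 vs. 3:4 for brain, 5:6 vs. 7:8 for liver) follows the apply() commands above.

```r
# Simulated stand-in for the expression matrix: 200 probesets x 8 chips.
# ASSUMPTION: columns 1:2 and 3:4 are the two brain groups,
# columns 5:6 and 7:8 are the two liver groups (names are hypothetical).
set.seed(1)
demo.exprSet = matrix(rnorm(200 * 8, mean = 8, sd = 1), nrow = 200,
    dimnames = list(paste("probe", 1:200, sep = "_"),
        c("brain.g1.a", "brain.g1.b", "brain.g2.a", "brain.g2.b",
          "liver.g1.a", "liver.g1.b", "liver.g2.a", "liver.g2.b")))
# Add an artificial brain difference to the first 20 probesets
demo.exprSet[1:20, 3:4] = demo.exprSet[1:20, 3:4] + 3

# Step 4: t test for a single gene (first row), brain groups only
brain.t.gene1 = t.test(demo.exprSet[1, 1:2], demo.exprSet[1, 3:4])
brain.t.gene1$p.value

# Steps 5-6: the same test for every row (second argument 1 = rows)
brain.p = apply(demo.exprSet, 1, function(x) { t.test(x[1:2], x[3:4])$p.value } )
liver.p = apply(demo.exprSet, 1, function(x) { t.test(x[5:6], x[7:8])$p.value } )
# The first all-genes p-value should agree with the single-gene command
all.equal(unname(brain.p[1]), brain.t.gene1$p.value)

# Step 7: simulated Absent/Present calls ("P" = present, "A" = absent).
# ASSUMPTION: real calls would come from the Affymetrix algorithm.
demo.calls = matrix(sample(c("P", "A"), 200 * 8, replace = TRUE, prob = c(0.7, 0.3)),
    nrow = 200, dimnames = dimnames(demo.exprSet))
demo.calls[181:200, ] = "A"    # force the last 20 probesets to be absent everywhere
present.in.any = rowSums(demo.calls == "P") >= 1

# Step 8: drop probesets that are absent in every sample, then sort by p-value
brain.p.present = sort(brain.p[present.in.any])
liver.p.present = sort(liver.p[present.in.any])

# Step 9: FDR correction (Benjamini-Hochberg) of the remaining p-values
brain.fdr = p.adjust(brain.p.present, method = "fdr")
liver.fdr = p.adjust(liver.p.present, method = "fdr")

# Step 10: IDs passing a raw p-value threshold and present in at least one sample
brain.DE.ids = names(brain.p.present)[brain.p.present < 0.01]
liver.DE.ids = names(liver.p.present)[liver.p.present < 0.01]

# Step 11 (done in R here instead of the web tool): non-redundant union
all.DE.ids = union(brain.DE.ids, liver.DE.ids)
```

With only two chips per group the t test has very little power, so it is worth comparing how many probesets pass the raw threshold with how many pass the same threshold on brain.fdr and liver.fdr.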

Part V. Clustering

  1. Use any or all of these data sets. The second dataset, being across more tissues, may be the most interesting. R can do clustering, but we prefer another pair of applications to create and view clustered matrices.
    1. your subset of log2-transformed expression ratios (ex: "brain.DE.log2.ratios.txt", from the end of Part IV). You may need to first open this file in a text editor and add "Probe[tab]" (a word followed by a tab) to the beginning of the first line.
    2. a full set of expression ratios (transformed to log base 2), with values compared to the mean across all tissues
  2. Double-click Cluster 3.0, a clustering application that works on all operating systems. It's an enhanced version of the Eisen clustering program. See the manual for more information about the program.
  3. File > Open and select your file of expression data (one of the files in Part V.1).
  4. Note that there are some filtering and normalization functions on the tabs "Filter Data" and "Adjust Data", but we've already performed these steps.
  5. Try Hierarchical clustering using the default settings.
  6. Open JavaTreeView for visualizing your data as a heatmap.
  7. Try k-Means clustering using the default settings.
  8. Optional: While in JavaTreeView, try Export > Export to Postscript and save all or part of your figure. PostScript is a vector format, so the image can be scaled without losing resolution. Otherwise, you may wish to export to GIF or bitmap (which are easier to handle in Photoshop, but lower resolution).
  9. Optional: Open the heatmap in Illustrator or Photoshop.
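
As noted in step 1, R can also do clustering. The following is a minimal sketch of hierarchical and k-means clustering in R on simulated log2 ratios. The demo.ratios matrix, the correlation-based distance, the average linkage, and the choice of 3 clusters are all assumptions made for illustration; they are not meant to reproduce the default settings of Cluster 3.0.

```r
# Simulated log2 ratios: 30 probesets x 6 tissues (hypothetical names)
set.seed(2)
demo.ratios = matrix(rnorm(30 * 6), nrow = 30,
    dimnames = list(paste("probe", 1:30, sep = "_"),
                    paste("tissue", 1:6, sep = "_")))
# Give the first 10 probesets a shared pattern (high in tissues 1-3)
demo.ratios[1:10, 1:3] = demo.ratios[1:10, 1:3] + 4

# Hierarchical clustering of genes:
# distance = 1 - Pearson correlation between rows, average linkage
gene.dist = as.dist(1 - cor(t(demo.ratios)))
gene.tree = hclust(gene.dist, method = "average")
# Cut the tree into 3 groups and count the members of each
gene.groups = cutree(gene.tree, k = 3)
table(gene.groups)

# k-means clustering of genes into 3 clusters, with 10 random starts
gene.kmeans = kmeans(demo.ratios, centers = 3, nstart = 10)
table(gene.kmeans$cluster)
```

To look at the result, plot(gene.tree) draws the dendrogram and heatmap(demo.ratios) draws a clustered heatmap, although heatmap() uses its own default distance and linkage rather than the ones chosen here.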
