Goals for class 1 exercises: Use descriptive statistics and figures to describe data with Excel and R.
See how a series of commands can be used to create an R pipeline in a script file. Note that the bold terms
are the commands for Excel or R. You'll start the exercises by reading data from a text file.
Links to data for download are in double square brackets [[ like this ]]. Links to other pages are in double curly brackets {{ like this }}. |
Start R on your desktop or laptop by clicking on the icon. Alternatively, you may also connect to R on tak |
Task | Using Excel | Using R | |
1: Select (right click) and download data about
[[ Energy expenditure by lean and obese women ]] to your desktop.
If that isn't available, you can copy the data from the bottom of page, paste into a text editor, and save the file as plain text on your desktop. |
Open the tab-delimited text file energy.txt you should have downloaded (from the "Task" column at left) or paste the data from [[ energy.html ]] into Excel. | After starting R, move R to the directory you want by using the pull-down menu:
For Windows: File >> Change dir >> Browse >> [Browse to Desktop] >> OK For Mac: Misc >> Change dir >> Browse >> [Browse to Desktop] >> OK If you don't want to use the pull-down menu, you can move the data file into working directory ("wd"). To find the working directory, use the commandgetwd() Then you can change (set) the working directory to match the location of data with a command like setwd("C:/Documents and Settings/student/Desktop")or setwd("/Users/student/Desktop")assuming "student" is your laptop username. Then read the file by typing a command like energy = read.delim("energy.txt", header=T)(and hitting "enter"). To see the "energy" dataset, either type energyor edit(energy)With the "edit" view, if fields are changed, energy actually stays the same. To change the dataset (not now!) use a command like "fix(energy)". | |
2: Get the mean and median of each column of data | Use formulas like
=AVERAGE(A2:A14)and =MEDIAN(A2:A14) |
Use commands like
mean(energy$lean) mean(energy[,"lean"]) mean(energy[,1])which are different ways of describing the "lean" column, which is in column 1. If you have any "NA" values (type energy$leanto see), you may need mean(energy$lean, na.rm=T)which removes the "NA" fields. Then similarly get the median with median(energy$lean)or simply summary(energy) |
|
3: Get the standard deviation of each column of data | Use a formula like
=STDEV(A2:A14) |
Use a command like sd(energy$lean) |
|
4: View the distribution of data with a histogram and a q-q plot. | Go to pull-down menu: Tools >> Data Analysis. If this option is not present, you need to add or install the Analysis ToolPak. Go to Tools >> Add-Ins and check the box. Before going further, create a "bin range" in another column of your Excel file. Perusing your data, decide the size of the bins you'd like to represent on the histogram. For the energy data, one possible choice is 6 through 11 in intervals of 0.5 units. So make a column containing 6, 6.5, 7, ..., 11. Select Histogram, enter or select the Input Range and the Bin Range, check "Chart Output", and click "OK". Q-Q plots aren't very straightforward in Excel. |
Use a command like
hist(energy$lean, breaks=20)where 'breaks' describes how many bins to create. Type ?histto see all the options. You can get help on any command by typing a question mark before the command. To change the color of the bars, for example, type hist(energy$lean, breaks=20, col="green")To see how normal the data are, try a q-q plot: qqnorm(energy$lean, pch=19, cex=2) qqline(energy$lean, lwd=2, col="blue")Optional arguments used: Points that are normally distributed appear along the line, which passes through the first and third quartiles (25th and 75th percentiles). | |
5: Draw a scatterplot showing all the data | One method is to reformat to the data, with group number (1 or 2) in the first column and the data for that group in the second column. This should look like [[ energy_by_group.html ]]. Click on the Chart Wizard, select 'XY (Scatter)' and click on "Finish". Then you may wish to clear the x-axis and add the two group names as text boxes. | Use a command like
stripchart(energy, vertical=T, method="jitter")where 'jitter' does that to the points so they don't overlap so much. Type ?stripchartto see all the options. | |
5: Draw a boxplot | Boxplots are not possible in Excel -- unless you want to use lots of hacks. |
Use a command like
boxplot(energy)To add some more options, type boxplot(energy, col=c("red", "blue"), main="This is a boxplot in R", xlab="Exp groups", ylab="Units go here") | |
6: Run a script in R |
Given the script below, there are at least three ways to run it:
1: Save [[ this file ]] (also shown below in the colored box) into your "working directory" [run getwd() to find out], calling it something like script1.R The ".R" at the end isn't necessary but helpful to label it as an R script. Then, within R, type source("script1.R") 2: Copy and paste the commands into your R terminal. This can be done one line at a time or all at once 3: (For tak and other Unix systems): From the Unix prompt [without logging in to R already], type ./script1.R > script1.R.out.txtAny output will be saved in "script1.R.out.txt". If you get a "Permission denied" error, you need to make the script "executable" with a command like chmod +x script1.RThen you can run the previous command. This is a quick way to run a script you know already works, but it's not a good way to debug and modify the commands.
After running the script, you should find three files (two PDF files and a table) in the working directory. If you're on tak, after exiting from R (see next step), you can view the PDF files using the Unix command /nfs/BaRC/apps/adobe/bin/acroread myfile.pdfwhere 'acroread' is the Unix command to open Acrobat Reader. | ||
7: Exit from applications | You may wish to save your file and/or email it to yourself for reference. |
To get all of your work (input and text output) look for the "Save to file" choice on the menu.
If you wish to see your previous commands, type history()or see the history using the history button (Mac only). You can also save the "history" output (during or at the end of your work) with a command like savehistory(file = "R_history_class1.R") To exit from R, use the pull-down menu or type q()for "quit". When you exit R, you'll be asked "Save workspace image?" If you answer "yes", when you re-start R, the same commands you performed this session will be performed -- except packages (like you'll use later) will need to be loaded. |
Data for exercises above: energy expenditure in groups of lean and obese women (tab-delimited text) |
|