Genome 560 Spring 2011
Statistics for Genomicists J. Felsenstein
R exercise 3
T tests and such
Read in the "data frame" for the gene expression values, then copy its first
100 genes for two individuals into two vectors. One is a set of measurements
for a person of European ancestry, the other an individual of Yoruba ancestry.
The data frame "RMA_Filtered.txt" will be found at
http://evolution.gs.washington.edu/gs560/2011/datasets
(or by clicking the link on the course web page). If necessary, it can be
downloaded by showing it in your browser window and then using the Save As
function of the File menu, saving it as the file "RMA_Filtered.txt".
These data come from this paper: Storey J.D., J. Madeoy, J. L. Strout,
M. Wurfel, J. Ronald, and J. M. Akey. 2007. Gene expression variation within
and among human populations. American Journal of Human Genetics, 80: 502-509.
Then we pull out the first 100 numbers in two of the columns:
a <- read.table("RMA_Filtered.txt")
eur <- as.numeric(as.vector(a[2:101,11])) # to treat them as numbers,
afr <- as.numeric(as.vector(a[2:101,20])) # ... not as character strings.
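Why the as.numeric(as.vector(...)) wrapping? Here is a small sketch of what happens when read.table is called without header=TRUE. The miniature file and its names (gene, s1, s2) are invented for illustration. The real file is only assumed to have a similar layout, with column names in row 1, as the 2:101 indexing above suggests.

```r
# Hypothetical miniature file: a header line followed by numeric rows.
tmp <- tempfile(fileext = ".txt")
writeLines(c("gene s1 s2",
             "g1 7.1 6.9",
             "g2 8.4 8.8",
             "g3 5.2 5.0"), tmp)

b <- read.table(tmp)     # no header=TRUE, so row 1 holds the column names
print(class(b[, 2]))     # not numeric: the name "s1" sits in the same column

# skip row 1, then convert the remaining entries to numbers
s1 <- as.numeric(as.vector(b[2:4, 2]))
print(s1)
```

Because the column name is mixed in with the values, R cannot store the column as numbers, so the entries must be converted after the header row is dropped.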
# do a one-sample t-test on eur, and one on afr:
t.test(eur) # what does the P value test?
t.test(afr)
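One way to explore the question above is to compute the one-sample statistic by hand. When no mu argument is given, t.test uses mu = 0, so the null hypothesis is that the true mean is zero. The sketch below uses simulated numbers as a stand-in for eur (the values and the names x, t_by_hand, p_by_hand are not from the data set).

```r
set.seed(1)
x <- rnorm(100, mean = 8, sd = 2)   # simulated stand-in for eur

n <- length(x)
t_by_hand <- mean(x) / (sd(x) / sqrt(n))           # (mean - 0) / standard error
p_by_hand <- 2 * pt(-abs(t_by_hand), df = n - 1)   # two-sided P value

res <- t.test(x)
print(c(by_hand = t_by_hand, t.test = unname(res$statistic)))
print(c(by_hand = p_by_hand, t.test = res$p.value))
```

Is a mean of zero an interesting hypothesis for expression measurements like these?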
# do a 2-sample t-test with assumed equal variances
t.test(eur, afr, var.equal=TRUE) # are means significantly different?
# do a paired t-test
t.test(eur, afr, paired=TRUE) # are means significantly different?
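A paired t-test is the same calculation as a one-sample t-test on the differences between the pairs. The sketch below checks this on simulated data in which each "gene" has its own level shared by both samples. All numbers and names here are made up for illustration, not taken from the data set.

```r
set.seed(2)
base <- rnorm(100, mean = 8, sd = 2)      # per-gene level shared by both samples
x <- base + rnorm(100, sd = 0.3)          # simulated stand-in for eur
y <- base + 0.1 + rnorm(100, sd = 0.3)    # simulated stand-in for afr

paired  <- t.test(x, y, paired = TRUE)
on_diff <- t.test(x - y)                  # one-sample test on the differences
pooled  <- t.test(x, y, var.equal = TRUE) # ignores the pairing

print(c(paired      = paired$p.value,
        differences = on_diff$p.value,
        two_sample  = pooled$p.value))
```

The first two P values agree. The two-sample test treats the gene-to-gene variation as noise, which the paired test removes by differencing.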
# do one that tests whether mean of eur is less than mean of afr
# (var.equal is not set here, so R uses the Welch unequal-variance test)
t.test(eur, afr, alternative="less")
# do one that is like that, but paired too
t.test(eur, afr, paired=TRUE, alternative="less")
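How do the one-sided P values relate to the two-sided one? A small sketch on simulated data (the numbers are invented, and these calls use R's default Welch test because var.equal is not set):

```r
set.seed(3)
x <- rnorm(100, mean = 8.0, sd = 2)   # simulated stand-in for eur
y <- rnorm(100, mean = 8.3, sd = 2)   # simulated stand-in for afr

two     <- t.test(x, y)                           # alternative = "two.sided"
less    <- t.test(x, y, alternative = "less")     # H1: mean(x) < mean(y)
greater <- t.test(x, y, alternative = "greater")  # H1: mean(x) > mean(y)

# the two one-sided P values sum to 1, and the two-sided P value
# is twice the smaller of them
print(less$p.value + greater$p.value)
print(c(two_sided = two$p.value,
        doubled   = 2 * min(less$p.value, greater$p.value)))
```

A one-sided test is only appropriate if the direction was chosen before looking at the data.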
# can you do versions of these on the logs of the values? The exp's?
# Hint -- log(eur) takes the log of each entry of eur
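Functions such as log, exp and sqrt work on a whole vector at once, so a transformed vector can be passed straight to t.test. A minimal sketch with made-up numbers (v, w and the two test objects are illustrative names only):

```r
v <- c(1, 10, 100)
print(log(v))      # natural log of each entry
print(log10(v))    # base-10 log of each entry
print(exp(log(v))) # exp undoes log, entry by entry

# a transformed vector goes into t.test like any other vector
w <- c(2, 15, 90)
on_logs   <- t.test(log(v), log(w), paired = TRUE)
on_ratios <- t.test(log(v / w))   # same test: log(v) - log(w) = log(v/w)
print(c(on_logs$p.value, on_ratios$p.value))
```

Note that a paired test on logs is a test of whether the log of the ratio has mean zero.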
# can you plot eur versus afr? What does this tell you?
plot(eur, afr)
# also look at boxplots side by side
boxplot(eur, afr) # what about taking logs first?
# Can you do plots for logs, exps, sqrt? Squares?
# Can you plot the ratio of afr to eur against the average of these two?
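One possible way to set up the ratio-versus-average plot, sketched on simulated positive values rather than the real data (x, y, ratio and avg are illustrative names; replace x and y with eur and afr to try it on the exercise data):

```r
set.seed(4)
x <- exp(rnorm(100, mean = 2, sd = 0.3))   # simulated stand-in for eur
y <- x * exp(rnorm(100, sd = 0.1))         # simulated stand-in for afr

ratio <- y / x          # entry-by-entry ratio
avg   <- (x + y) / 2    # entry-by-entry average

plot(avg, ratio, xlab = "average of the two values", ylab = "ratio y/x")
abline(h = 1, lty = 2)  # dashed line where the two values are equal
```

If the points scatter evenly around the dashed line at all averages, the two samples differ by about the same proportion at low and high expression levels.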
# Which of the above tests seems most appropriate for these data?