Genome 560 Spring 2011 Statistics for Genomicists J. Felsenstein R exercise 4 Chi-square etc. In these exercises, look at each result. I've left those steps out. # We're going to computer-simulate data from a 2 x 2 table that # does not have the same proportions in all rows. # Draw 300 draws from four outcomes if probabilities 0.1, 0.2, 0.4, 0.3 a <- sample(c(1,2,3,4), 300, replace=TRUE, prob=c(0.10,0.20,0.40,0.30)) # (this means that "sample" draws 300 times, with replacement, with values # 1, 2, 3, or 4, and with probabilities 0.10, 0.20, 0.40, 0.30) # This looks like this: # 0.1 0.2 # 0.4 0.3 # so the ratios of columns are different in the two rows (1:2, 4:3) b <- tabulate(a) # make it into a vector of counts tb <- matrix(b, 2, 2) # then make this into a 2x2 table chisq.test(tb) # do a 2x2 chi-square test fisher.test(tb) # is it much different? # what happens to the above if you do a smaller number of draws? # You can do that without a lot of retyping by using your up-arrow key # to scroll back through the commands, then use your left-arrow key to # go back to a character and use the delete key to delete it and then type # in what it should be. # Now we're going to simulate data from a Poisson distribution d <- rpois(200, 3.0) # 200 draws from a Poisson distribution with mean 3.0 ob <- tabulate(d+1) # count them up (have to add 1 as it ignores zeros!) # (annoying: must add 1 to each so none are zero!) q <- dpois(0:(length(ob)-2),3.5) # produce class probabilities for a # Poisson distribution with one fewer class, with mean # 3.5 so these are somewhat wrong probabilities as we # know the expectation is really 3.0 q <- c(q, 1-sum(q)) # Add on the last class after with the rest of the prob. chisq.test(ob,p=q) # Do a goodness-of-fit chi square test with probs q ex <- 200*q # get the expected numbers sum((ob-ex)^2/ex) # that's a way of computing the chi-square by yourself! # what is it doing? Did it get the right number? # can you try some other values of the mean of the Poisson (other than 4) # and do the test for those? Use you up-arrow key to recall previous # commands and edit them when doing this (there are just 3 commands to # repeat). What turns out to be the minimum-chi-square estimate of the mean? # can you also do a plot of the difference between ob and ex, versus # the class number (which you can generate using a "seq( ... )" command)? # this would be a reasonable way to diagnose where the lack of fit is.