## Genome 560

### Statistical Genomics (actually Statistics for Genomicists)

#### Spring, 2009

These news items have the newest ones last.

## A rough syllabus (to be improved)

1. (May 5) Probability. Stochastic processes (coins, phone calls, normals). Distributions (uniform, binomial, geometric, exponential, Poisson, normal, lognormal).
2. (May 7) Distributions, cont'd. Histograms, etc. Quantiles; distributions of functions (multiples, averages, sums, sums of squares, differences). Practice R.
3. (May 12) Confidence intervals, t-test, experimental design, tests
4. (May 14) Chi-squares, contingency tables
5. (May 19) Regression, curve fitting, ANOVA, F-test
6. (May 21) Bayesian inference, likelihood
7. (May 26) Jackknife, bootstrap, permutation tests, cross-validation
8. (May 28) More on ANOVA
9. (June 2) Multiple testing: Bonferroni and its modifications, FDR
10. (June 4) Principal (not "principle") components, SVD, etc.

## Lecture PDFs

The lecture PDFs will be posted here. Now available are:

## Books

There is no textbook for the course. Josh Akey, in last year's web pages, lists some books and a number of on-line statistics texts available free on the web. They are

In fact, a whole bunch of on-line statistics textbooks will be found if you Google: "online statistics text"

Josh's 2008 course web pages are excellent, especially his lecture PDFs. Although the order of material is different, they are very much worth looking at. They are here

## The R language

R is a free interactive computer environment (in old-fashioned terms, an "interpreter") that can be used for many purposes. It was originally designed by statisticians (R is a clone of a language called S, which is now commercial). It has many built-in statistics functions, which is why we will use it. (At the main CRAN-R project site there are links to many other analysis packages that can be loaded into R.)
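
To give a flavor of those built-in statistics functions, here is a minimal sketch using only functions that ship with base R. The variable names and the particular numbers (10 coin tosses, 20 simulated observations, a mean of 5) are made up for illustration and are not taken from the course exercises.

```r
# Distribution functions in R come in families: d (density), p (cumulative
# probability), q (quantile), and r (random draws).
set.seed(1)  # arbitrary seed, only so the random draws can be reproduced

# Probability of exactly 3 heads in 10 tosses of a fair coin
p_three_heads <- dbinom(3, size = 10, prob = 0.5)

# Upper-tail probability P(Z > 1.96) for a standard normal
upper_tail <- pnorm(1.96, lower.tail = FALSE)

# 20 simulated observations from a normal with mean 5 and sd 2 (made-up values)
x <- rnorm(20, mean = 5, sd = 2)

# One-sample t-test of the hypothesis that the true mean is 5
result <- t.test(x, mu = 5)

print(p_three_heads)
print(upper_tail)
print(result$conf.int)  # 95% confidence interval for the mean
```

Typing `?dbinom` or `?t.test` at the R prompt brings up the help page for each function.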

R can be downloaded and installed on Windows, Mac OS X, or Linux machines (and some other types as well). It is available at the CRAN-R site here as executables and source code, along with many other resources, including a terse PDF introductory manual. When using this manual, skip over parts that go too deeply into stuff you don't yet understand, as there is valuable stuff after that. Come back to the skipped stuff later.

Is R great? For many things, yes. Is it good at everything? I would say that its array operations stand a good chance of driving the puzzled beginner absolutely bonkers, so no. In this it reminds me of a programming language called APL ("A Programming Language") which could do many things interactively, had fervent evangelists, was mostly about arrays, and drove me absolutely bonkers. But what do I know about it, anyway?
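
As a rough illustration of the kind of array behavior that can puzzle a beginner, here is a small sketch in base R. The examples are my own guesses at typical stumbling blocks (recycling, dropped dimensions, and the two kinds of multiplication), not a list from the course, and the variable names are invented.

```r
# Recycling: when two vectors differ in length, the shorter one is reused.
v <- c(1, 2, 3, 4, 5, 6)
short <- c(10, 20)
recycled <- v + short          # short is reused three times, with no warning

# Matrices are filled column by column, not row by row.
m <- matrix(1:6, nrow = 2)

# Pulling out one row silently drops the matrix structure...
row1 <- m[1, ]                 # an ordinary vector, no dimensions
# ...unless you ask R to keep it.
row1_kept <- m[1, , drop = FALSE]   # still a 1 x 3 matrix

# "*" multiplies element by element; "%*%" is the matrix product.
elementwise <- m * m
matprod <- m %*% t(m)          # 2 x 2 result

print(recycled)
print(dim(row1))               # NULL, because the dimensions were dropped
print(dim(row1_kept))
print(matprod)
```

None of this is wrong once you know the rules, but each of these behaviors happens silently, which is where the bonkers part comes in.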

Josh Akey produced two quick introduction sheets:

## R in this course

We will do an R exercise in each class session. Students are expected to bring a laptop with R loaded on it (kudos to the present class for doing this successfully). I will distribute exercise sheets at each class, and we will try to do them. As I make them I will post them here.