Introduction to Statistical and Computational Genomics

GENOME 559
Department of Genome Sciences
University of Washington School of Medicine


Course description

Rudiments of statistical and computational genomics. Emphasis on basic probability and statistics, and an introduction to computer programming. This course is intended to introduce students with non-computer science backgrounds to the major concepts of programming and statistics.

Learning objectives

After taking this course, students will be able to describe and perform basic analysis tasks relating to biological sequence analysis, phylogenetics, pedigree analysis, genetic association studies, population genetics and microarray analysis. Students will be able to demonstrate an understanding of fundamental statistical concepts, such as p-values, t-tests, chi-squared tests and multiple testing correction. Finally, students will be able to write computer programs to perform statistical and bioinformatics analyses.

Instructional staff

Instructor: Mary Kuhner
Email: mkkuhner@gs.washington.edu

Instructor: William Stafford Noble
Email: noble@gs.washington.edu

Instructor: Bruce Weir
Email: bsweir@u.washington.edu

Meeting times and locations

Tue/Thu 3:30-4:50 pm in Hitchcock 220

The class meets in a computer lab and will involve writing computer programs during class time.

Prerequisites

Substantial background in molecular and cellular biology, genetics, biochemistry or related disciplines.

Course materials

Bioinformatics: Sequence and Genome Analysis by Mount. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2004. Second edition.
Learning Python by Lutz. O'Reilly, 2007. Third edition.

Course requirements

Students will complete eight homework assignments during the course. Assignments will typically involve some written questions and some programming problems.

Examinations

The final exam will be open book, and will cover the entire quarter. The final exam is scheduled for Thursday, March 20, 4:30-6:20 pm in Hitchcock 220.

Course grade

Each of the eight homework assignments counts for 10% of the course grade (80% in total), and the final exam counts for the remaining 20%.
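As an illustration of the weighting, here is a minimal Python sketch. It assumes every score is on a 0-100 scale, which the syllabus does not state, and the function name `course_grade` is hypothetical.

```python
def course_grade(homework_scores, final_exam_score):
    """Combine homework and final exam scores into a course grade.

    Assumption (not stated in the syllabus): every score is on a 0-100 scale.
    """
    if len(homework_scores) != 8:
        raise ValueError("expected exactly eight homework scores")
    # Each homework assignment contributes 10% of the course grade.
    homework_part = sum(0.10 * score for score in homework_scores)
    # The final exam contributes the remaining 20%.
    final_part = 0.20 * final_exam_score
    return homework_part + final_part
```

For example, `course_grade([90] * 8, 80)` combines eight homework scores of 90 with a final exam score of 80.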

Class schedule

Date | Instructor | Lecture topic | Concepts | Programming topic | Reading | Homework
Tue Jan 8 | Noble | Sequence comparison: Introduction and motivation | Substitution matrices, gap penalties | Introduction to Python | |
Thu Jan 10 | Noble | Sequence comparison: Dynamic programming | Dynamic programming, Needleman-Wunsch | Strings | Mount: ch. 3; Lutz: ch. 1-4, 7 | HW1 assigned
Tue Jan 15 | Noble | Sequence comparison: More dynamic programming | | Numbers, lists and tuples | Mount: ch. 6; Lutz: ch. 5, 8 |
Thu Jan 17 | Noble | Sequence comparison: Local alignment | Smith-Waterman | File I/O, if-then-else | Lutz: ch. 9-12 | HW1 due; HW2 assigned
Tue Jan 22 | Noble | Sequence comparison: Significance of similarity scores | distribution, p-value, extreme value distribution | for loops | Mount: ch. 4; Lutz: ch. 13 |
Thu Jan 24 | Kuhner | Phylogeny: Parsimony | heuristic search, "assumption-free" methods | while loops, modules (large.txt, small.txt, averagefasta.py, countfasta.py, fasta2matrix.py, fastaids.py) | Mount: 292-294 | HW2 due
Tue Jan 29 | Kuhner | Phylogeny: Distance methods | least squares | More on loops | Mount: 301-317 | HW3 assigned (dna.txt)
Thu Jan 31 | Kuhner | Phylogeny: Likelihood methods | maximum likelihood | Dictionaries | Lutz: 103-107; Mount: 317-320 |
Tue Feb 5 | Kuhner | Phylogeny: Bayesian methods and MCMC | Bayes' Theorem, Markov chains | Defining functions | Lutz: ch. 12 | HW3 due; HW4 assigned (infile.txt)
Thu Feb 7 | Kuhner | Phylogeny: Validating phylogenies (bootstrapdata.txt) | likelihood ratio test, bootstrap | Sorting | Mount: 321-322 |
Tue Feb 12 | Kuhner | Pedigree analysis: Probabilities of genes on pedigrees | LOD score | Regular expressions | Lutz: 447-452 | HW4 due; HW5 assigned
Thu Feb 14 | Kuhner | Pedigree analysis: Additional methods | ascertainment bias | Objects and classes (part 1) | Lutz: ch. 19-20 |
Tue Feb 19 | Kuhner | QTLs | linkage disequilibrium | Objects and classes (part 2) | Lutz: ch. 21 | HW5 due; HW6 assigned (sibs.txt)
Thu Feb 21 | Kuhner | Association studies: Detecting association between a trait and a gene | chi-square, Bonferroni correction, relative risk | Review: code.py, codon_rna.txt, codon_dna.txt, dna.txt | |
Tue Feb 26 | Weir | Population genetics: Categorical data analysis | chi-square and multinomial distributions, testing in population genetics | | | HW7 assigned
Thu Feb 28 | Kuhner | Association studies: Avoiding pitfalls in association studies | bias, correlation vs. causation | Biopython and exceptions | | HW6 due
Tue Mar 4 | Weir | Population genetics: Genetic diversity | Statistical approaches for quantifying DNA sequence variation; expectation of a random variable | | |
Thu Mar 6 | Weir | Whole genome association | Linear regression | | | HW7 due; HW8 assigned
Tue Mar 11 | Weir | Whole genome association | t-test, p-value | | |
Thu Mar 13 | Weir | Whole genome association | family-wise error rate, false discovery rate | | | HW8 due