Introduction to Computational Molecular Biology: Molecular Evolution
GENOME 541
Department of Genome Sciences
University of Washington
Spring Quarter, 2010
Course description:
This is the second quarter of a two-quarter introduction to protein and DNA sequence analysis and molecular evolution, including probabilistic models of sequences and of sequence evolution, computational gene identification, pairwise sequence comparison and alignment (algorithms and statistical issues), multiple sequence alignment and evolutionary tree construction, comparative genomics, and protein sequence/structure relationships. These are the central computational methods required to determine the "periodic table of biology," i.e., the list of proteins and their evolutionary relationships, which can be regarded as the first stage in the growth of molecular biology into a quantitative science. Moreover, the statistical and algorithmic methods used (which include maximum likelihood estimation, hidden Markov models and dynamic programming) have wide applicability in other areas of computational and mathematical biology.
Instructional staff
Instructor (and course coordinator for 2010): Joe Felsenstein
Email: joe (at) gs.washington.edu
Office: Foege S420B
Instructor: Larry Ruzzo
Email: ruzzo (at) cs.washington.edu
Office: Paul Allen Center 554
(Larry's course materials will be posted on his course web site.)
Instructor: Martin Tompa
Email: tompa (at) cs.washington.edu
Office: Paul Allen Center 538
Instructor: Phil Bradley
Email: pbradley (at) fhcrc.org
(Phil Bradley's course materials will be posted on his course web site.)
Instructor: Elhanan Borenstein
Email: elbo (at) uw.edu
Office: Foege S103B
Instructor: Su-In Lee
Email: suinlee (at) cs.washington.edu
Meeting times and locations
Tuesday and Thursday, 10:30 - 11:50 am, Foege Building S110.
From upper campus, either:
- Walk across the pedestrian bridge next to Kincaid Hall (the westernmost one across Pacific Avenue). Go straight ahead, descend to ground level. When you come to the walkway that emerges from the J wing of Health Sciences Building, just past Hitchcock Hall, turn right. Foege Building is straight ahead. Enter the left (downhill) half of the building. The lecture room (S110) is the first room on your left, and can be entered at its far end. Or ...
- Come down 15th Avenue and cross Pacific Avenue. Foege Building is straight ahead of you. Walk alongside it on 15th Avenue until you come to the gap in the middle of the building. Go through it and enter the downhill half of the building (on your right). The lecture room (S110) is the first room on your left, and can be entered at its far end.
Prerequisites
GENOME 540 or permission of instructor.
Students must be able to write computer programs for data analysis. Some prior exposure to probability, statistics and molecular biology is highly desirable.
Course materials
Required: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin, S. Eddy, A. Krogh, G. Mitchison; Cambridge University Press, 1998. ISBN: 0521629713.
Required: Statistical Methods in Bioinformatics: An Introduction (Statistics for Biology and Health) by Warren J. Ewens, Gregory R. Grant; Springer, 2005. ISBN: 0387400826.
Course requirements
- The entire course grade is based on the homework assignments, which are due weekly (more or less). No tests or exams.
- The homework assignments involve writing programs for data analysis, and running them on a computer that you have access to (we cannot provide computers). We don't require a specific language, since it is not practical to grade your code, just the output from running your programs.
- Homework is due by 11:59 pm on the indicated date. After that it will be accepted, but penalized. Specifically, each assignment is worth 100 points, from which 10 points will be deducted for each day (or fraction thereof) that you turn it in late. The maximum deduction for being late is 60 points (even if you are more than 6 days late). If you get less than 40 points on an assignment, you are allowed to redo it and take the new score (which will be 40, i.e., 100 - 60, if there are no mistakes).
- It is OK to run your program on someone else's input data file and compare outputs to see if you get the same results. However, it is not OK to share programs, or to get someone else to debug your program. A key part of the course is being able to write and debug your own programs for data analysis.
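As a quick check of the late-penalty arithmetic, here is a minimal Python sketch. The function names, the constants, and the choice not to let a score drop below zero are illustrative assumptions, not part of the stated policy.

```python
import math
from typing import Optional

MAX_DEDUCTION = 60     # cap on the total late deduction
PENALTY_PER_DAY = 10   # points lost per day (or fraction of a day) late
REDO_THRESHOLD = 40    # a redo is allowed only for scores below this


def late_score(raw_points: float, days_late: float) -> float:
    """Score for one assignment (out of 100) after the late penalty."""
    if days_late <= 0:
        return raw_points
    # Any fraction of a day counts as a whole day.
    days_charged = math.ceil(days_late)
    deduction = min(PENALTY_PER_DAY * days_charged, MAX_DEDUCTION)
    # Assumption: the score is not allowed to go below zero.
    return max(raw_points - deduction, 0)


def score_with_redo(first_score: float, redo_raw: Optional[float] = None) -> float:
    """Final score when a redo may replace a first score below 40."""
    if redo_raw is None or first_score >= REDO_THRESHOLD:
        return first_score
    # A redo carries the maximum deduction, so a perfect redo earns 100 - 60 = 40.
    return max(redo_raw - MAX_DEDUCTION, 0)


if __name__ == "__main__":
    print(late_score(92, 1.5))       # 2 days charged: 92 - 20 = 72
    print(score_with_redo(25, 100))  # perfect redo: 40
```

For example, an assignment scoring 100 that is 10 days late still loses only 60 points, leaving 40.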
Examinations
None.
Course grade
10% for each homework assignment.
Homework assignments
- Homework 1 (PDF) (due 4/8/2010). The tree will be found (in a text file) at homework1.tre.
- Homework 2 (HTML) (due 4/18/2010).
- Homework 3 (PDF) (due 4/25/2010). The data set will be found (in a text file) at primates.dna.
- Homework 4 (PDF) (due 5/2/2010). The data set is the same as for Homework 3.
- Homework 5 (PDF) (due 5/9/2010).
Home page
The course home page can be found at http://evolution.gs.washington.edu/gs541/2010.
News
Class news will be posted here, most recent messages first.
- Homework no. 2, assigned by Martin Tompa, is now available. See the links on this web page under “Homework assignments”.
- Martin Tompa's lecture slides are now available at the course web site. Click on the title of his lectures. The link is to a PowerPoint presentation. (On my computer, it downloaded rather than displayed on screen, and I then had to find it and bring it up in PowerPoint. OpenOffice Impress should work too.)
- I posted a message to the students, announcing the first homework. Unfortunately this message got held by the mailing list system! I have finally re-sent it today. Let's make the due date be April 11, not April 8, because of this problem.
- There is now a mailing list for the course; this link shows its past messages.
Class schedule
The contents of this schedule are tentative. If PDF or PPT files of a lecture's projections are available, they can be found by using the lecture title as a link. Some lecture projections shown are from 2009; these are labeled "(old)" and will be replaced by a current version when one is available. Some of the lecture projections are 2010 versions but may be revised further before the lecture; these are labeled "(provisional)". Audio recordings of some lectures will be available here as .WMA or .MP3 files recorded at medium quality, about 10 MB per file.
Each entry gives the date, instructor, topic (with link to the projection file), reading, homework, and audio recording.
- Tue Mar 30, Felsenstein: Trees, parsimony, compatibility. Reading: Ewens 497-499, 511-512, 517-521; Durbin 160-163, 173-176, 188-189. Audio: WMA, MP3.
- Thu Apr 1, Felsenstein: (remaining part of previous lecture's projections, plus ...) Tree space, searching tree space. Reading: Ewens 511-512; Durbin 163-165, 176-179. Homework: HW1 to be assigned. Audio: WMA, MP3.
- Tue Apr 6, Tompa: Comparative sequence analysis and phylogenetic footprinting. Reading: Zweig et al.; Siepel et al.; Blanchette and Tompa; Neph and Tompa; Prakash and Tompa.
- Thu Apr 8, Tompa: Comparative sequence analysis and phylogenetic footprinting. Reading: same as Apr 6. Homework: HW2 to be assigned.
- Tue Apr 13, Felsenstein: Distances and distance matrix methods. Reading: Ewens 499-511; Durbin 165-173, 189-191. Audio: WMA, MP3.
- Thu Apr 15, Felsenstein: Distance methods, models of DNA change. Reading: Ewens 475-496; Durbin 193-197. Homework: HW3 to be assigned. Audio: WMA, MP3.
- Tue Apr 20, Felsenstein: Protein and codon models. Likelihood and Bayesian methods. Reading: Ewens 512-516, 409-416; Durbin 197-210, 215-217. Audio: WMA, MP3.
- Thu Apr 22, Felsenstein: HMMs for rates. Bootstraps and testing (provisional). Reading: Ewens 295-300, 308-309, 313-318, 522-535; Durbin 179-180, 212-215. Homework: HW4 to be assigned. Audio: WMA, MP3 (owing to a recording problem, these contain only the first 11 minutes of the lecture; to find recordings of equivalent material, look at the recordings of my Genome 570 course for 2010, particularly the lectures of weeks 6 and 7).
- Tue Apr 27, Felsenstein: Coalescents. Reading: Durbin 211-212. Audio: WMA, MP3.
- Thu Apr 29, Ruzzo: Modeling and Searching for Noncoding RNA. Reading: see here. Homework: HW5 to be assigned.
- Tue May 4, Felsenstein: Inference with coalescents (provisional). Reading: Ewens 392-398; Durbin 206-207, 211-212. Audio: WMA, MP3.
- Thu May 6, Ruzzo: Modeling and Searching for Noncoding RNA. Reading: see here. Homework: HW6 to be assigned.
- Tue May 11, Ruzzo: Modeling and Searching for Noncoding RNA. Reading: see here.
- Thu May 13, Bradley: Structural bioinformatics. Reading: see here. Homework: HW7 to be assigned.
- Tue May 18, Bradley: Structural bioinformatics. Reading: see here.
- Thu May 20, Bradley: Structural bioinformatics. Reading: see here. Homework: HW8 to be assigned.
- Tue May 25, Borenstein: Complex biological networks. Reading: see here.
- Thu May 27, Borenstein: Complex biological networks. Reading: see here. Homework: HW9 to be assigned.
- Tue Jun 1, Lee: Bayesian networks and reconstructing transcriptional regulatory networks. Reading: see here.
- Thu Jun 3, Lee: Bayesian networks and reconstructing transcriptional regulatory networks. Reading: see here. Homework: HW10 to be assigned (due June 10).