Some rants and diatribes by Joe
I always have a collection of rants and diatribes on various topics, and it
occurred to me that they should be more widely available. Currently, if you
are talking to me and mention topic #15, I come forth with rant #15 on that
topic. Why not put them online? I will be gradually adding rants here. Of
course some of them will turn out to be oversimplifications and caricatures,
and I will try to admit this if called out on them.
- Rant #1: Genomics terminology is often stupid. Examples:
- (This is fading from use, fortunately.) People will point to a sequence
motif and call it a "box" as in "TATA box". Why "box"? Because when they
first saw one of these they drew a box around it. But suppose they had drawn
an elephant around it -- would we then call it a "TATA elephant"?
- "Nextgen sequencing"
- This is pure marketing terminology, adopted uncritically. But think: the
current generation of sequencing methods is "next generation
sequencing". What will happen in another generation? Will we say "back in the
days of next generation sequencing"? Some embarrassment is called for.
- Intron and exon
- See, an "intron" is a piece of DNA that is cut out
when making mature mRNA, and an "exon" is a piece of DNA that is left
in. Only if you run time backwards does this make any sense.
- Rant #2: The definition of "monophyletic" needs work. Right now the
standard definition is that a group is monophyletic if it consists of an
ancestor and all of its descendants. I see why that was done, but in some cases
it does not work. For example if we are discussing the set of species [Carrot,
Salmon, Gorilla, Human], and I claim that the set [Gorilla, Human] is
monophyletic, I'd like to be able to be right about that. But alas, that set
does not contain the common ancestor of humans and gorillas, nor does it
contain all descendants, the fossil ones and present-day ones such as the two
My own definition of monophyletic is that a set of species is monophyletic if
it has its own common ancestor, which is not the ancestor of any of the
other species under discussion. That seems to work for all cases.
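A minimal sketch of this definition, assuming the tree is already known. The toy tree, its internal node names ("root", "vertebrates", "apes"), and the function names are hypothetical and used only for illustration:

```python
# Hypothetical toy tree stored as child -> parent. Internal node names are made up.
PARENT = {
    "Carrot": "root",
    "vertebrates": "root",
    "Salmon": "vertebrates",
    "apes": "vertebrates",
    "Gorilla": "apes",
    "Human": "apes",
}

def ancestors(node):
    """Path from node up to the root, starting with node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def common_ancestor(species):
    """Most recent node lying on every listed species' path to the root."""
    species = list(species)
    shared = set(ancestors(species[0]))
    for s in species[1:]:
        shared &= set(ancestors(s))
    # Walking up from one species, the first shared node is the most recent one.
    return next(n for n in ancestors(species[0]) if n in shared)

def is_monophyletic(group, under_discussion):
    """True if the group's common ancestor is not an ancestor of any
    other species under discussion."""
    mrca = common_ancestor(group)
    others = set(under_discussion) - set(group)
    return all(mrca not in ancestors(o) for o in others)

ALL = ["Carrot", "Salmon", "Gorilla", "Human"]
print(is_monophyletic(["Gorilla", "Human"], ALL))  # True
print(is_monophyletic(["Salmon", "Human"], ALL))   # False
```

Note that the check depends on the list of species under discussion, not on fossils or other descendants that nobody has mentioned.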
- Rant #3: The term "segmental duplication" can be misleading.
When I first heard it, I thought it must refer to the duplication of some new
unit of the genome, the "segment". But I wasn't sure what that unit was.
Subsequently I realized that it simply meant duplications of parts of the
genome that were not defined as gene duplications. I wonder whether the
word "segmental" is really helpful.
- Rant #4: Inference of phylogenies is not simply a matter of
nested derived states. In many textbooks, museum displays, and blog
arguments it is confidently asserted that the way we reconstruct phylogenies is
to find shared derived states (synapomorphies), and that the nested pattern
of these defines clades. The problem is that, if this were true, we would
have no parallelisms, no convergences, no reversals in evolution. We would
also always know where the tree was rooted, since we would always know which
state was the ancestral state. We would never
need computer programs to reconstruct phylogenies. Not even parsimony
methods would be necessary -- you could always just work by hand and you would
then always find a tree with no extra steps. Later in the same textbooks,
there are sections on phylogeny methods that talk about the need for computer
programs to reconcile conflict among characters. But there is not a word
about how this conflicts with the statements earlier in the book.
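The conflict among characters can be illustrated with a small sketch. It assumes binary characters whose ancestral state is already known (which, as noted above, is itself not guaranteed), and the taxa and character data are invented for the example. Two such characters can both be explained by single changes on one tree only if their sets of derived-state taxa are nested or disjoint:

```python
from itertools import combinations

# Invented data: for each character, the set of taxa showing the derived state.
DERIVED = {
    "char1": {"A", "B"},
    "char2": {"A", "B", "C"},
    "char3": {"B", "C"},
}

def nests(x, y):
    """Two derived-state sets fit one tree with no extra steps
    only if one contains the other or they do not overlap."""
    return x <= y or y <= x or x.isdisjoint(y)

def conflicts(derived):
    """List every pair of characters that cannot both be clean synapomorphies."""
    return [
        (a, b)
        for a, b in combinations(sorted(derived), 2)
        if not nests(derived[a], derived[b])
    ]

print(conflicts(DERIVED))  # [('char1', 'char3')]
```

Here char1 groups A with B while char3 groups B with C, so at least one of them must involve a parallelism, convergence, or reversal. Deciding which is the job of the phylogeny methods discussed later in those textbooks.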
- Rant #5: The Modern Synthesis has not been replaced. Sure, all
sorts of new phenomena have come along since the 1940s: neutral mutation,
lateral gene transfer, symbiosis, evo-devo, epigenetics, etc. And we could
declare the death of the current Synthesis each time one came along. But here's
why we shouldn't do that:
- Otherwise every time John Blotz pointed out a new phenomenon he
could strut around publicizing the fact that he, the great Blotz, had
invalidated the evolutionary synthesis, and now we had (ta-da!) the Blotzian
Synthesis. But he would be shocked a year or two later when Jane Schmerz came
along and invalidated the Blotzian Synthesis in favor of the new Schmerzian
Synthesis. And so it would go, synthesis after synthesis, until everyone was totally confused, and most
people were several syntheses behind.
- Meanwhile the public would be continually told that all that stuff they
learned in secondary school, about mutation and natural selection and some
other evolutionary forces, was all wrong, because now we had the Blotzian (er,
oops, actually the Schmerzian) Synthesis instead.
It would be (temporarily) great for Blotz's and Schmerz's careers and egos, but a disaster
for everyone else.
- Rant #6: The term "transitional fossil" is misleading.
Creationists often say that we have no "transitional fossils". Biologists
reply that we have lots of them. The argument is partly over what a
transitional fossil is supposed to be. It sounds like it is a fossil from an
ancestral species, caught in the act of "transition" to a new form.
I think "transitional" is actually bad terminology. It seems to date to 50+
years ago, when many evolutionary biologists naïvely assumed that any fossil
that looked like an ancestor really was the ancestor. Archaeopteryx was assumed
to actually be an ancestor of modern birds.
Now the definition is modified to mean having a "transitional" combination of
character-states, which is what our fossils really are. We have lots of those.
But we're plagued by
the word "transitional". We need some term that does not also imply, to the
unwary listener, that the fossil is known-to-be-the-ancestor.
- Rant #7: There is no consensus, even among systematists, as to what the
word "cladistics" means.
There seem to be various
- Cladistics is the position that all groups in the classification system be
- Cladistics is that, plus reconstructing the tree by nested synapomorphies.
- Cladistics is those, plus using parsimony methods when you have reversals or
parallelisms in your data.
- Cladistics is all those, but only if you're a paid-up member of the Willi
- Cladistics is numbers 1-3, plus using likelihood or Bayesian methods to infer
- Cladistics is numbers 1-3 and number 5, plus inferring trees by distance matrix
The straightforward and coherent definition seems, to
me, to be #1. It is a position on classification, not on how the phylogeny
should be inferred. As such it is a sensible position, and certainly the
dominant one these days. The other definitions are, however,
talking about two things at
once -- how we infer phylogenies and how we define groups in the classification
system. You can find in the literature, including in textbooks, all of the
other positions. The definition of "cladistics" is a disastrous mess. Systematists are wildly divided among these
various positions. The one thing that unites all of them is the belief that
the definition of cladistics is clear and widely agreed-upon. They are each
sure that their definition is the one everyone else agrees with.
A glance at the Wikipedia page for "cladistics" shows
it advocating a position somewhere between #5 and #6. On the Talk:Cladistics
Wikipedia page people,
including me, complain about this. The failure of agreement simply reflects the state of systematics and the wildly mixed
messages that it gives to the outside world.
- Rant #8: Can we please stop using the term "incomplete lineage sorting"?
The term is much less clear than just thinking about coalescence within each species and then, in their
common ancestor, the remaining lineages coming to be in the same population and then coalescing as one goes
Consider thinking of it as ILS. Suppose we have three samples from species A and four from species B. Then we
start at some point below their common ancestor (actually, where?) and as we go up to the common ancestor, the
lineages split (actually, how many times?). Then the resulting lineages go up into the two species (actually, how
do they decide how many go each way?). They can't do it independently, because then all of them might go into
species B, in which case there are none to go into species A. Once they get into those two species, they can
split further, but have to end up with exactly the right sample sizes.
This is horribly complex. Turning around and starting with the samples and going back in time, it is just
two coalescents, combination of the remaining lineages, and a further coalescent. That is much easier to
think about, and to simulate. So let's think of this as the multispecies coalescent, not as the difficult
concept of "incomplete lineage sorting". It was important in the early history of thinking about the
multispecies coalescent, but it has been superseded by a much clearer conceptual framework.
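The backward-in-time description translates almost directly into a simulation. The following is a minimal sketch, not a definitive implementation. It assumes both species and their ancestor have the same population size, measures time in coalescent units (so k lineages wait an exponential time with rate k(k-1)/2 for the next coalescence), and returns only the tree shape as nested tuples. The function names and the default split time are hypothetical:

```python
import random

def coalesce_until(lineages, t_start, t_end, rng):
    """Run a coalescent on `lineages` backward from t_start to t_end
    and return the lineages that survive to t_end."""
    lineages = list(lineages)
    t = t_start
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time with k lineages
        if t >= t_end:
            break  # no further coalescence before the end of this branch
        i, j = rng.sample(range(k), 2)  # pick a random pair to join
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages

def simulate(n_a=3, n_b=4, split_time=1.0, seed=None):
    """Two coalescents, combination of the remaining lineages, one more coalescent."""
    rng = random.Random(seed)
    a = coalesce_until([f"A{i + 1}" for i in range(n_a)], 0.0, split_time, rng)
    b = coalesce_until([f"B{i + 1}" for i in range(n_b)], 0.0, split_time, rng)
    # In the common ancestor the survivors share one population.
    root = coalesce_until(a + b, split_time, float("inf"), rng)
    return root[0]

def tip_names(tree):
    """Flatten a nested-tuple tree into its list of sample names."""
    if isinstance(tree, str):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])

tree = simulate(seed=1)
print(tree)
print(sorted(tip_names(tree)))
```

None of the questions raised by the forward-in-time picture (where to start, how many splits, how many lineages go each way) ever has to be answered: the sample sizes are fixed at the start, and everything else follows from the three calls to `coalesce_until`.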
More to come, soon.