Contevol

[Contevol logo]

version 1.01

October, 2009

by Joe Felsenstein

Department of Genome Sciences
and Department of Biology
University of Washington

© Copyright 2000, 2009 by the University of Washington. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.


A brief description

Contevol is a simulation program designed for my class in Evolutionary Genetics. It allows the user to simulate the evolution of a quantitative character that is controlled by 5 loci, under a fitness function that the user designs. The user gets to see the distribution of the values of the quantitative character in a population, generation after generation. The genotypes that make up this distribution (or at any rate a representative sample of them) are displayed in the same histogram.

Contevol is copyright to the University of Washington, 2000. Permission is granted to copy and use the program as long as its copyright notices are not removed. It can be used for free as long as it is not sold, or offered for sale as part of any product or service.

The genotypes and phenotypes

The population consists of N diploid individuals (where the default value of N is 100, but you can change this at run time). Each individual has 5 loci. Each locus has two alleles, one the capital letter (A, B, C, D, or E) and one the lower-case letter (a, b, c, d, or e). The phenotype is additively determined by these loci, and it is equal to the number of capital letters. There is no environmental effect on the phenotype (unlike real quantitative characters). Thus, for example, the diploid genotype AABBccDdee has phenotype 5 (the number of capital letters in this genotype).
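As a small illustration of this rule, here is a sketch in Python (not taken from the Contevol source, which is written in C). It assumes, purely for the example, that a genotype is stored as a 10-letter string such as AABBccDdee; the function name phenotype is hypothetical.

```python
def phenotype(genotype: str) -> int:
    """Return the phenotype: the number of capital-letter alleles.

    Assumes the genotype is a string of allele letters, two per locus,
    for example "AABBccDdee" (an illustrative representation only).
    """
    return sum(1 for allele in genotype if allele.isupper())


print(phenotype("AABBccDdee"))  # prints 5
```

With 5 loci and two gene copies per locus, the phenotype can only take the whole-number values 0 through 10.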

The loci are assumed to be unlinked (as if they were on different chromosomes). Thus each locus segregates independently into a gamete. Mating is at random, with only one sex and selfing allowed. This means that a mating is performed by choosing one parent from the population at random, then choosing the other parent at random with replacement. Thus it is possible (a small fraction of the time) to choose the same individual twice and have self-fertilization. (We could imagine ruling this out by requiring that the two parents be different individuals -- this would make only a small difference in the outcome of the simulation).

Each offspring is produced by choosing another pair of parents and mating them. The choices for different offspring are independent and with replacement. Thus it is possible for some individuals to have more than one offspring, and others to have none. This is a standard evolutionary genetics model called the Wright-Fisher model (as it was invented in 1930 and 1932 by the two great founders of that field, R. A. Fisher and Sewall Wright). In effect each offspring gets to choose their parents, independently.
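The mating scheme described above can be sketched as follows. This is only an illustration in Python, not the program's own code. It assumes each individual is a list of five (allele, allele) pairs, one pair per locus; the names make_gamete, make_offspring and next_generation_without_selection are hypothetical.

```python
import random


def make_gamete(parent, rng):
    """Pick one of the two alleles at each locus, independently.

    Independent choices per locus correspond to unlinked loci.
    """
    return [rng.choice(pair) for pair in parent]


def make_offspring(population, rng):
    """Choose two parents at random, with replacement, and mate them.

    Because sampling is with replacement, the same individual is
    occasionally chosen twice, which amounts to self-fertilization.
    """
    mother = rng.choice(population)
    father = rng.choice(population)
    return list(zip(make_gamete(mother, rng), make_gamete(father, rng)))


def next_generation_without_selection(population, rng):
    """Wright-Fisher sampling with no selection: each of the N offspring
    picks its own pair of parents independently of the others."""
    return [make_offspring(population, rng) for _ in population]
```

For example, calling next_generation_without_selection(population, random.Random(1)) on a list of 100 such individuals returns 100 offspring; some parents will have contributed to several offspring and others to none.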

Starting the program

To start the program, simply double-click on its icon (if you are on a Macintosh or Windows system), or type the program name contevol if you are on a Unix (or Linux) system. You will be presented with a small menu, which looks like this:

Contevol  (c) Copyright University of Washington 2009


Contevol settings

Option Character   What it is                          Its current value
       P           Number of peaks in fitness curve    1
       O           Optimum character value             5.000000
       S           Strength of selection               10.000000
       N           Population size:                    100
       I           Initial mean of character:          5.000000
       L           Number of lines on screen:          24
       G           Generations to next display:        1

Are these settings correct? ([Y], Q (quit), or option to change):

The menu shows you the current values (in this case the default values) of seven of the parameters of the program. The column labelled "Option character" shows which character you should type to change the value of each parameter. For example, to change the population size, type "N". When you choose a parameter to change, the program will prompt you for the new value. After that it will return you to the menu. When all the parameters have their desired values, you can accept them by typing the character Y. If you want to stop the run at this point, you can type Q.

The fitness function

When the parameters are accepted, the first thing the program does is to show you the fitness curve (the adaptive surface or fitness function). This shows the fitnesses of the different possible phenotypes, plotted against the phenotype. The plot is very crude because it is made using characters on a screen rather than by drawing curves in a graphics window. For example, if the fitness function had two optima, with optimum phenotypes 3 and 6, and strength of selection 1, the plot looks like this:

1.00 |                     O                 O                        
     |                    . .               . .                       
0.91 |                       .             .                          
     |                   .    .           .    .                      
0.82 |                  .      .         .      .                     
     |                          .       .                             
0.73 |                 .         O.....O         .                    
     |                .                           .                   
0.64 |                                                                
     |               O                             O                  
0.55 |                                                                
     |              .                               .                 
0.45 |             .                                 .                
     |                                                                
0.36 |            .                                   .               
     |                                                                
0.27 |           .                                     .              
     |          .                                       .             
0.18 |                                                                
     |         O                                         O            
0.09 |      ...                                           ...         
     |    ..                                                 ..       
     L---0-----1-----2-----3-----4-----5-----6-----7-----8-----9----10---
to continue, press Enter key, to stop press Q, to do menu again press M 

The heights of the letters "O" show (roughly) the fitnesses for each of the possible phenotypes, from 0 to 10. The dots interpolate between these roughly in straight lines (the values at those interpolated points do not matter, as all of the phenotypes always turn out to be whole numbers).

The fitness is determined by a curve which has one or two optimum values of the phenotype, with the number of optima and the optimum values determined by the program menu. Initially the menu shows one optimum with a value of 5.0. The user can either change that value, or can choose to change the number of optima from 1 to 2. In that case the O menu item is replaced by two menu items, 1 and 2. In the above example, two optima have been chosen and given the values 3.0 and 6.0. Although the phenotypes must (in our artificial example) be integers, the optima can be any real numbers. If they are too far outside the range of the phenotypes, problems may occur from underflow of the fitnesses.

The fitness curve around an optimum falls away according to a Gaussian (or Normal) curve. This has the shape of the function

                e^{-x^2}

but it is shifted so that the peak occurs at the optimum, and compressed or spread out so that the curve falls away at the desired rate. The rate at which it declines is controlled by another parameter in the menu, the strength of selection. This value (S) is larger the weaker selection is, and smaller the stronger it is. As the selection curve has the form

                e^{-x^2/S}

the fitness function will fall to e^{-1} = 0.367879 of its peak value when x^2 = S, which is when x equals the square root of S. Thus a fitness function which has S = 10 falls to 0.368 of its peak height when the phenotype is 3.162 units from the optimum, while a fitness function which has S = 1 gives stronger selection, falling to 0.368 of its peak height when the phenotype differs from the optimum by 1.
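To make the role of S concrete, here is a minimal Python sketch of a single-peak fitness curve. It follows the formula as written in this document rather than the program's source code, and the function name peak_fitness and its argument names are hypothetical.

```python
import math


def peak_fitness(phenotype, optimum, strength):
    """Gaussian-shaped fitness around one optimum, following the form
    exp(-x^2 / S) given in the text, with x = phenotype - optimum."""
    deviation = phenotype - optimum
    return math.exp(-(deviation ** 2) / strength)


# With S = 10, fitness drops to exp(-1) at sqrt(10) units from the optimum.
distance = math.sqrt(10.0)
print(round(distance, 3))
print(round(peak_fitness(5.0 + distance, 5.0, 10.0), 6))
```

Larger values of the strength argument flatten the curve (weaker selection), and smaller values narrow it (stronger selection).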

When there are two optima, the fitness function is the sum of the fitness curves from the two optima. The heights of the curves are added. The fitness function shown in the example above shows this (look at the fitness in the region between the two optima where the curve does not fall away as quickly as it does on the outside of the optima). Note that it is the height (the fitness value) which is added, not the phenotype, which is the horizontal scale.

The fitness function is scaled so that the fitness of the best phenotype is 1. Whether or not this scaling is done actually has no effect on the evolution of the population, but it makes the curve easier to look at when plotted.
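The summing and scaling just described might look like the following sketch. It is an illustration under stated assumptions, not the program's own code: it uses the curve form exp(-x^2/S) given above, it scales by the best of the whole-number phenotypes 0 to 10, and the name fitness_table is hypothetical.

```python
import math


def fitness_table(optima, strength, max_phenotype=10):
    """Return scaled fitnesses for phenotypes 0..max_phenotype.

    The raw fitness is the sum of one Gaussian-shaped curve per optimum
    (heights are added, not phenotypes). The table is then divided by
    its largest entry so the best phenotype has fitness 1.
    """
    raw = [
        sum(math.exp(-((x - opt) ** 2) / strength) for opt in optima)
        for x in range(max_phenotype + 1)
    ]
    best = max(raw)
    return [value / best for value in raw]


table = fitness_table([3.0, 6.0], 1.0)
for value, fitness in enumerate(table):
    print(value, round(fitness, 3))
```

Between the two optima each phenotype receives a contribution from both curves, so its fitness is higher than a single curve alone would give. Dividing by the largest entry changes every fitness by the same factor, which leaves the relative survival chances of the phenotypes unchanged.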

As you can see from the bottom line of the above screen image, when you are finished looking at the fitness curve, you will probably want to move on to the simulation itself, by pressing the Enter key. If you want to return to the menu, press M, and if you want to abort the run and stop the program at this point, press Q.

The initial population

One of the menu items (I) is the initial mean of the character. The initial population is set up as a sample of N individuals drawn from an infinitely large population which has this mean phenotype. Each locus in that infinite population has the same gene frequency, and there is no association ("linkage disequilibrium") between the alleles at different loci. This means that the program determines a gene frequency and draws each gene from a "gene pool" that has that frequency. For example, if the mean phenotype is set to 3, with 5 loci that means that an average of 3 of the 10 gene copies in an individual should be capital letters. Thus the desired gene frequency is 3/10 = 0.3. The program draws genes from a gene pool with gene frequency 0.3 at all five loci. Thus the initial population has, at random, a 0.3 chance for each letter in each individual's genotype that it will be a capital letter.
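The setup of the initial population can be sketched in Python as below. This is an illustration only: the representation of an individual as five (allele, allele) pairs and the name initial_population are assumptions made for the example.

```python
import random


def initial_population(n, initial_mean, rng, loci="ABCDE"):
    """Draw n individuals from a gene pool with no association among loci.

    The gene frequency of the capital allele is the desired mean
    phenotype divided by the number of gene copies (2 per locus),
    so a mean of 3 with 5 loci gives 3 / 10 = 0.3.
    """
    frequency = initial_mean / (2 * len(loci))

    def draw(locus):
        # Capital letter with probability `frequency`, else lower case.
        return locus if rng.random() < frequency else locus.lower()

    return [[(draw(locus), draw(locus)) for locus in loci] for _ in range(n)]


population = initial_population(100, 3.0, random.Random(2009))
print("".join(a + b for a, b in population[0]))
```

Because the N individuals are a finite sample, the mean phenotype of the starting population will usually be close to, but not exactly equal to, the value set in the menu.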

The simulation

The program runs a simulation of evolution of the population under random mating, and free recombination among the 5 unlinked loci. In each generation, every offspring is generated by randomly sampling (with replacement) two individuals from the previous generation to be its parents. The offspring is produced by Mendelian reproduction with no linkage among the loci (as if they were on separate chromosomes). The offspring's phenotype is then determined by counting the capital letters in its genotype.

Selection then occurs by calculating the fitness of the offspring from its phenotype value, and using that fitness as the probability of retaining the offspring. Thus if the fitness is (say) 0.72, the program draws a random fraction between 0 and 1 and retains the individual if that random number turns out to be less than 0.72. This gives the offspring the desired 0.72 chance of survival. Offspring are produced, one after another, until N of them have survived. They constitute the next generation.
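The selection step can be sketched as a loop that keeps producing offspring until N have survived. The Python below is a sketch under assumptions, not the program's code: individuals are lists of five (allele, allele) pairs, fitness is a list indexed by phenotype 0 to 10 with values between 0 and 1, and all function names are hypothetical.

```python
import random


def phenotype(individual):
    """Count the capital-letter alleles over all loci."""
    return sum(allele.isupper() for pair in individual for allele in pair)


def make_offspring(population, rng):
    """Random mating with replacement; one allele per locus from each parent."""
    mother = rng.choice(population)
    father = rng.choice(population)
    return [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]


def next_generation(population, fitness, rng):
    """Produce offspring one after another until N of them survive.

    An offspring with fitness w is kept when a uniform random number
    in [0, 1) is less than w, giving it survival probability w.
    Note: if every reachable phenotype had fitness 0, this loop
    would never end.
    """
    survivors = []
    while len(survivors) < len(population):
        child = make_offspring(population, rng)
        if rng.random() < fitness[phenotype(child)]:
            survivors.append(child)
    return survivors
```

When fitnesses are low, many offspring are discarded before N survivors accumulate, so a generation takes longer to simulate; the size of the next generation is still exactly N.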

The menu parameter L allows you to change the number of lines of text that will appear on the screen. If you increase it, you will see a larger histogram, showing more completely what is happening.

The menu parameter G shows how many generations the program will simulate before showing the user each result. The histogram of the population is displayed every G generations, starting with the G-th generation (thus if G=3 the user will see generations 3, 6, 9, 12, ...). To go forward G more generations you should press the Enter key. If you do not want to continue you can press S to start over with the same parameter values. To return to the menu and change the parameter values, press C. To quit the program, press Q.

The histogram

The histogram that is shown is a sample of the individuals in the population. Each individual is shown as a box of letters showing its genotype. Thus an individual may be represented as

           AbcDE
           ABcDe

These individual genotypes are placed in columns according to their phenotypes. Only a sample of the population can be shown, so that the columns are not too high to fit on the screen. At the bottom of the histogram the phenotype values are shown on the dashed axis, and below that are the numbers of individuals in the population that have that phenotype value. Thus you might see (say) a column of 8 individuals at phenotype value 6, but the number beneath the 6 on the axis shows that of the N individuals in the population, 36 actually have that phenotype value. (The individuals that are shown in the histogram are simply the first ones that were produced in that generation.)
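The distinction between the individuals shown and the counts printed beneath the axis can be illustrated with a short Python sketch. It is not the program's code; the name histogram_columns and the max_shown argument (the number of individuals that fit in a column) are assumptions for the example.

```python
def histogram_columns(phenotypes, max_shown, max_phenotype=10):
    """Return (counts, shown) for a list of phenotype values.

    counts[v] is the number of individuals in the whole population with
    phenotype v. shown[v] lists the positions of only the first
    max_shown such individuals, which are the ones that would be drawn.
    """
    counts = [0] * (max_phenotype + 1)
    shown = [[] for _ in range(max_phenotype + 1)]
    for position, value in enumerate(phenotypes):
        counts[value] += 1
        if len(shown[value]) < max_shown:
            shown[value].append(position)
    return counts, shown


counts, shown = histogram_columns([6] * 36 + [5] * 3, max_shown=8)
print(counts[6], len(shown[6]))  # prints 36 8
```

Here the column at phenotype 6 displays 8 individuals while the number beneath the axis reports all 36.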

Here is a typical histogram, to give you some idea what it looks like:

 |                                           ABCDE                  
 |_     _     _     _     _     _     _     _AbcdE_     _     _     
 |                                           ABCDE                  
 |_     _     _     _     _     _     _     _AbcdE_     _     _     
 |                                     ABcDE ABCDE                  
 |_     _     _     _     _     _     _abcDE_AbcdE_     _     _     
 |                                     ABcDE ABCDE ABCDE            
 |_     _     _     _     _     _     _abcDE_AbcdE_ABcDe_     _     
 |                                     ABcDE ABCDE ABCDE            
 |_     _     _     _     _     _     _abcDE_AbcdE_ABcDe_     _     
 |                                     ABcDE ABCDE ABCDE            
 |_     _     _     _     _     _     _abcDE_AbcdE_ABcDe_     _     
 |                                     ABcDE ABCDE ABCDE ABCDE      
 |_     _     _     _     _     _     _abcDE_AbcdE_ABcDe_ABCdE_     
 |                                     ABcDE ABCDE ABCDE ABCDE      
 |_     _     _     _     _     _     _abcDE_AbcdE_ABcDe_ABCdE_     
 |                               aBCDE ABcDE ABCDE ABCDE ABCDE      
 |_     _     _     _     _     _aBcde_abcDE_AbcdE_ABcDe_ABCdE_     
 |                         ABCDe aBCDE ABcDE ABCDE ABCDE ABCDE ABCDE
 |_     _     _     _     _abcde_aBcde_abcDE_AbcdE_ABcDe_ABCdE_ABCDE
 L---0-----1-----2-----3-----4-----5-----6-----7-----8-----9----10---
     0     0     0     0     3     7    24    30    21    13     2  
to continue, press Enter key, to stop Q, to run again S, to change case C  

Note the underscores which are used as "hash marks" to designate the bottom of genotypes. Note also that by running your eye up and down a column of genotypes you can get some sense of which loci are at high or low frequencies. Thus if we have a column in which all the letters at the B locus are capital letters, that should be immediately apparent. (However, to actually get a sense of the gene frequency at the B locus in the population, you need to look at all the columns). When the population is fixed at all loci, with all gene frequencies 0 or 1, then the histogram will show only one column. There is no point in continuing the run further at that point -- you will want to either Quit, or start over.

Suggestions for running the program

There are many cases you can explore when running Contevol. Some questions you might ask include:

  1. What is the effect of strength of selection? What happens in the long run if there is no selection (which you can approximate by taking S very large)? You may need to set the number of generations between histograms (G in the menu) large to do this.
  2. What is the effect of population size? If you have selection toward one end of the scale, is the long-term response to selection greater if the population size is greater?
  3. If you have an intermediate optimum value, what ultimately happens? Does the population continue to be variable at all loci? If not, what is the result?
  4. If there are two peaks at different locations, what happens to the population mean? Does it move to the same location as one of the peaks or does it move in between the two peaks?
  5. Suppose fitness rises as you move away from the “optimum” (which then isn't an optimum, it's a pessimum)? Try moderately large but negative values of S, such as -15. Does the population always end up going to the same part of the phenotype scale? Does it go to two different parts of the scale in the same run?

Getting Contevol

Contevol can be fetched from this web site:

http://evolution.gs.washington.edu/contevol

Here are some links that will help you get the particular files you need:

If you have a Windows computer you should get:

Note that some of these (perhaps the executable and the cygwin1.dll file) may show up as a browser window full of "garbage". If you save that as a file called cygwin1.dll that should work.

If you have a Mac OS X system you should get

This puts a file on your system. Clicking on it will create a "disk image" and mount that on your desktop. Open the disk image. It will show a window that has a folder named contevol-1.01. Copy that folder to some other location (do not try to use the copy that is still in the disk image).

The folder will have an executable which has an icon and is called contevol. It can be run by clicking on it. It will open a window which has a light blue background, and show the program menu. You then run the program by typing into that menu.

If you have a Linux system with an Intel-compatible processor such as a Pentium you should get

Executables for less-frequently-used operating systems

If you have a Macintosh Mac OS 8 or 9 (PowerMac) system you should get:

If you have a Compaq/Digital Alpha system running Compaq (Digital) Unix you should get

If you have another Unix or Linux system, you will want to compile the program yourself, which is not hard. You will need:

If you are intending to recompile the program using the CygWin Gnu C++ compiler on a Windows system (which you need do only if you want to modify the program, otherwise you can just use the Windows executables), you will need:

Running the executables

This is mostly straightforward. Here are instructions for running them for Windows, Mac OS X, and Unix (Linux). Note that in the program menu you can change the number of lines of text that appear in the plots.

Windows

Double-click on the executable (make sure that the file cygwin1.dll is in the same folder as the executable). A Command window will open which has the menu in it. It is rather small -- I suggest clicking on the side of the Auto box in the upper-left part of the window, and selecting a bigger font size such as 10x18.

Mac OS X

Double-click on the executable. A window will open in which the text of the menu and the plots will appear. It is not huge, but perhaps can be resized.

Unix (Linux)

Type the name of the executable such as contevol.linux. The menu and plots will appear in the current window.

Compiling it yourself

You shouldn't have to compile it yourself! We provide precompiled executables and those should be fetched and run unless you have some need to modify the program. For those few people who do, we will describe how to compile the source code on CygWin Gnu C++ for Windows, on Xcode for Mac OS X systems, on Gnu C++ on Unix or Linux systems, or on Metrowerks C++ on Mac OS 8 or Mac OS 9 systems.

Compiling with CygWin Gnu C++

Cygnus Solutions has adapted the Gnu C++ compiler to Windows systems and provided an environment, CygWin, which mimics Unix for compiling. Once you have installed the free CygWin environment and the associated Gnu C compiler on your Windows system, compiling Contevol is essentially identical to what one does for Unix or Linux. We have provided a CygWin Makefile so you can do this (again, for normal use you should not have to recompile at all).

On entering the CygWin environment you will find yourself in one of the subdirectories of the CygWin folder. Change to the folder where the Contevol program contevol.c and the file Makefile.cyg are located. You should then be able to compile contevol.c by issuing the command
make -f Makefile.cyg
The result should be a compiled executable called contevol.exe.

Compiling Contevol on Unix (or Linux) systems

If you have one of the kinds of Unix or Linux systems for which we do not distribute executables, or if you want to make changes in the source code, you can easily compile the source code. Make sure you are in a folder which contains our source file contevol.c and the file Makefile.unix. Then simply type the command:
make -f Makefile.unix
The result should be a compiled program called contevol.

Compiling in Mac OS X ...

If you want for some reason to compile an executable, follow these steps:

  1. Make sure you have the folder with the contevol.c program, the Makefile.osx file, and the mac folder in it.
  2. Open a Terminal window (the Terminal utility will be found in the Utilities folder inside the Applications folder on your system).
  3. Use the cd command to get to the folder that contains contevol.c
  4. Type the command: make -f Makefile.osx
The result should be that a Universal executable is produced with the Contevol icon. (There will be some lines of shell script commands typed to your window as the program compiles, but these can be ignored).

Compiling with Metrowerks Codewarrior ...

If you have Mac OS 8 or Mac OS 9, you should probably just use the PowerMac executable we supply, unless you have some reason to change the program yourself. These instructions are supplied for that rather rare case. We shall assume that you have a late version of the Metrowerks C++ compiler. This description, and the project files that we provide, assume Metrowerks 5.3. We also assume a reasonable familiarity with the use of the Codewarrior compiler and its Integrated Development Environment (IDE).

Start with a directory (folder) that contains the files contevol.c and Contevol.rsrc, both of which are provided by us.

Creating the project file. We have provided a project file contevol.proj. If you have it then you do not need to do the items on the following list. Skip down to the end of the list.

  1. Start up the Codewarrior IDE (integrated development environment).
  2. Create a new project file by choosing New... on the File menu.
  3. Type in the project name contevol.proj
  4. On the Project menu on the left side of the New window, double-click on MacOS C/C++ Stationery
  5. In the New project window that opens, click on the triangle to the left of Standard Console.
  6. Move the slider at the right of the window down until you reach SIOUX-WASTE
  7. Click on the triangle to the left of SIOUX-WASTE. This opens another list of choices below.
  8. Click on the menu item SIOUX-WASTE C PPC. Press the OK button. After a bit a window contevol.proj will open.
  9. Click on the triangle to the left of the Sources menu item. A template item called HelloWorld.c will open.
  10. Select HelloWorld.c.
  11. Open the Edit menu at the top of the Mac screen and select Clear. A box will open asking if you want to remove HelloWorld.c from the project.
  12. Select OK.
  13. If the contevol.c file came from the self-extracting Macintosh archive that we distribute, it should show a yellow-and-black-striped Metrowerks icon (if not, as when you get it from some other form of our distribution, you may have to pass it through a program like Microsoft Word to get Metrowerks to be able to see it as a potential source code file).
  14. Drag the contevol.c file onto the Sources item in your contevol.proj window.
  15. Drop it onto Sources so that it appears under the Sources choice. This may take a few tries -- if it appears above Sources grab it and move it again.
  16. Drag contevol.rsrc into Sources in the same way. It doesn't matter whether it appears before or after contevol.c.
  17. Go to the Edit menu and select the PPC Std C SIOUX-WASTE Settings item. A window of that name will then open.
  18. Under the Target item you will see a PPC Target item. Select it. A PPC Target window will open to the right.
  19. Change the name in the File Name box to be Contevol
  20. Change the ???? in the Creator box to (say) CNTE
  21. Change the Preferred Heap Size to 1024.
  22. Under Language Settings in the left-hand menu of the window, select C/C++ Language. A window called C/C++ Language will open to the immediate right.
  23. Click on Require Function Prototypes to deselect that setting.
  24. Click on the Save button at the lower-right of the project settings window.
  25. Close the PPC Std C SIOUX-WASTE Settings window using the usual box in the upper-left corner.
  26. On your Desktop you should now find a folder Contevol. If it has a file called HelloWorld.c you may want to discard that file.
  27. In that Contevol folder you will find a file contevol.proj.
  28. Double-click on that project file. If Metrowerks is not already open, it should open now.
  29. If a window called Project Messages opens and there is a complaint in it about access paths being wrong, you should fix these by selecting the Reset project entry paths item in the Project menu.
  30. Select the Make item in the Project menu.

If you do have the contevol.proj project file, you can just:

  1. Double-click on it to open the project window.
  2. In the Project menu click on Make.
The result should be a compiled PowerMac executable, with icon. Again, you do not have to recompile at all, unless you want to change something in the program, as we supply the already-compiled executable.

Future of the program

The program needs to be caught up to the capabilities of current operating systems. We hope some time in the future to make a version with better graphics, plotting histograms in color instead of relying on the user to see the capital letters and the lower-case letters in the genotypes.

Joe Felsenstein
Department of Genome Sciences
University of Washington
Box 355065
Seattle, Washington 98195-5065, U.S.A.

Electronic mail address: joe (at) removethispart.gs.washington.edu