SISG module 19 (2016) programs

(Where is the main 2016 SISG module 19 web page? It is in the folder above this one, namely here.)

PAUP*, some materials for PAUP* labs, including data sets, and some MrBayes lab material

Mark has made available a link for the version of PAUP* that we will use, as well as materials for the PAUP* labs, and data sets for them. There are also links there for other programs: MrBayes, Figtree, and Tracer. A web page for them is at this link:

http://phylo.bio.ku.edu/slides/sisg2016/index.html

Nearby (here) are Nexus versions of the data sets Joe collected for the PHYLIP lab exercises.

The BEAST program

We will have a demonstration on Friday of the program BEAST. The main website for BEAST, from which it can be downloaded, is here:

http://beast2.org

PHYLIP programs

For the PHYLIP exercises on Monday and Tuesday, you should have PHYLIP installed on your laptop (it will not work on a tablet). The current distributed release is version 3.695, but here we will try out some programs from version 4.0, which is not yet released.

These are an alternative to downloading version 3.695. You can use either version for the PHYLIP lab, but try 4.0 first. Below you will find links that allow you to download a reduced subset of programs from PHYLIP 4.0, sufficient for the exercise. You can run these programs in two ways: using a Java front-end interface, or using a character-mode menu. Both perform the same computations, so you can use either. If you have 64-bit Windows as your operating system, you may not be able to use the Java interface, as we have not yet been able to construct the proper 64-bit dynamic load libraries.


Downloading and installing the programs

32-bit Windows
  Download: Click here to download the Zip archive
  Installing it: On downloading, this archive may be extracted automatically. If not, right-click on it and select the option to extract the archive. Make sure to put all the contents in a folder whose name you will recall (such as phylip).

Mac OS X
  Download: Click here to download a .dmg archive
  Installing it: Click on the archive icon to open the Disk Image "disk". Click on that to open its folder. Inside you will find a folder named "phylip-4.0sisg". Do not run the programs from there, but copy the folder to some other place on your disk where you will be able to find it.

Linux
  Download: Click here to download a .tar.gz archive
  Installing it:
  1. Move this to an appropriate place in your folders.
  2. Unpack it by opening a Terminal window and using the cd command to move there, then type the command
    tar -zxvf phylip-4.0sisg.tar.gz
    That should give you a folder phylip-4.0sisg. Within it, the executable programs and their Java interface code will be in folder programs.
  3. If you have a 64-bit Intel-compatible processor in your computer, you should be able to use those executables. If not, see Joe. You could always download and compile the 3.695 version.
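
The Linux steps above can be sketched as a small shell script. This is only a sketch: it assumes the archive has been saved in the current folder, and it assumes that a 64-bit Intel-compatible processor is reported by uname -m as x86_64 (or amd64 on some systems). The function name is_64bit_intel is ours, not part of PHYLIP.

```shell
#!/bin/sh
# Sketch: unpack the SISG archive (if it is here) and check the processor type.
archive="phylip-4.0sisg.tar.gz"   # archive name from the instructions above

# Succeeds when the given machine string is 64-bit Intel-compatible.
# (Assumption: such machines report x86_64 or amd64.)
is_64bit_intel() {
    case "$1" in
        x86_64|amd64) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -f "$archive" ]; then
    tar -zxvf "$archive"          # should create the folder phylip-4.0sisg
else
    echo "Archive $archive not found in $(pwd); cd to where you saved it."
fi

if is_64bit_intel "$(uname -m)"; then
    echo "64-bit Intel-compatible processor: the supplied executables should run."
else
    echo "Other processor type: the supplied executables may not run."
fi
```

If the second message appears, the advice in step 3 applies: see Joe, or compile version 3.695.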


Getting the Java front ends of the programs to run on your computer

For PHYLIP we are making available a pre-alpha pre-release copy of 9 of the PHYLIP programs from PHYLIP 4.0. These use a Java front end which may or may not require that you install a recent version of Oracle Java.

Windows:
For Windows machines Oracle Java can be installed rather easily with a free download from the main Java web site. There is a link to download and install Oracle Java on the SISG Computing and Software web site, among the links for our module. However, if you have a 64-bit Windows machine our Java front end machinery will probably not work, as we have not yet successfully prepared the proper 64-bit dynamic load libraries. In that case you will be better off running the PHYLIP programs directly and using their character-mode menus.
Mac OS X:
For Mac OS X machines the version of Java that is provided with the Mac OS X operating system should be good enough, so you do not have to install Java yourself.
Linux:
If you have a 64-bit Linux machine, the Java implementation that comes with it may be good enough. Try it. If not, with some difficulty it is possible to download a recent Oracle Java to an Ubuntu Linux system, and configure the paths properly to use it. Below see more on how to set up Oracle Java on Linux.

Before you run any of the Java front end programs, set the LD_LIBRARY_PATH variable by issuing the following command, which only has to be done once per session (not once per program):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.

Note the colon, followed by a dot, at the end of that command. That is important. After that the Java programs should be able to find the proper dynamic load libraries.

The command for running the Java program that presents the menu for a program, say Dnaml, is

java -jar DnamlJava.jar
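
Here is a sketch of how the two steps might be combined for one session. The function names add_dot_to_library_path and run_front_end are hypothetical, not part of PHYLIP. The sketch uses export so that the setting is passed on to the java process, and it avoids adding the dot a second time if you run it again in the same session.

```shell
#!/bin/sh
# Sketch: set LD_LIBRARY_PATH once per session, then start a Java front end.

# Append "." (the current folder) to LD_LIBRARY_PATH unless it is already listed.
add_dot_to_library_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *:.:*) ;;                                      # "." is already there
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:-}:." ;;
    esac
    export LD_LIBRARY_PATH    # make it visible to programs started from this shell
}

# Start the front end named in $1, but only if Java and the jar file are present.
run_front_end() {
    jar="$1"
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH" >&2
        return 1
    fi
    if [ ! -f "$jar" ]; then
        echo "$jar not found in $(pwd)" >&2
        return 1
    fi
    java -jar "$jar"
}

add_dot_to_library_path
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
# run_front_end DnamlJava.jar    # uncomment when you are inside the programs folder
```

Because the variable is set in the shell, the functions have to be run in the same Terminal window from which you start the Java programs.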


Running the PHYLIP programs

The PHYLIP programs come with two ways of running them. One is to run the programs directly by typing the name of the program (such as "dnadist" or "./dnadist"). That will run the character-menu version of the program. It may also be possible to run these programs directly by clicking on their icons. The other way of running the programs is by running the Java interface version. For that you click on the Java program such as "DnaDist.jar" or issue the command "java -jar DnaDist.jar". Here are some specific instructions for your operating system:

Windows

If you have a 32-bit Windows system, you may be able to run the Java front-ends for PHYLIP. Download the Zip archive and extract it into its own folder (you may be able to do this by right-clicking on the archive's icon and selecting the option to extract it). Afterward you will find a folder called phylip-4.0sisg. Go into that folder, where you will find the PHYLIP programs and Java code for the front ends.

Mac OS X

The main problem that will occur is that if you try to click on a program icon to run it, the operating system will warn you that this program comes from an unknown developer, and that something horrible may happen. That is just because we have not yet been able to sign the DMG file with my Apple Developer identity. To get around this, the first time you run each program it may be necessary to control-click on it instead, then choose "Open" from the menu that pops up. After that the program will run, and you won't have to do that again for this program but can just click on its icon.

Linux

Things will work well on any 64-bit Linux system. But for each session you do need to set the LD_LIBRARY_PATH variable properly, using this command:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.

where the colon and dot at the end are important. After that everything will work until the next session, when you have to do it again.

For all of these

When you use the Java menu to navigate to an input file, it will show you folder names. Clicking on a folder name may not cause anything to happen. If it does not move into that folder, select the folder and then use the Execute button in the bottom corner of the Java menu, which will open it. Once you have chosen the correct input file, use the Execute button again to run the PHYLIP program.


Setting up Oracle Java on Linux (if you have to)

Here are more details of how to set up Oracle Java on Linux (if you need to).

The script that sets up the Linux environment and starts the Java interface is named ...Java.unx. Because nobody without root privileges can install anything on a Linux machine where any other user can see it, we had to write the Java execution scripts so that they look for Java locally in the user's directory. So for each Java front end there is an execution script (DrawGramJava.unx for example) which sets up the environment, and which reads:

export PATH=~/jdk1.7.0_11/bin/:$PATH
export JAVA_HOME=~/jdk1.7.0_11/bin/
export LD_LIBRARY_PATH=.
java -jar ./DrawGram.jar
The user must replace ~/jdk1.7.0_11 with the path to Oracle Java on their machine. Then they double-click on the ...Java.unx file and the Java interface will run.
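
One way to avoid editing the path in several places is to keep it in a single variable. The following is only a sketch of such a variant, not the script we distribute: the variable JDK_DIR and the function find_java are hypothetical names, and the sketch falls back to whatever java is already on your PATH if nothing is found under JDK_DIR. It prints the command it would use rather than launching anything.

```shell
#!/bin/sh
# Sketch of a launcher that keeps the JDK location in one variable.
# JDK_DIR is a hypothetical name; its default is the path used in the script above.
JDK_DIR="${JDK_DIR:-$HOME/jdk1.7.0_11}"

# Print the java executable to use: the one under the given folder if it exists,
# otherwise the "java" found on PATH. Fails if neither exists.
find_java() {
    if [ -x "$1/bin/java" ]; then
        echo "$1/bin/java"
    else
        command -v java
    fi
}

if java_cmd=$(find_java "$JDK_DIR"); then
    echo "Would run: $java_cmd -jar ./DrawGram.jar"
    # LD_LIBRARY_PATH=. "$java_cmd" -jar ./DrawGram.jar    # uncomment to launch
else
    echo "No Java found; set JDK_DIR to your Oracle Java folder." >&2
fi
```

To point it at a different Java, set JDK_DIR before running it, for example JDK_DIR=/path/to/your/jdk followed by the name of the script.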


R programs

Our intention was to have you also briefly familiarize yourself with one of the phylogeny programs that are available in the R language. The two we wanted you to try were phangorn, by Klaus Schliep, and Rphylip, by Liam Revell. Unfortunately ...

  1. phangorn is available in version 2 of R and can easily be loaded, but ...
  2. It seems to fail to install in version 3 of R, while ...
  3. Rphylip works only with version 3 of R.
All we can say is aRrgh !!!! Depending on which R version you have, try one.
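
To find out which case you are in, you can ask R for its version from a Terminal window. The sketch below assumes that the first line printed by R --version starts with "R version" followed by the version number. The function names are hypothetical, and the mapping from version to package simply restates the list above.

```shell
#!/bin/sh
# Sketch: suggest which R package to try, based on the major version of R.

# Extract the major version from a banner line such as:
#   R version 3.3.1 (2016-06-21) -- "..."
r_major_version() {
    echo "$1" | sed -n 's/^R version \([0-9][0-9]*\)\..*/\1/p'
}

# Map a major version to the package that the list above says should work.
package_to_try() {
    case "$1" in
        2) echo "phangorn" ;;
        3) echo "Rphylip" ;;
        *) echo "unknown" ;;
    esac
}

if command -v R >/dev/null 2>&1; then
    banner=$(R --version | head -n 1)
    major=$(r_major_version "$banner")
    echo "R major version: $major; package to try: $(package_to_try "$major")"
else
    echo "R not found on PATH"
fi
```

For any version other than 2 or 3 the sketch just reports "unknown", since we have not tried those.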

phangorn

phangorn, by Klaus Schliep, makes use of another R phylogeny package, ape, by Emmanuel Paradis. ape should automatically be loaded when you load phangorn.

Fortunately, phangorn is available with the default install of R version 2. You simply need, once inside R, to type the command

library(phangorn)

It and ape, which it requires, are then loaded.

If your R version does not find phangorn, you can try to install it. To see whether phangorn will install successfully on your computer, try the command

install.packages("phangorn")

This will require you to choose a download mirror site, and wait awhile for installation to happen. Once installed, you use the library command. To get more information on phangorn, use the command

library(help = phangorn)

You can also get information on individual commands by preceding their names with question-marks, such as these

?phangorn

?parsimony

?read.phyDat

For a good quick introduction to using phangorn to infer phylogenies, I recommend the very brief and straightforward "vignette" on Trees which is available here:

https://cran.r-project.org/web/packages/phangorn/vignettes/Trees.pdf

When you read a data file into phangorn using the read.phyDat function, it should either be in the current folder, or you should give a path to it when you give the filename.

If you are puzzled by the need for a tree as an argument to the parsimony command, note that in the "vignette" Schliep gets such a tree by first computing a distance matrix, then running the upgma command on it and using its result as the initial tree in running parsimony.

You might want to compare the behavior of PAUP*, PHYLIP, and phangorn on the same data set.

Rphylip

Rphylip can be loaded if you have a recent enough version of R, at least version 3. You need to use a special installation command like

install.packages("Rphylip",repos="http://cran.cnr.berkeley.edu")

This may take some time -- be patient and wait for the loading to complete. After that, issue the command

library(Rphylip)

Rphylip is an R front end for PHYLIP, with some "benefits". It should be loaded into R in a folder that contains the PHYLIP programs that are to be called. Documentation on Rphylip starts with the PDF produced by Liam Revell (see it here). A help page can also be found using the command

library(help = Rphylip)

A warning

I have been trying to do runs with these two R packages. A big problem has been reading in data from PHYLIP-format data files of molecular sequences. Rphylip wants you to load these into R using a command from the ape package or a command from Rphylip. The former is needed for DNA sequences, the latter for protein sequences. So you also need to load ape using the command

library(ape)

Then to load a DNA sequence dataset foobar.dna in PHYLIP's Interleaved format, and call the resulting dataset x, you need to know where the dataset is (have the path to it, or have it in the current folder) and then issue the command

x <- read.dna("foobar.dna", format = "interleaved")

This seems to work on the file primates.dna. It does not work on some other Interleaved format files, or on Sequential format files. aRrgh!
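
Before blaming R, it may be worth checking that the file at least starts the way a PHYLIP file should, with a first line giving the number of species and the number of sites. The sketch below checks only that first line, so it cannot tell Interleaved from Sequential format, and it will not explain why read.dna fails on a particular file. The function name phylip_header is hypothetical.

```shell
#!/bin/sh
# Sketch: check the first line of a PHYLIP file (number of species, number of sites).

# Prints "species sites" and succeeds if the first line of file $1 starts with
# two positive integers; prints nothing and fails otherwise.
phylip_header() {
    head -n 1 "$1" | awk '
        NF >= 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $1 > 0 && $2 > 0 {
            print $1, $2
            ok = 1
        }
        END { exit !ok }'
}

file="primates.dna"    # the data set mentioned above; use your own file name
if [ -f "$file" ]; then
    if dims=$(phylip_header "$file"); then
        echo "$file: species and sites = $dims"
    else
        echo "$file: first line does not look like a PHYLIP header"
    fi
else
    echo "$file not found in $(pwd)"
fi
```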