Subject: Phylogenetics

1) Reading:

a) Wikipedia
- https://en.wikipedia.org/wiki/Computational_phylogenetics
- https://en.wikipedia.org/wiki/Phylogenetic_tree
- https://en.wikipedia.org/wiki/Tree_of_life_(biology)
- https://en.wikipedia.org/wiki/MEGA,_Molecular_Evolutionary_Genetics_Analysis
  (HW: install MEGA and use it on some data; remember that it is something of a GUI black box and many people do not consider it a valid way of doing phylogenetics, but ...)
b) Chapters 6-9 (pp. 87-178) from https://www.mimuw.edu.pl/~lukaskoz/teaching/sad2/books/Molecular-Evolution-and-Phylogenetics.pdf
c) Chapter 4 (pp. 83-145) from https://www.mimuw.edu.pl/~lukaskoz/teaching/sad2/books/Dirk_Husmeier.pdf
d) Chapter 7 (pp. 160-190) from https://www.mimuw.edu.pl/~lukaskoz/teaching/sad2/books/Durbin.pdf
e) Finally, the lecture: 05_Statistical_phylogenetics.pdf

==========================================================================

2) Exercise:

During today's session we will cover methods for tree building.

a) First we need to obtain some data. Go to NCBI (https://www.ncbi.nlm.nih.gov) and manually extract from the "Nucleotide" database the complete mitochondrial DNA sequences of several primates (Homo sapiens, Gorilla gorilla, Pan troglodytes, Pan paniscus, Pongo pygmaeus, Pongo pygmaeus abelii, Hylobates lar).
   Hint: search e.g. for "Gorilla gorilla mitochondrial DNA, complete sequence" (16,364 bp circular DNA) and store the record in FASTA format. To narrow the query, use the "Mitochondrion" filter. You want sequences of ~16 kbp (do not use fragments).
   Store all sequences in one *.fasta file. Rename the FASTA headers to short ones, e.g.:
   ">gi|1632801|emb|X99256.1|HLMITCSEQ Hylobates lar complete mitochondrial DNA sequence" -> ">Hlar"
b) Read the FASTA file(s) into R (use packages such as msa, ape, seqinr, Biostrings, phytools, phangorn).
c) Perform an MSA with the ClustalW method in R! Browse the alignment and write it to a pdf/html file and to an MSA-format file (e.g. CLUSTAL, NEXUS, PHYLIP, PIR, GDE, MSF, fasta).
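Renaming the headers in a) can also be scripted instead of done by hand. Below is a minimal base-R sketch; shorten_fasta_headers and the label mapping are hypothetical names chosen for illustration, and it assumes every header contains the species name as plain text (the first matching entry wins, so more specific names must come first).

```r
# Hypothetical helper (not from any package): replace long NCBI FASTA headers
# with short labels. 'short_names' maps a species string, matched literally,
# to its new label. The first matching entry wins, so list more specific
# names first (e.g. "Pongo pygmaeus abelii" before "Pongo pygmaeus").
shorten_fasta_headers <- function(lines, short_names) {
  for (i in which(startsWith(lines, ">"))) {
    for (species in names(short_names)) {
      if (grepl(species, lines[i], fixed = TRUE)) {
        lines[i] <- paste0(">", short_names[[species]])
        break  # stop at the first matching species
      }
    }
  }
  lines  # sequence lines are returned unchanged
}

# Illustrative mapping; the labels are arbitrary choices.
short_names <- c("Hylobates lar" = "Hlar",
                 "Pongo pygmaeus abelii" = "Pabelii",
                 "Pongo pygmaeus" = "Ppygmaeus",
                 "Gorilla gorilla" = "Ggorilla")

# Toy input: the header is the one quoted in a); "ACGT" is a placeholder, not real data.
fasta_lines <- c(">gi|1632801|emb|X99256.1|HLMITCSEQ Hylobates lar complete mitochondrial DNA sequence",
                 "ACGT")
shortened <- shorten_fasta_headers(fasta_lines, short_names)
print(shortened)
```

For a real file one could combine it with readLines and writeLines, e.g. writeLines(shorten_fasta_headers(readLines("primates.fasta"), short_names), "primates_short.fasta"), where both file names are placeholders.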
library(ape)       # read.dna
library(phangorn)  # read.phyDat, phyDat

# assumes primates2.fasta holds the aligned sequences produced in c)
fas <- 'primates2.fasta'
msa_fsa <- read.phyDat(fas, format = "fasta", type = "DNA")        # phangorn: read directly as phyDat
msa_dna <- read.dna(fas, format = "fasta", as.character = FALSE)   # ape: read as DNAbin
align_primates <- phyDat(msa_dna, type = "DNA")                    # convert DNAbin to phyDat

d) Construct a distance matrix (phangorn: explore different distance measures, e.g. hamming, ml, etc.).
e) Build trees using:
   - NJ (https://en.wikipedia.org/wiki/Neighbor_joining)
   - UPGMA (https://en.wikipedia.org/wiki/UPGMA)
   - parsimony (https://en.wikipedia.org/wiki/Maximum_parsimony_(phylogenetics))
f) Construct an initial model using 'pml' with the NJ tree, the data and the K80 model.
   For details about K80 see: http://www.bioinf.man.ac.uk/resources/phase/manual/node67.html
   Store the trees in different formats (e.g. Newick, NEXUS, PHYLIP).
g) Optimize with respect to branch lengths (optim.pml).
h) Optimize with respect to the nucleotide substitution model parameters.
i) Optimize simultaneously with respect to branch lengths and nucleotide substitution model parameters.
   Write the trees from g-h to a pdf file.
j) Bootstrap the trees with ape using boot.phylo and/or with phangorn using bootstrap.pml.
   Check the support for individual branches.
k) Try to re-root the tree on arbitrarily chosen nodes.

Based on the trees, can you say which organism is the most distant from the others and which one is the closest relative of humans?

If time allows:
a) Go to page 156 (168 in the pdf) of the book from 1b) and try to reproduce the ML trees from Fig 8.3 (e.g. with MEGA: https://en.wikipedia.org/wiki/MEGA,_Molecular_Evolutionary_Genetics_Analysis).

==========================================================================

3) Additional material:

R specific:
https://www.mimuw.edu.pl/~lukaskoz/teaching/sad2/books/Analysis_of_Phylogenetics_and_Evolution_with_R.pdf
https://cran.r-project.org/web/packages/phangorn/phangorn.pdf
https://cran.r-project.org/web/packages/ape/ape.pdf
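For step 2d), it can help to see what two of the simplest distances compute before calling phangorn's dist.hamming or ape's dist.dna(model = "K80"). The base-R sketch below computes the p-distance (proportion of differing sites) and the K80 distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. The function names are hypothetical, the toy sequences are made up, and the gap handling (gaps count as differences in p_distance, are dropped in k80_distance) is an assumption that need not match the package defaults.

```r
# Split an aligned sequence into upper-case characters.
to_chars <- function(s) strsplit(toupper(s), "")[[1]]

# p-distance: Hamming distance divided by the alignment length.
p_distance <- function(x, y) {
  x <- to_chars(x); y <- to_chars(y)
  stopifnot(length(x) == length(y))
  mean(x != y)
}

# Kimura 2-parameter (K80) distance; sites with gaps or ambiguity codes are dropped.
k80_distance <- function(x, y) {
  x <- to_chars(x); y <- to_chars(y)
  stopifnot(length(x) == length(y))
  ok <- x %in% c("A", "C", "G", "T") & y %in% c("A", "C", "G", "T")
  x <- x[ok]; y <- y[ok]
  is_purine <- function(b) b %in% c("A", "G")
  differ <- x != y
  P <- mean(differ & (is_purine(x) == is_purine(y)))  # transitions: A<->G, C<->T
  Q <- mean(differ & (is_purine(x) != is_purine(y)))  # transversions
  -0.5 * log(1 - 2 * P - Q) - 0.25 * log(1 - 2 * Q)
}

# Symmetric matrix of pairwise distances, returned as a 'dist' object.
dist_matrix <- function(seqs, fun) {
  n <- length(seqs)
  D <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (a in seq_len(n - 1)) {
    for (b in (a + 1):n) {
      D[a, b] <- D[b, a] <- fun(seqs[[a]], seqs[[b]])
    }
  }
  as.dist(D)
}

seqs <- c(s1 = "AAAAAAAAAA", s2 = "GAAAAAAAAA", s3 = "GCAAAAAAAA")  # made-up toy alignment
print(dist_matrix(seqs, p_distance))
print(dist_matrix(seqs, k80_distance))
```

K80 grows faster than the p-distance as sequences diverge because it corrects for multiple substitutions at the same site, treating transitions and transversions separately.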
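The UPGMA method from 2e) fits in a few lines of base R. This is a minimal illustration, not phangorn's upgma (which, like stats::hclust(method = "average"), should normally give the same topology up to tie-breaking): the hypothetical upgma_newick repeatedly merges the closest pair of clusters, places the new node at half their distance, and replaces the two rows by their size-weighted average. It returns a Newick string, so the result can be parsed with ape::read.tree(text = ...).

```r
upgma_newick <- function(D) {
  D <- as.matrix(D)
  nodes <- rownames(D)        # Newick text of each current cluster
  sizes <- rep(1, nrow(D))    # number of leaves in each cluster
  heights <- rep(0, nrow(D))  # height of each cluster above its leaves
  diag(D) <- Inf              # never pair a cluster with itself
  while (length(nodes) > 1) {
    idx <- which(D == min(D), arr.ind = TRUE)[1, ]
    i <- min(idx); j <- max(idx)
    h <- D[i, j] / 2          # ultrametric: the new node sits halfway between i and j
    nodes[i] <- sprintf("(%s:%g,%s:%g)", nodes[i], h - heights[i], nodes[j], h - heights[j])
    # size-weighted average distance from the merged cluster to all others
    new_row <- (sizes[i] * D[i, ] + sizes[j] * D[j, ]) / (sizes[i] + sizes[j])
    D[i, ] <- new_row; D[, i] <- new_row; D[i, i] <- Inf
    sizes[i] <- sizes[i] + sizes[j]
    heights[i] <- h
    # drop cluster j; slot i now holds the merged cluster
    D <- D[-j, -j, drop = FALSE]
    nodes <- nodes[-j]; sizes <- sizes[-j]; heights <- heights[-j]
  }
  paste0(nodes, ";")
}

labs <- c("A", "B", "C")
D <- matrix(c(0, 2, 4,
              2, 0, 4,
              4, 4, 0), nrow = 3, dimnames = list(labs, labs))
tree_text <- upgma_newick(D)
print(tree_text)
```

Because UPGMA assumes a molecular clock, all leaves end up at the same distance from the root; NJ drops that assumption.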
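Neighbor joining (2e) can be sketched the same way. The hypothetical nj_newick below follows the textbook algorithm described on the Wikipedia page linked in 2e): with r_i the sum of row i of the distance matrix, it joins the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j, computes their branch lengths to the new node, and updates distances as d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2. It returns an unrooted Newick string with branch lengths printed to 6 significant digits; on the real data, ape::nj or phangorn::NJ are the functions to use. The 5-taxon matrix is the small additive example commonly used to illustrate NJ.

```r
nj_newick <- function(D) {
  D <- as.matrix(D)
  stopifnot(nrow(D) >= 3)
  nodes <- rownames(D)
  while (length(nodes) > 3) {
    n <- length(nodes)
    r <- rowSums(D)
    # Q-criterion: Q(i, j) = (n - 2) d(i, j) - r_i - r_j
    Q <- (n - 2) * D - outer(r, r, "+")
    diag(Q) <- Inf
    idx <- which(Q == min(Q), arr.ind = TRUE)[1, ]
    i <- min(idx); j <- max(idx)
    # branch lengths from i and j to their new parent node u
    bl_i <- D[i, j] / 2 + (r[[i]] - r[[j]]) / (2 * (n - 2))
    bl_j <- D[i, j] - bl_i
    # distances from u to every remaining node
    new_row <- (D[i, ] + D[j, ] - D[i, j]) / 2
    nodes[i] <- sprintf("(%s:%g,%s:%g)", nodes[i], bl_i, nodes[j], bl_j)
    D[i, ] <- new_row; D[, i] <- new_row; D[i, i] <- 0
    D <- D[-j, -j, drop = FALSE]
    nodes <- nodes[-j]
  }
  # three nodes left: their branch lengths follow from the three-point formula
  l1 <- (D[1, 2] + D[1, 3] - D[2, 3]) / 2
  l2 <- (D[1, 2] + D[2, 3] - D[1, 3]) / 2
  l3 <- (D[1, 3] + D[2, 3] - D[1, 2]) / 2
  sprintf("(%s:%g,%s:%g,%s:%g);", nodes[1], l1, nodes[2], l2, nodes[3], l3)
}

labs <- c("a", "b", "c", "d", "e")
D <- matrix(c(0,  5,  9,  9, 8,
              5,  0, 10, 10, 9,
              9, 10,  0,  8, 7,
              9, 10,  8,  0, 3,
              8,  9,  7,  3, 0), nrow = 5, dimnames = list(labs, labs))
nj_text <- nj_newick(D)
print(nj_text)
```

For an additive matrix like this one, NJ recovers the generating tree exactly; with real sequence distances it is a fast heuristic and can even produce negative branch lengths.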
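For the parsimony part of 2e), phangorn scores a given tree with parsimony() (Fitch's algorithm by default) and searches tree space with optim.parsimony() or pratchet(). The scoring step itself is sketched below in base R for a fixed rooted binary tree. The nested-list tree format, the function names, and the restriction to unambiguous single-character states are assumptions of this sketch.

```r
# Fitch's algorithm for one site. 'tree' is either a taxon name (a leaf) or a
# list of two subtrees; 'states' is a named character vector (taxon -> base).
# Returns the candidate state set at this node and the number of changes below it.
fitch_site <- function(tree, states) {
  if (is.character(tree)) return(list(set = states[[tree]], cost = 0))
  left <- fitch_site(tree[[1]], states)
  right <- fitch_site(tree[[2]], states)
  common <- intersect(left$set, right$set)
  if (length(common) > 0) {
    list(set = common, cost = left$cost + right$cost)
  } else {
    # empty intersection: take the union and count one substitution
    list(set = union(left$set, right$set), cost = left$cost + right$cost + 1)
  }
}

# Parsimony score of the tree: sum of Fitch costs over all alignment columns.
fitch_score <- function(tree, seqs) {
  chars <- do.call(rbind, strsplit(seqs, ""))  # taxa x sites, rows named by taxon
  total <- 0
  for (k in seq_len(ncol(chars))) total <- total + fitch_site(tree, chars[, k])$cost
  total
}

seqs <- c(A = "AA", B = "AG", C = "GA", D = "GG")  # made-up toy alignment
tree <- list(list("A", "B"), list("C", "D"))       # the rooted tree ((A,B),(C,D))
print(fitch_score(tree, seqs))
```

The most parsimonious tree is the one minimising this score over all topologies, which is why the real search needs heuristics such as those in optim.parsimony() and pratchet().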
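Step 2j) relies on the nonparametric bootstrap: resample alignment columns with replacement, rebuild the tree, and count how often a clade reappears. ape's boot.phylo and phangorn's bootstrap.pml automate this (the latter re-optimising ML trees); the sketch below only illustrates the idea in base R, using UPGMA (stats::hclust with method = "average") on p-distances as a stand-in tree builder to keep it short. All function names are hypothetical and the toy alignment is made up.

```r
# p-distance matrix for a taxa x sites character matrix (rows named by taxon).
p_dist_matrix <- function(chars) {
  n <- nrow(chars)
  D <- matrix(0, n, n, dimnames = list(rownames(chars), rownames(chars)))
  for (a in seq_len(n)) {
    for (b in seq_len(n)) D[a, b] <- mean(chars[a, ] != chars[b, ])
  }
  as.dist(D)
}

# TRUE if 'taxa' form exactly one cluster of the hierarchical clustering 'h'.
# cutree(h, k) undoes the last k - 1 merges, so every internal node appears
# as a cluster for some k; scanning all k finds the clade if it exists.
is_clade <- function(h, taxa) {
  for (k in seq_along(h$labels)) {
    cl <- cutree(h, k)
    if (setequal(names(cl)[cl == cl[[taxa[1]]]], taxa)) return(TRUE)
  }
  FALSE
}

# Fraction of B bootstrap replicates whose UPGMA tree contains 'taxa' as a clade.
clade_support <- function(chars, taxa, B = 100) {
  hits <- 0
  for (b in seq_len(B)) {
    cols <- sample(ncol(chars), replace = TRUE)  # resample columns with replacement
    h <- hclust(p_dist_matrix(chars[, cols, drop = FALSE]), method = "average")
    if (is_clade(h, taxa)) hits <- hits + 1
  }
  hits / B
}

aln <- c(A = "AAAAAAAA", B = "AAAAAAAA", C = "GGGGGGGG", D = "GGGGGGGG")  # made-up toy alignment
chars <- do.call(rbind, strsplit(aln, ""))
set.seed(1)
print(clade_support(chars, c("A", "B"), B = 20))
```

On real data the support value varies between runs unless the seed is fixed, and values close to 1 indicate branches that are robust to resampling of sites.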