The papers that started it all

Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.

Muse, S. V. and B. S. Gaut. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with applications to the chloroplast genome. Mol. Biol. Evol. 11:715-724.


NOTE: I am no longer updating this list. 

There are just too many references to keep up with.

Models

Models for variable selection pressure among sites:

Bao L, Gu H, Dunn KA, Bielawski JP. 2008. Likelihood Based Clustering (LiBaC) for Codon Models, a method for grouping sites according to similarities in the underlying process of evolution. Mol Biol Evol. Jun 26. [Epub ahead of print]

Mayrose I, Doron-Faigenboim A, Bacharach E, Pupko T. 2007. Towards realistic codon models: among site variability and dependency of synonymous and non-synonymous rates. Bioinformatics. 23(13):i319-27.

Bao L, Gu H, Dunn KA, Bielawski JP. 2007. Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data. BMC Evol Biol. 7 Suppl 1:S5.

Wilson DJ, McVean G. 2006. Estimating diversifying selection and functional constraint in the presence of recombination. Genetics. 172(3):1411-1425.

Huelsenbeck JP, Jain S, Frost SW, Pond SL. 2006. A Dirichlet process model for detecting positive selection in protein-coding DNA sequences. Proc Natl Acad Sci U S A. 103(16):6263-6268.

Kosakovsky Pond SL, Muse SV. 2005. Site-to-site variation of synonymous substitution rates. Mol Biol Evol. 22(12):2375-85.

Massingham T, Goldman N. 2005. Detecting amino acid sites under positive selection and purifying selection. Genetics. 169(3):1753-1762.

Huelsenbeck JP, Dyer KA. 2004. Bayesian estimation of positively selected sites. J Mol Evol. 58:661-672.

Yang, Z., and W. J. Swanson. 2002. Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol. Biol. Evol. 19:49-57.

Yang, Z., R. Nielsen, N. Goldman, and A.-M. K. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.

Nielsen, R. and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.

 

Models for variation in selection pressure among lineages:

Kosakovsky Pond SL, Frost SD. 2005. A genetic algorithm approach to detecting lineage-specific variation in selection pressure. Mol Biol Evol. 22(3):478-485.

Seo TK, Kishino H, Thorne JL. 2004. Estimating absolute rates of synonymous and nonsynonymous nucleotide substitution in order to characterize natural selection and date species divergences. Mol Biol Evol. 21(7):1201-1213.

Bielawski, J. P. and Z. Yang. 2003. Maximum likelihood methods for detecting adaptive evolution after gene duplication. Journal of Structural and Functional Genomics, 3:201-212.

Yang, Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568-573.

 

Models for variable selection pressure among sites & lineages:

Zhang J, Nielsen R, Yang Z. 2005. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 22(12):2472-2479.

Bielawski, J. P. and Z. Yang. 2004. A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. Journal of Molecular Evolution, 59:121-132.

Guindon S, Rodrigo AG, Dyer KA, Huelsenbeck JP. 2004. Modeling the site-specific variation of selection patterns along lineages. Proc Natl Acad Sci U S A. 101:12957-12962.

Forsberg R, Christiansen FB. 2003. A codon-based model of host-specific selection in parasites, with an application to the influenza A virus. Mol Biol Evol. 20(8):1252-1259.

Yang Z, Nielsen R. 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol. 19:908-917.

 

Yet more models:

Yang Z, Nielsen R. 2008. Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Mol Biol Evol. 25(3):568-579.

Kosakovsky Pond SL, Poon AF, Leigh Brown AJ, Frost SD. 2008. A maximum likelihood method for detecting directional evolution in protein sequences and its application to influenza A virus. Mol Biol Evol. May 29. [Epub ahead of print]

Seoighe C, Ketwaroo F, Pillay V, Scheffler K, Wood N, Duffet R, Zvelebil M, Martinson N, McIntyre J, Morris L, Hide W. 2007. A model of directional selection applied to the evolution of drug resistance in HIV-1. Mol Biol Evol. 24(4):1025-1031.

Doron-Faigenboim A, Pupko T. 2007. A combined empirical and mechanistic codon model. Mol Biol Evol. 24(2):388-397.

Kosiol C, Holmes I, Goldman N. 2007. An empirical codon model for protein sequence evolution. Mol Biol Evol. 24(7):1464-1479.

Wong WS, Sainudiin R, Nielsen R. 2006. Identification of physicochemical selective pressure on protein encoding nucleotide sequences. BMC Bioinformatics. 7:148.

Sainudiin R, Wong WS, Yogeeswaran K, Nasrallah JB, Yang Z, Nielsen R. 2005. Detecting site-specific physicochemical selective pressures: applications to the Class I HLA of the human major histocompatibility complex and the SRK of the plant sporophytic self-incompatibility system. J Mol Evol. 60(3):315-26.

Schneider A, Cannarozzi GM, Gonnet GH. 2005. Empirical codon substitution matrix. BMC Bioinformatics. 6:134.

Yang, Z., R. Nielsen, and M. Hasegawa. 1998. Models of amino acid substitution and applications to mitochondrial protein evolution. Molecular Biology and Evolution 15:1600-1611.

 

The problem of rate estimation and comparison of rates

Yang Z, Nielsen R. 2008. Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Mol Biol Evol. 25(3):568-579.

Chapter 2 in Yang, Z. 2006. Computational Molecular Evolution. Oxford University Press, Oxford, England.

Aris-Brosou S, Bielawski JP. 2006. Large-scale analyses of synonymous substitution rates can be sensitive to assumptions about the process of mutation. Gene. 378:58-64.

Bierne N, Eyre-Walker A. 2003. The problem of counting sites in the estimation of the synonymous and nonsynonymous substitution rates: implications for the correlation between the synonymous substitution rate and codon usage bias. Genetics. 165:1587-1597.

Bielawski, J. P., K. A. Dunn, and Z. Yang. 2000. Rates of nucleotide substitution and mammalian nuclear gene evolution: approximate and maximum-likelihood methods lead to different conclusions. Genetics. 156:1299-1308.

Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17: 32-43.

Muse, S. V. 1996. Estimating synonymous and non-synonymous substitution rates. Mol. Biol. Evol. 13:105-114.

Statistical tests and the identification of positively selected sites

Bao L, Gu H, Dunn KA, Bielawski JP. 2008. Likelihood Based Clustering (LiBaC) for Codon Models, a method for grouping sites according to similarities in the underlying process of evolution. Mol Biol Evol. Jun 26. [Epub ahead of print]

Anisimova M, Yang Z. 2007. Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites. Mol Biol Evol. 24(5):1219-1228.

Bao L, Gu H, Dunn KA, Bielawski JP. 2007. Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data. BMC Evol Biol. 7 Suppl 1:S5.

Aris-Brosou S. 2006. Identifying sites under positive selection with uncertain parameter estimates. Genome. 49(7):767-776.

Scheffler K, Martin DP, Seoighe C. 2006. Robust inference of positive selection from recombining coding sequences. Bioinformatics. 22(20):2493-2499.

Yang Z. 2006. On the varied pattern of evolution of 2 fungal genomes: a critique of Hughes and Friedman. Mol Biol Evol. 23(12):2279-2282.

Scheffler K, Seoighe C. 2005. A Bayesian model comparison approach to inferring positive selection. Mol Biol Evol. 22(12):2531-2540.

Kosakovsky Pond SL and Frost SDW. 2005. Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol. Biol. Evol. 22:1208-1222.

Yang Z, Wong WS, Nielsen R. 2005. Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol. 22(4):1107-1118.

Wong WS, Yang Z, Goldman N, Nielsen R. 2004. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 168:1041-1051.

Suzuki Y. 2004. New methods for detecting positive selection at single amino acid sites. J Mol Evol. 59(1):11-19.

Anisimova M, Nielsen R, Yang Z. 2003. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics. 164:1229-1236.

Shriner D, Nickle DC, Jensen MA, Mullins JI. 2003. Potential impact of recombination on sitewise approaches for detecting positive natural selection. Genet Res. 81(2):115-121.

Anisimova, M., J. P. Bielawski, and Z. Yang. 2002. Accuracy and Power of Bayes prediction of amino acid sites under positive selection. Molecular Biology and Evolution, 19:950-958.

Anisimova, M., J. P. Bielawski, and Z. Yang. 2001. Accuracy and power of likelihood ratio test to detect adaptive molecular evolution. Molecular Biology and Evolution. 18(8):1585-1592.

Codon models in phylogeny reconstruction

Seo TK, Kishino H. 2008. Synonymous substitutions substantially improve evolutionary inference from highly diverged proteins. Syst Biol. 57(3):367-377.

Inagaki Y, Roger AJ. 2006. Phylogenetic estimation under codon models can be biased by codon usage heterogeneity. Mol Phylogenet Evol. 40(2):428-434.

Shapiro B, Rambaut A, Drummond AJ. 2006. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol Biol Evol. 23(1):7-9.

Ren, F., H. Tanaka, and Z. Yang. 2005. An empirical examination of the utility of codon-substitution models in phylogeny reconstruction. Syst. Biol. 54: 808-818.

Reviews and commentaries

Kosakovsky Pond SL, Poon AF, Zárate S, Smith DM, Little SJ, Pillai SK, Ellis RJ, Wong JK, Leigh Brown AJ, Richman DD, Frost SD. 2008. Estimating selection pressures on HIV-1 using phylogenetic likelihood models. Stat Med. 2008 Apr 1. [Epub ahead of print]

Anisimova M, Liberles DA. 2007. The quest for natural selection in the age of comparative genomics. Heredity. 99(6):567-579.

Nielsen R, Hellmann I, Hubisz M, Bustamante C, Clark AG. 2007. Recent and ongoing selection in the human genome. Nat Rev Genet. 8(11):857-868.

Bielawski, J. P., and Z. Yang. 2005. Maximum likelihood methods for detecting adaptive protein evolution, in (R. Nielsen ed.) Statistical Methods in Molecular Evolution, Springer-Verlag, New York.

Yang, Z. 2005. The power of phylogenetic comparison in revealing protein function. PNAS 102:3179-3180.

Yang, Z. 2002. Inference of selection from multiple species alignments. Current Opinion in Genetics and Development 12:688-694.

Yang, Z. and J. P. Bielawski. 2000. Statistical tests of adaptive molecular evolution. Trends in Ecology and Evolution, 15:496-502.

Other papers cited in the lecture or in the lab

Anisimova M, Bielawski J, Dunn K, Yang Z. 2007. Phylogenomic analysis of natural selection pressure in Streptococcus genomes. BMC Evol Biol. 7:154.

Aguileta G, Bielawski JP, Yang Z. 2004. Gene conversion and functional divergence in the beta-globin gene family. J. Mol. Evol. 59:177-189.

Bielawski JP, Dunn KA, Sabehi G, Beja O. 2004. Darwinian adaptation of proteorhodopsin to different light intensities in the marine environment. Proc Natl Acad Sci U S A. 101:14824-14829.

Yang, W., J. P. Bielawski, and Z. Yang. 2003. Widespread adaptive evolution in the human immunodeficiency virus type-1 genome. J. Mol. Evol., 57:212-221.

Nielsen R, Yang Z. 2003. Estimating the distribution of selection coefficients from phylogenetic data with applications to mitochondrial and viral DNA. Mol Biol Evol. 20:1231-1239.

Schadt EE, Sinsheimer JS, Lange K. 2002. Applications of codon and rate variation models in molecular phylogeny. Mol. Biol. Evol. 19(9):1550-1562.

Schadt E, Lange K. 2002. Codon and rate variation models in molecular phylogeny. Mol. Biol. Evol. 19(9):1534-49.

Bielawski, J. P. and Z. Yang. 2001. The role of selection in the evolution of the DAZ gene family. Mol. Biol. Evol. 18: 523-529.

Dunn, K. A., J. P. Bielawski, and Z. Yang. 2001. Rates and patterns of synonymous substitutions in Drosophila: implications for translational selection. Genetics. 157:295-305.