SOFTWARE:

Software developed within my group is free and open source.  The code has been developed in C++, MATLAB, Python and R, depending on the preference of the programmer.

PG-BSM:  The Phenotype-Genotype Branch-Site codon model (PG-BSM) was formulated to infer adaptive evolution without appealing to evidence of positive selection. The null model makes use of a covarion-like component to account for general heterotachy (i.e., random changes in the evolutionary rate at a site over time). The alternative model employs samples of the phenotypic evolutionary history to test for specific mechanisms of phenotype-genotype evolution.

Distribution: https://www.mathstat.dal.ca/~tsusko/software.html

Citation: Jones CT, Youssef N, Susko E, Bielawski JP. A Phenotype-Genotype Codon Model for Detecting Adaptive Evolution. Syst Biol. 2020 Jul 1;69(4):722-738. doi: 10.1093/sysbio/syz075. PMID: 31730199.


ModML:  Likelihood ratio tests are commonly used to test for positive selection acting on proteins. We show that commonly used thresholds need not yield conservative tests, but instead give larger than expected Type I error rates. We introduce a modified likelihood ratio test (ModML) that restores statistical regularity.
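
For orientation, the standard likelihood ratio test that the paragraph above describes can be sketched in a few lines of Python. This is only the conventional test with a chi-square reference distribution, i.e. the procedure whose thresholds are reported to be unreliable; it is not the modified test. The function name and the log-likelihood values are hypothetical, and the sketch handles only 1 or 2 degrees of freedom, where the chi-square tail probability has a simple closed form.

```python
import math

def lrt_p_value(lnl_null: float, lnl_alt: float, df: int) -> float:
    """P-value of a standard likelihood ratio test against a chi-square reference."""
    # Test statistic: twice the gain in maximized log-likelihood.
    stat = 2.0 * (lnl_alt - lnl_null)
    # Guard against tiny negative values caused by optimizer noise.
    stat = max(stat, 0.0)
    if df == 1:
        # Chi-square(1) tail probability: P(Z^2 > stat) for a standard normal Z.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Chi-square(2) is an exponential distribution with mean 2.
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# Hypothetical maximized log-likelihoods for a null and an alternative codon model.
p = lrt_p_value(lnl_null=-2350.40, lnl_alt=-2346.10, df=2)
print(f"p-value: {p:.4f}")
```

The choice of reference distribution is exactly what is at issue when regularity conditions fail, so the p-value from this sketch should be read as the naive baseline.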

Distribution: https://github.com/jehops/codeml_modl

Citation: Mingrone J, Susko E, Bielawski JP. ModL: exploring and restoring regularity when testing for positive selection. Bioinformatics. 2019 Aug 1;35(15):2545-2554. doi: 10.1093/bioinformatics/bty1019. PMID: 30541063.


Codeml-SBA:  To detect positive selection at individual amino acid sites, most methods use an empirical Bayes approach.  A difficulty with this approach is that parameter estimates with large errors can negatively impact Bayesian classification.  Bayes Empirical Bayes (BEB) mitigates this problem by imposing uniform priors, which causes it to be overly conservative in some cases. When standard regularity conditions are not met and parameter estimates are unstable, even BEB is negatively impacted.  We present an alternative to BEB called smoothed bootstrap aggregation (SBA), which bootstraps site patterns from an alignment of protein-coding DNA sequences to accommodate the uncertainty in the parameter estimates. In combination with kernel smoothing techniques, SBA improves site-specific inference of positive selection.
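
The resampling step mentioned above (bootstrapping site patterns from a codon alignment) can be illustrated with a minimal Python sketch. It assumes an in-frame alignment stored as a dictionary of equal-length sequences; the function name and the toy data are hypothetical. It covers only the generation of one bootstrap replicate and does not include model fitting, aggregation, or kernel smoothing, so it should not be read as the Codeml-SBA implementation.

```python
import random

def bootstrap_codon_sites(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample codon columns with replacement, keeping the alignment length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n_nuc = lengths.pop()
    if n_nuc % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    n_codons = n_nuc // 3
    # Draw codon-column indices once so every taxon gets the same columns.
    picks = [rng.randrange(n_codons) for _ in range(n_codons)]
    return {
        name: "".join(seq[3 * i: 3 * i + 3] for i in picks)
        for name, seq in alignment.items()
    }

# Toy alignment (hypothetical data): three taxa, three codons each.
toy = {"taxonA": "ATGGCTAAA", "taxonB": "ATGGCCAAG", "taxonC": "ATGGCTAAG"}
replicate = bootstrap_codon_sites(toy, random.Random(1))
for name, seq in replicate.items():
    print(name, seq)
```

Sampling whole codon columns, rather than single nucleotides, keeps each resampled site a valid site pattern for a codon model.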

Distribution: https://github.com/Jehops/codeml_sba

Citation: Mingrone J, Susko E, Bielawski J. Smoothed Bootstrap Aggregation for Assessing Selection Pressure at Amino Acid Sites. Mol Biol Evol. 2016 Nov;33(11):2976-2989. doi: 10.1093/molbev/msw160. Epub 2016 Aug 2. PMID: 27486222.


BioMiCo:  Microbiome samples often represent mixtures of communities, where each community is composed of overlapping assemblages of species. Such mixtures are complex: the number of species is huge and abundance information for many species is often sparse. Classical methods have limited value for identifying complex features within such data.  To address this, we developed a novel hierarchical model for Bayesian inference of microbial communities (BioMiCo). BioMiCo provides a framework for learning the structure of microbial communities and for making predictions based on microbial assemblages. By training on carefully chosen features (abiotic or biotic), BioMiCo can be used to understand and predict transitions between complex communities composed of hundreds of microbial species.

Distribution: https://sourceforge.net/projects/biomico/

Citation: Shafiei M, Dunn KA, Boon E, MacDonald SM, Walsh DA, Gu H, Bielawski JP. BioMiCo: a supervised Bayesian model for inference of microbial community structure. Microbiome. 2015 Mar 10;3:8. doi: 10.1186/s40168-015-0073-x. PMID: 25774293; PMCID: PMC4359585.


BiomeNet:  Metagenomics yields enormous numbers of microbial sequences that can be assigned a metabolic function. Using such data to infer community-level metabolic divergence is hindered by the lack of a suitable statistical framework. To address this, we developed a novel hierarchical Bayesian model, called BiomeNet (Bayesian inference of metabolic networks), for inferring differential prevalence of metabolic subnetworks among microbial communities. Through this framework, the model can capture nested structures within the data. BiomeNet is unique in modeling each metagenome sample as a mixture of complex metabolic systems (metabosystems).

Distribution: https://sourceforge.net/projects/biomenet/

Citation: Shafiei M, Dunn KA, Chipman H, Gu H, Bielawski JP. BiomeNet: a Bayesian model for inference of metabolic divergence among microbial communities. PLoS Comput Biol. 2014 Nov 20;10(11):e1003918. doi: 10.1371/journal.pcbi.1003918. PMID: 25412107; PMCID: PMC4238953.


DendroCypher:  Modelling branch-specific changes in the intensity of natural selection pressure requires identifying and labelling specific branches within a phylogenetic tree.  This is very difficult to do when the tree is represented in Newick format and there are a large number of taxa (with many internal branches).  DendroCypher is a tool for manipulating and labelling a bifurcating tree data structure.
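
To make the labelling problem concrete, the following Python sketch adds a `#1` style label (the convention CODEML reads for branch models) to a single terminal branch of a small Newick string. It is not part of DendroCypher: the function name and the toy tree are hypothetical, and the sketch assumes a tree without branch lengths and with unique taxon names.

```python
import re

def label_tip(newick: str, taxon: str, label: str = "#1") -> str:
    """Append a branch label to one terminal branch of a Newick string."""
    # Match the taxon name only when it is delimited by Newick punctuation
    # or whitespace, so that "rat" does not match inside a longer name.
    pattern = r"(?<=[(,\s])" + re.escape(taxon) + r"(?=[,)\s])"
    labelled, count = re.subn(pattern, lambda m: f"{m.group(0)} {label}", newick)
    if count != 1:
        raise ValueError(f"expected exactly one tip named {taxon!r}, found {count}")
    return labelled

# Toy tree (hypothetical data) without branch lengths.
tree = "((human,chimp),(mouse,rat),dog);"
print(label_tip(tree, "mouse"))
```

Labelling a tip is the easy case. Labelling an internal branch requires finding the closing parenthesis of the right clade, which is what becomes error-prone by hand as the number of taxa grows.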

Distribution: https://bitbucket.org/EvoWorks/dendrocypher

Citation:
Bielawski, J.P., Baker, J.L. and Mingrone, J. 2016. Inference of episodic changes in natural selection acting on protein coding sequences via CODEML. Curr. Protoc. Bioinform. 54:6.15.1-6.15.32. doi: 10.1002/cpbi.2


Protein Stability simulation and analysis software:  Most proteins must fold into a native structure in which they are moderately stable before they are able to perform their biological function. Protein stability depends on the sequence of amino acids and their interactions in the folded three-dimensional structures. Because of these interactions, evolutionary selective constraints to maintain adequate stability result in epistatic dependencies between residues.  We developed and implemented mechanistic mutation-selection models in conjunction with a fitness framework derived from protein stability. We refer to these as the stability-informed site-dependent (S-SD) model and the stability-informed site-independent (S-SI) model; the latter captures the average effect of stability constraints on individual sites of a protein.  We also used the stability-constrained mechanistic mutation-selection models to show that nonadaptive evolution can lead to both positive (Stokes) and negative (anti-Stokes) shifts in propensities following the fixation of an amino acid, emphasizing that the detection of negative shifts is not conclusive evidence of adaptation.

Distribution:
https://github.com/nooryoussef/antiStokes_shifts
https://github.com/nooryoussef/Consequences-of-stability-induced-epistasis

Citations:
Youssef N, Susko E, Roger AJ, Bielawski JP. Evolution of Amino Acid Propensities under Stability-Mediated Epistasis. Mol Biol Evol. 2022 Mar 2;39(3):msac030. doi: 10.1093/molbev/msac030. PMID: 35134997; PMCID: PMC8896634.

Youssef N, Susko E, Bielawski JP. Consequences of Stability-Induced Epistasis for Substitution Rates. Mol Biol Evol. 2020 Nov 1;37(11):3131-3148. doi: 10.1093/molbev/msaa151. PMID: 32897316.


LiBaC:  Classic likelihood ratio tests for positive selection have a high false-positive rate when there is extensive and unmodelled variation in the evolutionary process among sites within a gene. We developed a new method for assigning codon sites into groups where each group has a different model, and the likelihood over all sites is maximized. The method, called likelihood-based clustering (LiBaC), can be viewed as a generalization of the family of model-based clustering approaches to models of codon evolution.

Distribution: available on request

Citation: Bao L, Gu H, Dunn KA, Bielawski JP. Likelihood-based clustering (LiBaC) for codon models, a method for grouping sites according to similarities in the underlying process of evolution. Mol Biol Evol. 2008 Sep;25(9):1995-2007. doi: 10.1093/molbev/msn145. Epub 2008 Jun 26. PMID: 18586695.


Codeml-FE: In some cases, a priori biological knowledge has been used successfully to model heterogeneous evolutionary dynamics among codon sites. These are called fixed-effect models, and they require that all codon sites are assigned to one of several partitions, which are permitted to have independent parameters for selection pressure, evolutionary rate, transition-to-transversion ratio or codon frequencies. Codeml-FE implements 11 new fixed-effect models of codon evolution.

Distribution: available on request

Citation: Bao L, Gu H, Dunn KA, Bielawski JP. Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data. BMC Evol Biol. 2007 Feb 8;7 Suppl 1(Suppl 1):S5. doi: 10.1186/1471-2148-7-S1-S5. PMID: 17288578; PMCID: PMC1796614.




TUTORIALS:

PAML "lab" in the Workshop on Molecular Evolution at MBL:  Tutorial for the PAML lab exercises for the Workshop on Molecular Evolution at the Marine Biological Laboratory (MBL) in Woods Hole, MA.

Distribution: https://bitbucket.org/EvoWorks/protocol-paml-lab-at-mbl-workshop/src/master/

Citations:

https://www.mbl.edu/education/courses/workshop-on-molecular-evolution/
https://molevolworkshop.github.io/faculty-bielawski/
https://awarnach.mathstat.dal.ca/~joeb/PAML_lab_old/lab.html

Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol. 2000 Dec 1;15(12):496-503. doi: 10.1016/s0169-5347(00)01994-7. PMID: 11114436; PMCID: PMC7134603.


"Advanced PAML lab":  This tutorial addresses some advanced topics including (1) inference of episodic selection pressure, (2) detection of statistical irregularities and unstable parameter estimates, and (3) use of smoothed bootstrap aggregation to mitigate negative effects of unstable parameter estimates.
Advanced students will do this as the "PAML lab" exercises for the Workshop on Molecular Evolution at the Marine Biological Laboratory (MBL) in Woods Hole, MA.


Distribution: https://bitbucket.org/EvoWorks/protocol-inference-of-episodic-selection/src/master/

Citations:

Bielawski, J.P., Baker, J.L. and Mingrone, J. 2016. Inference of episodic changes in natural selection acting on protein coding sequences via CODEML. Curr. Protoc. Bioinform. 54:6.15.1-6.15.32. doi: 10.1002/cpbi.2

Mingrone J, Susko E, Bielawski J. Smoothed Bootstrap Aggregation for Assessing Selection Pressure at Amino Acid Sites. Mol Biol Evol. 2016 Nov;33(11):2976-2989. doi: 10.1093/molbev/msw160. Epub 2016 Aug 2. PMID: 27486222.


Protocols: Detecting signatures of adaptive evolution: 
This is a more formal, protocol-based tutorial on statistical inference of adaptive evolution using codon models.  The topics are the same as those covered in the MBL "PAML lab", but they use different datasets and are presented in an updated publication.

Distribution:  https://bitbucket.org/EvoWorks/protocol-detecting-signatures-of-adaptive-evolution/src/master/

Citations:  Bielawski JP. Detecting the signatures of adaptive evolution in protein-coding genes. Curr Protoc Mol Biol. 2013 Jan;Chapter 19:Unit 19.1. doi: 10.1002/0471142727.mb1901s101. PMID: 23288462.


Bayesian Inference of Microbial Community Structure from Metagenomic Data Using BioMiCo:  This tutorial, and the associated book chapter, provides a set of protocols that illustrate the application of BioMiCo to real inference problems. Each protocol is designed around the analysis of a real dataset, which was carefully chosen to illustrate specific aspects of real data analysis. With these protocols, users of BioMiCo will be able to undertake basic research into the properties of complex microbial systems, as well as develop predictive models for applied microbiomics.

Distribution:  (upcoming)  The tutorial is currently available with electronic access to the book.

Citation:  Dunn KA, Andrews K, Bashwih RO, Bielawski JP. Bayesian Inference of Microbial Community Structure from Metagenomic Data Using BioMiCo. Methods Mol Biol. 2018;1849:267-289. doi: 10.1007/978-1-4939-8728-3_17. PMID: 30298260.


PG-BSM Tutorial: A tutorial based on a set of real data is included in the distribution of the PG-BSM software. The software download will automatically include all the required supplementary files. Note that this tutorial requires MATLAB.

Distribution: https://www.mathstat.dal.ca/~tsusko/software.html

Citation: Jones CT, Youssef N, Susko E, Bielawski JP. A Phenotype-Genotype Codon Model for Detecting Adaptive Evolution. Syst Biol. 2020 Jul 1;69(4):722-738. doi: 10.1093/sysbio/syz075. PMID: 31730199.






CLUSTER COMPUTING:


Awarnach
Awarnach is the Bielawski group's computing cluster. It originally ran 64-bit Solaris, but was updated in 2020 to run Linux.

Hardware and General Configuration
Awarnach has a Sun Fire X40z master node (two dual-core Opteron 2.0GHz 870 CPUs and 16GB of RAM). Currently there are 20 compute nodes: 19 with dual-core Opteron 270 CPUs (2.2GHz), 4GB of RAM, and two fast, hardware-mirrored 73GB disks, and one “super node” with four 12-core 6348 CPUs and 256GB of RAM. So, there are a total of 19*2*2 + 1*12*4 = 124 cores.
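
The core count above can be checked with a short Python sketch. The node groups are taken from the description; the figure of two CPUs per Opteron 270 node is an assumption inferred from the 19*2*2 term, and the master node is left out because the stated total covers compute nodes only. The variable names are illustrative.

```python
# Compute-node groups as (number of nodes, CPUs per node, cores per CPU).
# Two CPUs per Opteron 270 node is inferred from the 19*2*2 term in the text.
node_groups = {
    "Opteron 270 nodes": (19, 2, 2),
    "super node": (1, 4, 12),
}

# Cores contributed by each group of compute nodes.
cores = {
    name: nodes * cpus * cores_per_cpu
    for name, (nodes, cpus, cores_per_cpu) in node_groups.items()
}
total_cores = sum(cores.values())

for name, count in cores.items():
    print(f"{name}: {count} cores")
print(f"total: {total_cores} cores")
```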

Storage space is NFS-mounted from our storage server: an Asus RS300-E7-PS4 1U server with an E3-1230V2 Xeon CPU, four Intel 60GB SSDs (520 Series) and an LSI 9205-8e SAS controller connecting to a Supermicro SC847E16-RJB0D1 JBOD containing 10 WD30EFRX 2TB hard drives in a ZFS raidz3 pool.

Storage Space, NFS and AMD
Each user's home directory is located on the storage server and NFS-mounted on the nodes.

Job Scheduling
Job scheduling is handled with Grid Engine.

The Name Awarnach
The name Awarnach comes from the name of a giant in Arthurian legend.

