Software developed within my group is free and
open source. The code has been developed in C++,
MATLAB, Python and R, depending on the preference of the
programmer.
PG-BSM:
Phenotype-Genotype Branch-Site codon models (PG-BSM)
were formulated to infer adaptive evolution without appealing
to evidence of positive selection. The null model makes
use of a covarion-like component to account for general
heterotachy (i.e., random changes in the evolutionary rate
at a site over time). The alternative model employs
samples of the phenotypic evolutionary history to test for
specific mechanisms of phenotype-genotype evolution.
Distribution: https://www.mathstat.dal.ca/~tsusko/software.html
Citation: Jones CT, Youssef N, Susko E, Bielawski JP. A
Phenotype-Genotype Codon Model for Detecting Adaptive
Evolution. Syst Biol. 2020 Jul 1;69(4):722-738. doi:
10.1093/sysbio/syz075. PMID: 31730199.
ModL: Likelihood
ratio (LR) tests are commonly used to test for positive
selection acting on proteins. We show that commonly used
thresholds need not yield conservative tests, but can
instead give larger than expected Type I error rates. We
introduce a modified likelihood ratio test (ModL) that
restores statistical regularity.
Distribution: https://github.com/jehops/codeml_modl
Citation: Mingrone J, Susko E, Bielawski JP. ModL: exploring
and restoring regularity when testing for positive
selection. Bioinformatics. 2019 Aug 1;35(15):2545-2554. doi:
10.1093/bioinformatics/bty1019. PMID: 30541063.
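As a point of reference, the sketch below computes the conventional LR statistic, 2(lnL1 - lnL0), and its nominal p-value from a chi-square distribution with 2 degrees of freedom, the reference distribution commonly used when two nested site models differ by two parameters (for example M7 vs M8 in CODEML). It is not an implementation of ModL, and the log-likelihood values are hypothetical. For 2 degrees of freedom the chi-square upper tail has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)

def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability of a chi-square with 2 df: P(X > x) = exp(-x/2)."""
    return math.exp(-x / 2.0) if x > 0 else 1.0

# Hypothetical maximized log-likelihoods for a null and an alternative site model.
lnl_null, lnl_alt = -2412.71, -2408.95
stat = lrt_statistic(lnl_null, lnl_alt)
p_value = chi2_sf_df2(stat)
critical_5pct = 2.0 * math.log(20.0)  # 5% critical value for chi-square with 2 df (about 5.99)
print(f"2dlnL = {stat:.2f}, p = {p_value:.4f}, reject at 5%: {stat > critical_5pct}")
```

According to the ModL summary above, nominal thresholds like this one can be anti-conservative when regularity conditions fail, so the p-value here should be read only as the uncorrected baseline.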
Codeml-SBA: To
detect positive selection at individual amino acid
sites, most methods use an empirical Bayes
approach. A difficulty with this approach is that
parameter estimates with large errors can negatively
impact Bayesian classification. Bayes Empirical
Bayes (BEB) mitigates this problem by imposing uniform
priors on the model parameters, which causes it to be
overly conservative in some cases. When standard regularity
conditions are not met and parameter estimates are
unstable, inference is negatively impacted, even under
BEB. We present an alternative to BEB called smoothed
bootstrap aggregation (SBA), which bootstraps site patterns
from an alignment of protein-coding DNA sequences to
accommodate the uncertainty in the parameter estimates. In
combination with kernel smoothing techniques, SBA
improves site-specific inference of positive selection.
Distribution: https://github.com/Jehops/codeml_sba
Citation: Mingrone J, Susko E, Bielawski J. Smoothed
Bootstrap Aggregation for Assessing Selection Pressure at
Amino Acid Sites. Mol Biol Evol. 2016 Nov;33(11):2976-2989.
doi: 10.1093/molbev/msw160. Epub 2016 Aug 2. PMID: 27486222.
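The bootstrap aggregation idea can be illustrated with a deliberately simplified two-class model, assuming each site has a precomputed (hypothetical) likelihood under a "no positive selection" class and a "positive selection" class. Only the mixing proportion is re-estimated, by EM, on each bootstrap resample of sites; the empirical Bayes posteriors for the original sites are then averaged across replicates. Real codon models estimate many more parameters, the kernel smoothing step of SBA is omitted, and the function names are hypothetical.

```python
import random

def estimate_weight(l0, l1, iters=200):
    """EM estimate of the proportion of sites in class 1 (toy model, fixed class likelihoods)."""
    p = 0.5
    for _ in range(iters):
        post = [p * b / (p * b + (1 - p) * a) for a, b in zip(l0, l1)]
        p = sum(post) / len(post)
    return p

def posteriors(l0, l1, p):
    """Empirical Bayes posterior probability that each site belongs to class 1."""
    return [p * b / (p * b + (1 - p) * a) for a, b in zip(l0, l1)]

def bagged_posteriors(l0, l1, n_boot=100, seed=1):
    """Average site posteriors over parameter estimates from bootstrap resamples of sites."""
    rng = random.Random(seed)
    n = len(l0)
    totals = [0.0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample site patterns with replacement
        p_hat = estimate_weight([l0[i] for i in idx], [l1[i] for i in idx])
        for i, q in enumerate(posteriors(l0, l1, p_hat)):  # classify the original sites
            totals[i] += q
    return [t / n_boot for t in totals]

# Hypothetical per-site likelihoods under class 0 and class 1.
l0 = [0.9, 0.8, 0.85, 0.2, 0.1, 0.9, 0.7, 0.15]
l1 = [0.1, 0.2, 0.15, 0.7, 0.9, 0.1, 0.3, 0.8]
result = bagged_posteriors(l0, l1)
print([round(q, 3) for q in result])
```

Averaging over replicates spreads the uncertainty in the estimated proportion into each site's posterior instead of conditioning on a single, possibly unstable, point estimate.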
BioMiCo: Microbiome
samples often represent mixtures of communities, where
each community is composed of overlapping assemblages of
species. Such mixtures are complex: the number of
species is huge, and abundance information for many
species is often sparse. Classical methods are of
limited value for identifying complex features within
such data. To address this, we developed a novel
hierarchical model for Bayesian inference of microbial
communities (BioMiCo). BioMiCo provides a
framework for learning the structure of microbial
communities and for making predictions based on
microbial assemblages. By training on carefully chosen
features (abiotic or biotic), BioMiCo can be
used to understand and predict transitions between
complex communities composed of hundreds of microbial
species.
Distribution: https://sourceforge.net/projects/biomico/
Citation: Shafiei M, Dunn KA, Boon E, MacDonald SM, Walsh
DA, Gu H, Bielawski JP. BioMiCo: a supervised Bayesian model
for inference of microbial community structure. Microbiome.
2015 Mar 10;3:8. doi: 10.1186/s40168-015-0073-x. PMID:
25774293; PMCID: PMC4359585.
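To make the "mixture of communities" picture concrete, here is a toy generative sketch in the style of a topic model: each community is a probability distribution over taxa (OTUs), each sample draws its own community proportions from a Dirichlet distribution, and every read is first assigned a community and then a taxon from that community. This only illustrates the mixture idea under assumed Dirichlet priors and made-up community profiles; it is not BioMiCo's supervised model, which additionally links community structure to the training features.

```python
import random

def dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def simulate_sample(communities, alpha, n_reads, rng):
    """Simulate OTU counts for one sample that mixes several communities."""
    n_otus = len(communities[0])
    theta = dirichlet(alpha, rng)  # this sample's community proportions
    counts = [0] * n_otus
    for _ in range(n_reads):
        c = rng.choices(range(len(communities)), weights=theta)[0]  # pick a community
        otu = rng.choices(range(n_otus), weights=communities[c])[0]  # pick a taxon from it
        counts[otu] += 1
    return theta, counts

rng = random.Random(42)
# Two hypothetical communities over five OTUs; OTU 2 is shared (overlapping assemblages).
communities = [[0.5, 0.3, 0.2, 0.0, 0.0],
               [0.0, 0.0, 0.2, 0.4, 0.4]]
theta, counts = simulate_sample(communities, alpha=[1.0, 1.0], n_reads=1000, rng=rng)
print([round(t, 3) for t in theta], counts)
```

Inference runs this process in reverse: given only the counts, a Bayesian model recovers both the community profiles and each sample's proportions.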
BiomeNet: Metagenomics
yields enormous numbers of microbial sequences that can
be assigned a metabolic function. Using such data to
infer community-level metabolic divergence is hindered
by the lack of a suitable statistical framework. To
address this, we developed a novel hierarchical Bayesian
model, called BiomeNet (Bayesian inference of
metabolic networks), for inferring differential
prevalence of metabolic subnetworks among microbial
communities. Through this framework, the model can
capture nested structures within the data. BiomeNet
is unique in modeling each metagenome sample as a
mixture of complex metabolic systems (metabosystems).
Distribution: https://sourceforge.net/projects/biomenet/
Citation: Shafiei M, Dunn KA, Chipman H, Gu H, Bielawski JP.
BiomeNet: a Bayesian model for inference of metabolic
divergence among microbial communities. PLoS Comput Biol.
2014 Nov 20;10(11):e1003918. doi:
10.1371/journal.pcbi.1003918. PMID: 25412107; PMCID:
PMC4238953.
DendroCypher: Modelling
branch-specific changes in the intensity of natural
selection pressure requires identifying and labelling
specific branches within a phylogenetic tree. This
is very difficult to do when the tree is represented in
Newick format and there is a large number of taxa (with
many internal branches). DendroCypher is a
tool for manipulating and labelling a bifurcating tree
data structure.
Distribution: https://bitbucket.org/EvoWorks/dendrocypher
Citation: Bielawski, J.P., Baker, J.L.
and Mingrone, J. 2016. Inference of episodic changes
in natural selection acting on protein coding
sequences via CODEML. Curr. Protoc. Bioinform.
54:6.15.1-6.15.32. doi: 10.1002/cpbi.2
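The kind of task DendroCypher automates can be sketched in a few lines: parse a Newick string, find the branch leading to the smallest clade that contains a given set of taxa, and mark it with a `#1` label, the style CODEML uses for foreground branches in branch and branch-site models. This sketch is not DendroCypher's interface, and it assumes a simple Newick tree with taxon names only (no branch lengths, quoted names or comments).

```python
class Node:
    def __init__(self, name="", children=None):
        self.name = name
        self.children = children or []
        self.label = ""  # branch label such as "#1"

def parse_newick(s):
    """Parse a simple Newick string (names only, no branch lengths) into a Node tree."""
    s = "".join(s.split())  # drop whitespace
    pos = 0

    def parse():
        nonlocal pos
        node = Node()
        if s[pos] == "(":
            pos += 1  # skip "("
            node.children.append(parse())
            while s[pos] == ",":
                pos += 1
                node.children.append(parse())
            pos += 1  # skip ")"
        start = pos
        while pos < len(s) and s[pos] not in ",();":
            pos += 1
        node.name = s[start:pos]
        return node

    return parse()

def leaves(node):
    """Set of taxon names below a node."""
    if not node.children:
        return {node.name}
    return set().union(*(leaves(c) for c in node.children))

def label_mrca(node, taxa, label="#1"):
    """Label the branch leading to the smallest clade containing all `taxa`."""
    for child in node.children:
        if taxa <= leaves(child):
            return label_mrca(child, taxa, label)
    node.label = label
    return node

def to_newick(node):
    """Serialize a Node tree back to Newick (without the trailing semicolon)."""
    if node.children:
        body = "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name
    else:
        body = node.name
    return body + node.label

tree = parse_newick("((A,B),(C,(D,E)));")
label_mrca(tree, {"D", "E"})  # mark the internal branch leading to the (D,E) clade
print(to_newick(tree) + ";")
```

Passing a single taxon, such as `{"A"}`, labels that terminal branch instead.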
Protein Stability
simulation and analysis software: Most proteins must fold
into a native structure in which they are moderately
stable before they are able to perform their biological
function. Protein stability depends on the sequence of
amino acids and their interactions in the folded
three-dimensional structure. Because of these
interactions, evolutionary selective constraints to
maintain adequate stability result in epistatic
dependencies between residues. We developed and
implemented mechanistic mutation-selection models in
conjunction with a fitness framework derived from
protein stability. We refer to these as the
stability-informed site-dependent (S-SD) model
and the stability-informed site-independent (S-SI)
model; the latter captures the average effect of stability
constraints on individual sites of a protein. We
also used the stability-constrained mechanistic
mutation-selection models to show that nonadaptive
evolution can lead to both positive (Stokes) and
negative (anti-Stokes) shifts in propensities
following the fixation of an amino acid, emphasizing
that the detection of negative shifts is not conclusive
evidence of adaptation.
Distribution:
https://github.com/nooryoussef/antiStokes_shifts
https://github.com/nooryoussef/Consequences-of-stability-induced-epistasis
Citations:
Youssef N, Susko E, Roger AJ, Bielawski JP. Evolution of
Amino Acid Propensities under Stability-Mediated Epistasis.
Mol Biol Evol. 2022 Mar 2;39(3):msac030. doi:
10.1093/molbev/msac030. PMID: 35134997; PMCID: PMC8896634.
Youssef N, Susko E,
Bielawski JP. Consequences of Stability-Induced Epistasis
for Substitution Rates. Mol Biol Evol. 2020 Nov
1;37(11):3131-3148. doi: 10.1093/molbev/msaa151. PMID:
32897316.
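A minimal sketch of how stability can be turned into a fitness and then into a substitution rate, under assumptions that are common in the literature but are not taken from the cited papers: fitness is the probability that the protein is folded, f = 1/(1 + exp(dG/kT)), where dG is the folding free energy (negative means stable); the scaled selection coefficient is S = 2N ln(f_mut/f_wt); and the rate relative to a neutral substitution is S/(1 - exp(-S)). In a site-dependent model the dG of a mutant depends on the residues at other sites, which is where the epistasis described above comes from; here the dG values are simply hypothetical numbers.

```python
import math

KT = 0.593  # kT in kcal/mol at about 298 K (assumed temperature)

def fitness(dg_fold):
    """Probability of being folded given folding free energy dG (kcal/mol, negative = stable)."""
    return 1.0 / (1.0 + math.exp(dg_fold / KT))

def relative_rate(dg_wt, dg_mut, pop_size):
    """Substitution rate relative to neutral, S / (1 - exp(-S)), with S = 2N ln(f_mut / f_wt)."""
    s = 2.0 * pop_size * (math.log(fitness(dg_mut)) - math.log(fitness(dg_wt)))
    if abs(s) < 1e-12:
        return 1.0  # neutral limit
    if s > 0:
        return s / (-math.expm1(-s))
    # For S < 0, rewrite as S * exp(S) / (exp(S) - 1) to avoid overflow in exp(-S).
    return s * math.exp(s) / math.expm1(s)

# Hypothetical folding free energies: wild type, a stabilizing and a destabilizing mutant.
print(relative_rate(-5.0, -5.5, pop_size=100))  # stabilizing: faster than neutral
print(relative_rate(-5.0, -3.0, pop_size=100))  # destabilizing: slower than neutral
```

Because fitness saturates near 1 for very stable proteins, mutations in a highly stable background are nearly neutral, while the same mutation in a marginally stable background can be strongly selected.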
LiBaC: Classic
likelihood ratio tests for positive selection have a
high false-positive rate in situations where there is
extensive and unmodelled variation in the evolutionary
process among sites within a gene. We developed a new
method for assigning codon sites into groups where
each group has a different model, and the likelihood
over all sites is maximized. The method, called
likelihood-based clustering (LiBaC), can be
viewed as a generalization of the family of
model-based clustering approaches to models of codon
evolution.
Distribution: available on request
Citation: Bao L, Gu H, Dunn KA, Bielawski JP.
Likelihood-based clustering (LiBaC) for codon models, a
method for grouping sites according to similarities in the
underlying process of evolution. Mol Biol Evol. 2008
Sep;25(9):1995-2007. doi: 10.1093/molbev/msn145. Epub 2008
Jun 26. PMID: 18586695.
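The clustering idea can be sketched with a toy stand-in for codon models: each site is summarized by a hypothetical count of substitutions, each group has its own Poisson rate, and the algorithm alternates between assigning every site to the group under which it is most likely and re-estimating each group's rate by maximum likelihood. Neither step can decrease the total log-likelihood over all sites. LiBaC itself works with codon-model likelihoods; the Poisson model, the fixed number of groups and the function names are assumptions for illustration.

```python
import math

def poisson_loglik(k, rate):
    """Log-probability of k events under a Poisson(rate) model."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def cluster_sites(counts, n_groups, iters=50):
    """Hard likelihood-based clustering of sites into groups with separate Poisson rates."""
    # Initialize group rates spread over the observed range (kept strictly positive).
    lo, hi = min(counts), max(counts)
    rates = [max(lo + (hi - lo) * (g + 0.5) / n_groups, 1e-6) for g in range(n_groups)]
    assign = [0] * len(counts)
    for _ in range(iters):
        # Assignment step: put each site in the group that gives it the highest likelihood.
        new_assign = [max(range(n_groups), key=lambda g: poisson_loglik(k, rates[g]))
                      for k in counts]
        # Update step: the MLE of a Poisson rate is the group mean (floored to stay positive).
        for g in range(n_groups):
            members = [k for k, a in zip(counts, new_assign) if a == g]
            if members:
                rates[g] = max(sum(members) / len(members), 1e-6)
        if new_assign == assign:
            break
        assign = new_assign
    total = sum(poisson_loglik(k, rates[a]) for k, a in zip(counts, assign))
    return assign, rates, total

site_counts = [0, 1, 0, 9, 11, 10]  # hypothetical per-site substitution counts
groups, group_rates, lnl = cluster_sites(site_counts, n_groups=2)
print(groups, group_rates, round(lnl, 3))
```

Replacing the Poisson log-likelihood with a per-site codon-model log-likelihood, and the group mean with a numerical optimization of each group's model parameters, gives the general shape of likelihood-based clustering described above.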
Codeml-FE: In some
cases, a priori biological knowledge has been used
successfully to model heterogeneous evolutionary
dynamics among codon sites. These are called fixed-effect
models, and they require that all codon sites
are assigned to one of several partitions which are
permitted to have independent parameters for selection
pressure, evolutionary rate, transition/transversion
ratio or codon frequencies. Codeml-FE
implements 11 new fixed-effect models of codon
evolution.
Distribution: available on request
Citation: Bao L, Gu H, Dunn KA, Bielawski JP. Methods for
selecting fixed-effect models for heterogeneous codon
evolution, with comments on their application to gene and
genome data. BMC Evol Biol. 2007 Feb 8;7 Suppl 1(Suppl
1):S5. doi: 10.1186/1471-2148-7-S1-S5. PMID: 17288578;
PMCID: PMC1796614.
PAML "lab" in the Workshop on Molecular
Evolution at MBL: Tutorial for
the PAML lab exercises for the Workshop on Molecular
Evolution at the Marine Biological Laboratory (MBL) in
Woods Hole, MA.
Distribution: https://bitbucket.org/EvoWorks/protocol-paml-lab-at-mbl-workshop/src/master/
Citations:
https://www.mbl.edu/education/courses/workshop-on-molecular-evolution/
https://molevolworkshop.github.io/faculty-bielawski/
https://awarnach.mathstat.dal.ca/~joeb/PAML_lab_old/lab.html
Yang Z, Bielawski JP. Statistical methods for detecting
molecular adaptation. Trends Ecol Evol. 2000 Dec
1;15(12):496-503. doi: 10.1016/s0169-5347(00)01994-7.
PMID: 11114436; PMCID: PMC7134603.
"Advanced PAML lab":
This tutorial addresses some advanced topics,
including (1) inference of episodic selection pressure,
(2) detection of statistical irregularities and unstable
parameter estimates, and (3) use of smoothed bootstrap
aggregation to mitigate the negative effects of unstable
parameter estimates.
Advanced students will do this as the "PAML lab"
exercises for the Workshop on Molecular Evolution at the
Marine Biological Laboratory (MBL) in Woods Hole, MA.
Distribution: https://bitbucket.org/EvoWorks/protocol-inference-of-episodic-selection/src/master/
Citations:
Bielawski, J.P., Baker, J.L. and Mingrone,
J. 2016. Inference of episodic changes in natural
selection acting on protein coding sequences via CODEML.
Curr. Protoc. Bioinform. 54:6.15.1-6.15.32. doi:
10.1002/cpbi.2
Mingrone J,
Susko E, Bielawski J. Smoothed Bootstrap Aggregation for
Assessing Selection Pressure at Amino Acid Sites. Mol
Biol Evol. 2016 Nov;33(11):2976-2989. doi:
10.1093/molbev/msw160. Epub 2016 Aug 2. PMID: 27486222.
Protocols: Detecting signatures of adaptive
evolution: This is a
more formal, protocol-based tutorial on statistical
inference of adaptive evolution using codon models. The
topics are the same as those covered in the MBL "PAML lab", but
it uses different datasets and is presented in an updated publication.
Distribution: https://bitbucket.org/EvoWorks/protocol-detecting-signatures-of-adaptive-evolution/src/master/
Citations: Bielawski JP.
Detecting the signatures of adaptive evolution in
protein-coding genes. Curr Protoc Mol Biol. 2013
Jan;Chapter 19:Unit 19.1. doi:
10.1002/0471142727.mb1901s101. PMID: 23288462.
Bayesian Inference of Microbial Community
Structure from Metagenomic Data Using BioMiCo: This tutorial and the associated book chapter
provide a set of protocols that illustrate the application
of BioMiCo to real inference problems. Each protocol is
designed around the analysis of a real dataset, which was
carefully chosen to illustrate specific aspects of real data
analysis. With these protocols, users of BioMiCo will be
able to undertake basic research into the properties of
complex microbial systems, as well as develop predictive
models for applied microbiomics.
Distribution: (upcoming) The
tutorial is currently available with electronic access
to the book.
Citation: Dunn KA, Andrews K, Bashwih RO, Bielawski JP.
Bayesian Inference of Microbial Community Structure from
Metagenomic Data Using BioMiCo. Methods Mol Biol.
2018;1849:267-289. doi: 10.1007/978-1-4939-8728-3_17. PMID:
30298260.
PG-BSM Tutorial: A tutorial based on a set of real data is included
in the distribution of the PG-BSM software. The software
download will automatically include all the required
supplementary files. Note that this tutorial requires
MATLAB.
Distribution: https://www.mathstat.dal.ca/~tsusko/software.html
Citation: Jones CT, Youssef N, Susko E, Bielawski JP. A
Phenotype-Genotype Codon Model for Detecting Adaptive
Evolution. Syst Biol. 2020 Jul 1;69(4):722-738. doi:
10.1093/sysbio/syz075. PMID: 31730199.
Awarnach
Awarnach is the Bielawski group's computing cluster. It
originally ran 64-bit Solaris, but was updated in 2020 to
run Linux.
Hardware and General Configuration
Awarnach has a Sun Fire X40z master node (two dual-core
2.0GHz Opteron 870 CPUs and 16GB of RAM).
Currently there are 20 compute nodes: 19 with two
dual-core Opteron 270 CPUs (2.2GHz), 4GB of RAM, and
two fast, hardware-mirrored 73GB disks, and one “super
node” with four 12-core 6348 CPUs and 256GB of RAM.
So, there are a total of 19*2*2 + 1*4*12 = 76 + 48 = 124 cores.
Storage space is NFS mounted from our storage server:
an Asus RS300-E7-PS4 1U Server with an E3-1230V2 Xeon
CPU, four Intel 60GB SSD (520 Series) and an LSI
9205-8e SAS controller connecting to a Supermicro
SC847E16-RJB0D1 JBOD containing 10 WD30EFRX 2TB Hard
Drives in a ZFS raidz3 pool.
Storage Space, NFS and AMD
Each user's home directory is located on the storage
server and NFS-mounted on the nodes.
Job Scheduling
Job scheduling is handled with Grid
Engine.
The Name Awarnach
The name Awarnach comes from the name of a giant in
Arthurian legend.