Last update: Apr 12, 2017, Contributors: Minh Bui
This guide gives developers an overview of IQ-TREE software design, data structures and discusses possibility to incorporate new models into IQ-TREE.
NOTICE: This guide is still under preparation, thus the contents may change frequently.
To achieve both high performance and flexibility, IQ-TREE software has been entirely written in object oriented C++. Thus, it faciliates extending with new sequence data types or new models. IQ-TREE code consists of C++ classes, most of which inherits from three basal classes: Alignment
, ModelSubst
and PhyloTree
to handle sequence alignments, models of substitution and phylogenetic trees, respectively. In the following we introduce these basal classes.
TIP: IQ-TREE extensively uses Standard Template Library (STL) in C++. Thus, be first familiar with STL and fundamental STL data structures like
string
,vector
,set
andmap
.
The Alignment
class stores the data as a vector
of Pattern
. Each Pattern
is in turn a string
representing the characters across the sequences at an alignment site, with a frequency
of occurrences in the Alignment
(from header file pattern.h
):
/** Site-patterns in a multiple sequence alignment */ class Pattern : public string { public: ... /** frequency appearance of the pattern */ int frequency; };
The rationale for storing the data this way (instead of storing a set of sequences) is that most computations are carried out along the site-patterns of the Alignment
. Thus, it makes all operations more convenient and faster.
As noted above, the Alignment
class is defined as (from header file alignment.h):
/** Multiple Sequence Alignment. Stored by a vector of site-patterns */ class Alignment : public vector<Pattern> { public: /** constructor @param filename file name @param sequence_type type of the sequence, either "BIN", "DNA", "AA", or NULL @param intype (OUT) input format of the file */ Alignment(char *filename, char *sequence_type, InputType &intype); ... };
NOTE: Please follow the commenting style of the code when declaring new components (classes, functions or variables) like the example above. That way, the source code documentation can be generated with tools like Doxygen. See Doxygen commenting manual for more details.
ModelSubst
is the base class for all substitution models implemented in IQ-TREE. It implements the basic Juke-Cantor-type model (equal substitution rates and equal state frequencies) that works for all data type. ModelSubst
class declares a number of virtual
methods, that need to be overriden when implementing a new model, for example (from header file model/modelsubst.h):
/** Substitution model abstract class */ class ModelSubst: public Optimization { public: /** constructor @param nstates number of states, e.g. 4 for DNA, 20 for proteins. */ ModelSubst(int nstates); /** @return the number of dimensions */ virtual int getNDim() { return 0; } ... };
As an example, the method getNDim()
should return the number of free parameters of the model, which is 0 for the default JC-type model.
ModelGTR
class extends ModelSubst
and implements the general time reversible model. ModelGTR
is the base class for all models currently used in IQ-TREE. Some important ingredients of ModelGTR
(from model/modelgtr.h):
/** General Time Reversible (GTR) model of substitution. This works for all kind of data, not only DNA */ class ModelGTR : public ModelSubst, public EigenDecomposition { public: /** constructor @param tree associated tree for the model */ ModelGTR(PhyloTree *tree, bool count_rates = true); /** @return the number of dimensions */ virtual int getNDim(); ... protected: /** rates between pairs of states of the unit rate matrix Q. In order A-C, A-G, A-T, C-G, C-T (rate G-T = 1 always) */ double *rates; .... };
PhyloTree
is the base class for phylogenetic trees.
Assessing Phylogenetic Assumptions
Simulating sequence alignments
Estimating amino acid substitution models
Developer Guide