A few months ago I submitted my first chapter for publication, which included a concatenated analysis of 10 loci. I used ModelTest to select a model of nucleotide substitution for each gene, then placed each gene in its own partition for the concatenated analysis. However, a reviewer suggested that I use PartitionFinder (Lanfear et al. 2012) instead to determine the optimal partitioning scheme for my concatenated analysis.
Why use PartitionFinder?
The accuracy of phylogenetic inference is influenced by the first few steps: selecting a partitioning scheme and a model (or models) of nucleotide evolution. Unfortunately, it's difficult to know how multilocus data should be partitioned. Many biologists rely on intuition or some form of biological justification, for example:
(3 protein coding genes) x (3 codon positions) = 9 partitions
However, studies have shown that this approach often results in overparameterization (Brandley et al. 2005; Brown and Lemmon 2007; McGuire et al. 2007; Li et al. 2008). For example, it might be that the third codon position across all genes has a similar rate and pattern of substitution, so these might be better analyzed together. This is where PartitionFinder helps.
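To get a feel for the scale of the problem, here is a rough Python sketch (mine, not part of PartitionFinder). It counts how many ways nine data blocks could be grouped into a partitioning scheme, and tallies substitution-model parameters using the standard free-parameter counts for GTR and HKY (branch lengths are ignored). The function names are made up for illustration.

```python
# Standard free-parameter counts: GTR has 5 relative rates + 3 base
# frequencies; HKY has 1 ts/tv ratio + 3 base frequencies.
BASE_PARAMS = {"GTR": 8, "HKY": 4}


def model_params(name):
    """Free parameters in a model string such as 'GTR+I+G'.

    Each '+I' or '+G' suffix adds one parameter.
    """
    parts = name.split("+")
    return BASE_PARAMS[parts[0]] + len(parts) - 1


def bell_number(n):
    """Number of ways to group n data blocks into subsets (Bell number)."""
    # Bell triangle: each new row starts with the last entry of the old row.
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


blocks = 3 * 3  # 3 protein-coding genes x 3 codon positions
print("possible partitioning schemes:", bell_number(blocks))
print("parameters, 9 partitions of GTR+I+G:", blocks * model_params("GTR+I+G"))
print("parameters, 3 partitions of GTR+I+G:", 3 * model_params("GTR+I+G"))
```

Even nine data blocks allow thousands of possible schemes, and fitting every block separately triples the number of substitution parameters compared with grouping by codon position.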
Input and output files
You will need two input files:
1) your data (in PHYLIP format)
2) a configuration file. Example: partition_finder.cfg
In this configuration file I have asked PartitionFinder to evaluate only the models of nucleotide evolution available in MrBayes; I have asked it to perform model selection using BIC; I have defined my possible partitions; and I have asked it to use the "greedy" algorithm to search the partition space.
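If you have many genes, typing out every codon-position block by hand gets tedious. Below is a small Python sketch that builds the text of a configuration file with the settings described above. The gene names, coordinates, and alignment file name are hypothetical, and the option names and layout follow the example configuration file as I remember it (including the branchlengths line, which is my assumption), so check them against the tutorial before using the output.

```python
def build_config(alignment, genes):
    """Return configuration text for a list of protein-coding genes.

    genes: list of (name, start, end) tuples, 1-based inclusive alignment
    coordinates, each gene assumed to start at codon position 1.
    """
    lines = [
        f"alignment = {alignment};",
        "branchlengths = linked;",  # assumption: not stated in the post
        "models = mrbayes;",        # only models available in MrBayes
        "model_selection = bic;",   # select models and schemes with BIC
        "",
        "[data_blocks]",
    ]
    for name, start, end in genes:
        for pos in range(3):
            # e.g. "COI_pos2 = 2-657\3;" means every third site from site 2
            lines.append(f"{name}_pos{pos + 1} = {start + pos}-{end}\\3;")
    lines += ["", "[schemes]", "search = greedy;"]
    return "\n".join(lines) + "\n"


# Hypothetical genes and coordinates, for illustration only.
example_genes = [("COI", 1, 657), ("RAG1", 658, 1557)]
print(build_config("my_data.phy", example_genes))
```

Save the printed text as partition_finder.cfg in the same folder as your alignment.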
The output is a log file and a folder called "analysis". Inside the analysis folder is a file called best_scheme.txt, which is probably what you want. My output file tells me that the best partitioning scheme contains 4 partitions:
1) mitochondrial gene codon positions 1 and 2 (nucleotide model HKY+I+G)
2) mitochondrial gene codon position 3 (GTR+I+G)
3) most of my nuclear genes, codon positions 1 and 2 (GTR+I+G)
4) most of my nuclear genes, codon position 3 (GTR+G)
Bonus points because this partitioning scheme makes biological sense to me.
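For intuition about why BIC can prefer a scheme with fewer partitions, here is a minimal Python sketch of the BIC formula, BIC = -2 lnL + k ln(n), where k is the number of free parameters and n is the number of sites. The log-likelihoods, parameter counts, and alignment length below are invented purely for illustration; they are not from my analysis, and PartitionFinder's own calculation also accounts for branch lengths.

```python
import math


def bic(log_likelihood, num_params, num_sites):
    """Bayesian Information Criterion; lower values are preferred."""
    return -2.0 * log_likelihood + num_params * math.log(num_sites)


# Hypothetical numbers: the finely partitioned scheme fits slightly
# better (higher lnL) but pays for many more parameters.
many_partitions = bic(log_likelihood=-12000.0, num_params=90, num_sites=5000)
few_partitions = bic(log_likelihood=-12050.0, num_params=35, num_sites=5000)

print("BIC, many partitions:", round(many_partitions, 2))
print("BIC, few partitions: ", round(few_partitions, 2))
print("preferred:", "few" if few_partitions < many_partitions else "many")
```

With these made-up values the small gain in likelihood does not make up for the extra parameters, so the scheme with fewer partitions gets the lower BIC.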
My opinion of PartitionFinder
Love it. PartitionFinder is flexible, easy to use, and it's a REALLY good idea. It's a Python script, so you run it from the terminal. The Lanfear lab website has a really good tutorial that walks you through everything, even if you're not comfortable on the command line.
On two occasions PartitionFinder froze while I was running a 10-gene analysis (3 codon positions each) using the greedy algorithm with models set to "all". I was using a desktop iMac. However, the same dataset worked fine when I used a reduced set of models (setting models to "MrBayes"), and when I invoked the --raxml option.
Happy partitioning!