Why use PartitionFinder?
The accuracy of phylogenetic inference depends on two early decisions: selecting a partitioning scheme and selecting a model (or models) of nucleotide evolution. Unfortunately, it's difficult to know how multilocus data should be partitioned. Many biologists rely on intuition or some form of biological justification, for example:
(3 protein coding genes) x (3 codon positions) = 9 partitions
However, studies have shown that this approach often results in overparameterization (Brandley et al. 2005; Brown and Lemmon 2007; McGuire et al. 2007; Li et al. 2008). For example, it might be that the third codon position across all genes has a similar rate and pattern of substitution, so these might be better analyzed together. This is where PartitionFinder helps.
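To get a feel for why picking a grouping by hand is hard, here is a minimal Python sketch that counts how many ways a set of data blocks can be grouped into partitions (the Bell number). The function name count_schemes is my own and is not part of PartitionFinder.

```python
def count_schemes(n_blocks):
    """Count the ways to group n_blocks data blocks into partitions.

    This is the Bell number, computed here with the Bell triangle.
    """
    row = [1]  # first row of the Bell triangle
    for _ in range(n_blocks - 1):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


# 3 genes x 3 codon positions = 9 data blocks
print(count_schemes(9))  # prints 21147
```

Even the small 9-block example has thousands of candidate schemes, and the count grows very quickly as genes are added.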
Input and output files
You will need two input files:
1) your data (in phylip format)
2) a configuration file. Example: partition_finder.cfg
In this configuration file I have asked PartitionFinder to evaluate only the models of nucleotide evolution available in MrBayes; I have asked it to perform model selection using BIC; I have defined my possible partitions; and I have asked it to use the "greedy" algorithm to search the partition space.
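Typing out one data block per gene and codon position gets tedious, so here is a rough Python sketch that generates those lines. The gene names and alignment coordinates are made up for illustration, and the sketch assumes every gene starts at codon position 1. The "name = start-end\3;" form is how I understand data blocks to be written in the configuration file; check the manual before pasting the output into your own file.

```python
# Hypothetical gene boundaries: 1-based, inclusive alignment columns.
genes = {
    "geneA": (1, 600),
    "geneB": (601, 1350),
    "geneC": (1351, 2100),
}


def codon_data_blocks(genes):
    """Return one data block line per gene and codon position.

    Assumes each gene begins at codon position 1 (in frame).
    """
    lines = []
    for name, (start, end) in genes.items():
        for pos in (1, 2, 3):
            # "\3" means: take every third site from the start column.
            lines.append(f"{name}_pos{pos} = {start + pos - 1}-{end}\\3;")
    return lines


blocks = codon_data_blocks(genes)
print("\n".join(blocks))
```

The first line printed is geneA_pos1 = 1-600\3; and there are 9 lines in total, one for each candidate partition.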
The output is a log file and a folder called "analysis." Inside the analysis folder is a file called best_scheme.txt -- this is probably what you want. Mine reports that the best partitioning scheme contains 4 partitions:
1) mitochondrial gene codon positions 1 and 2 (nucleotide model HKY+I+G)
2) mitochondrial gene codon position 3 (GTR+I+G)
3) most of my nuclear genes, codon positions 1 and 2 (GTR+I+G)
4) most of my nuclear genes, codon position 3 (GTR+G)
Bonus points because this partitioning scheme makes biological sense to me.
My opinion of PartitionFinder
Love it. PartitionFinder is flexible, easy to use, and it's a REALLY good idea. It's a Python script, so it's executed in the terminal. The Lanfear lab website has a really good tutorial that walks you through everything, even if you're not comfortable on the command line.
On two occasions PartitionFinder froze while I was running a 10-gene analysis (3 codon positions each) using the greedy algorithm and models set to "all". I was using a desktop iMac. However, the same dataset worked fine when I used a reduced set of models (setting models to "MrBayes"), and when I invoked the --raxml option.
Happy partitioning!