Genetic distances between strains were estimated with Dnadist using the F84 nucleotide substitution model [79]. The neighbour-joining (NJ) tree was inferred with the Neighbour software in the Phylip package [76]. We used jModelTest [80] to evaluate alternative nucleotide substitution models for the maximum likelihood (ML) analysis and to perform model averaging [81], in which the alternative models were weighted according to their fit to
the data and model complexity (i.e. the number of effective parameters in each substitution model) using the Bayesian information criterion (BIC) [82]. Substitution models with unequal base frequencies, a proportion of invariable sites (I) and gamma-distributed rate variation among sites (Γ) were included, with four discrete gamma categories. In total, we
considered 24 alternative substitution models in the model-averaging process. The more computationally intensive ML procedure was used to estimate phylogenies in the single-marker analyses, whereas the faster NJ method was used in the multiple-marker analyses. The whole-genome phylogeny was estimated with both the ML and NJ methods from 20,072 SNPs in the core genome of all 37 genomes. The SNPs were obtained with the same procedure as in [3]: Mauve [83] with default options was used to perform a multiple genome alignment, and an in-house Perl script was used to identify the SNPs from the resulting alignment. Because the ML and NJ methods produced virtually identical whole-genome phylogenies, we concluded that the choice of estimation method did not have a significant impact on the evaluation of the sequence-marker topologies.
Phylogenetic-topology comparison
To check for, and quantify, the degree of compatibility between the phylogenetic trees estimated from marker-sequence data and the whole-genome tree (i.e. two trees with nested taxa), each bipartition in the marker tree was checked for its presence or absence in the whole-genome tree.
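The presence/absence check on bipartitions can be illustrated with a small sketch. This is not the authors' Consense/Perl pipeline; it is a minimal Python illustration under stated assumptions: trees use a hypothetical toy encoding (a leaf is a taxon name, an internal node is a tuple of subtrees), both trees are assumed to share the same taxon set, and each split A|B is stored canonically as the side that excludes a fixed reference taxon so that A|B and B|A compare equal. The helper names (`leaves`, `canonical`, `splits`, `absent_bipartitions`) are hypothetical.

```python
from typing import FrozenSet, Set, Union

# Hypothetical toy encoding: a leaf is a taxon name, an internal node is a
# tuple of child subtrees. The outermost tuple is an arbitrary root.
Tree = Union[str, tuple]
Split = FrozenSet[str]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxa at or below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def canonical(side: FrozenSet[str], taxa: FrozenSet[str]) -> Split:
    """Store split A|B as the side that excludes a fixed reference taxon."""
    reference = min(taxa)
    return side if reference not in side else taxa - side


def splits(tree: Tree) -> Set[Split]:
    """Bipartitions at internal edges (both sides contain at least 2 taxa)."""
    taxa = leaves(tree)
    found: Set[Split] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        if 2 <= len(below) <= len(taxa) - 2:
            found.add(canonical(below, taxa))
        return below

    walk(tree)
    return found


def absent_bipartitions(marker: Tree, genome: Tree) -> Set[Split]:
    """Marker-tree bipartitions not contained in the whole-genome tree.

    Assumes both trees have the same taxa; pruning is not handled here.
    """
    return splits(marker) - splits(genome)


marker_tree = ((("A", "B"), "C"), ("D", "E"))
genome_tree = ((("A", "C"), "B"), ("D", "E"))
absent = absent_bipartitions(marker_tree, genome_tree)
print(sorted(sorted(side) for side in absent))  # [['C', 'D', 'E']], i.e. AB|CDE
```

In this toy example the marker tree groups A with B, whereas the whole-genome tree groups A with C, so the split AB|CDE is the only marker bipartition missing from the reference; the shared split DE|ABC is present in both.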
For marker trees with missing sequences, the corresponding leaves were pruned from the whole-genome tree using the R package ape [84]. The number of absent bipartitions was then normalised by the total number of bipartitions in the marker tree; this topology metric is denoted inc throughout the study. For perfectly compatible trees, no bipartition in the marker tree is absent from the whole-genome tree, so inc = 0. To obtain the bipartitions at the internal edges of the trees, the output of the Consense software in the Phylip package [78] was processed with an in-house Perl script (available upon request). The inc metric is similar to the Robinson-Foulds (RF) distance [26], but whereas inc counts only the marker-tree bipartitions that are absent from the whole-genome tree, the RF metric counts, for both trees, the bipartitions that are not present in the other tree. Therefore, the RF metric measures both the degree of incongruence and the difference in resolution between the reference and alternative topologies.
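The normalisation and the contrast with the RF distance can be made concrete with a short sketch. It is a minimal Python illustration, not the ape/Consense workflow used in the study: instead of pruning leaves from a tree object, it swaps in an equivalent set-based step that restricts each whole-genome split A|B to the retained taxa and drops splits that become trivial. Trees are given directly as sets of splits, taxon names are single letters, and the helper names (`canonical`, `prune_splits`, `inc`, `rf`) are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

Split = FrozenSet[str]


def canonical(side: FrozenSet[str], taxa: FrozenSet[str]) -> Split:
    """Store split A|B as the side that excludes a fixed reference taxon."""
    reference = min(taxa)
    return side if reference not in side else taxa - side


def prune_splits(genome_splits: Iterable[FrozenSet[str]],
                 genome_taxa: FrozenSet[str],
                 keep: FrozenSet[str]) -> Set[Split]:
    """Bipartitions of the whole-genome tree after removing taxa not in `keep`.

    Each split A|B is restricted to (A & keep)|(B & keep); splits that become
    trivial (a side with fewer than two taxa) are dropped.
    """
    pruned: Set[Split] = set()
    for side in genome_splits:
        a = side & keep
        b = (genome_taxa - side) & keep
        if len(a) >= 2 and len(b) >= 2:
            pruned.add(canonical(a, keep))
    return pruned


def inc(marker_splits: Set[Split], reference_splits: Set[Split]) -> float:
    """Fraction of marker-tree bipartitions absent from the reference tree."""
    if not marker_splits:
        return 0.0  # assumption: a fully unresolved marker tree is compatible
    return len(marker_splits - reference_splits) / len(marker_splits)


def rf(splits_1: Set[Split], splits_2: Set[Split]) -> int:
    """Unnormalised RF distance: bipartitions present in exactly one tree."""
    return len(splits_1 ^ splits_2)


# Whole-genome tree ((A,B),(C,D),(E,F)), one side of each internal split
genome_taxa = frozenset("ABCDEF")  # single-letter taxon names
genome_splits = [frozenset("AB"), frozenset("CD"), frozenset("EF")]

# Marker tree ((A,B),C,(D,E)) lacking taxon F: splits AB|CDE and DE|ABC
keep = frozenset("ABCDE")
marker_splits = {canonical(frozenset("AB"), keep),
                 canonical(frozenset("DE"), keep)}

reference = prune_splits(genome_splits, genome_taxa, keep)  # AB|CDE, CD|ABE
print(inc(marker_splits, reference))  # 0.5: DE|ABC is absent
print(rf(marker_splits, reference))   # 2: DE|ABC and CD|ABE
```

Here inc only penalises the marker split DE|ABC, whereas RF also counts CD|ABE, a reference split the marker tree does not contain, which is why RF mixes incongruence with differences in resolution.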