‘Multi-SpaM’: a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees

Dencker, Thomas; Leimeister, Chris-Andre; Gerth, Michael; Bleidorn, Christoph; Snir, Sagi; Morgenstern, Burkhard

Journal Article

‘Multi-SpaM’: a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees

Abstract

Word-based or ‘alignment-free’ methods for phylogeny inference have become popular in recent years. These methods are much faster than traditional, alignment-based approaches, but they are generally less accurate. Most alignment-free methods calculate ‘pairwise’ distances between nucleicacid or protein sequences; these distance values can then be used as input for tree-reconstruction programs such as neighbor-joining. In this paper, we propose the first word-based phylogeny approach that is based on ‘multiple’ sequence comparison and ‘maximum likelihood’. Our algorithm first samples small, gap-free alignments involving four taxa each. For each of these alignments, it then calculates a quartet tree and, finally, the program ‘Quartet MaxCut’ is used to infer a super tree for the full set of input taxa from the calculated quartet trees. Experimental results show that trees produced with our approach are of high quality.