The mega-matrix tree of life: using genome-scale horizontal gene transfer and sequence evolution data as information about the vertical history of life Journal Article


Authors: Lienau, E. Kurt; Desalle, Rob; Allard, Marc; Brown, Eric W.; Swofford, David; Rosenfeld, Jeffrey A.; Sarkar, Indra N.; Planet, Paul J.
Article Title: The mega-matrix tree of life: using genome-scale horizontal gene transfer and sequence evolution data as information about the vertical history of life
Abstract: Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined-data ToL matrix derived from 323 404 translated open reading frames arranged into 12 381 gene homologue groups coded as amino acid data and 63 336, 64 105, 65 153, 66 922 and 67 109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole-genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4 867 184 characters (a combined data-type mega-matrix). Phylogenetic analysis of this mega-matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination. © The Willi Hennig Society 2010.
Keywords: Evolutionary Biology; PHYLOGENETIC ANALYSIS; ORIGIN; ORGANISMS; RECONSTRUCTION; DATA SETS; CORE; RNA-POLYMERASE; PROKARYOTES; BACTERIAL PHYLOGENY; UNIVERSAL ANCESTOR
Journal Title: Cladistics
Volume: 27
Issue: 4
ISSN: 0748-3007
Publisher: Blackwell Publishing
Publication Place: MALDEN; COMMERCE PLACE, 350 MAIN ST, MALDEN 02148, MA USA
Date Published: 2011-01-01
Start Page: 417
End Page: 427
Language: English
DOI/URL:
Notes: PT: J; NR: 63; TC: 0; J9: CLADISTICS; PG: 11; GA: 790AY; UT: WOS:000292562100008