Recently I was asked by a researcher for a family-level phylogenetic tree of ferns. The Fern Tree of Life (FTOL) project that I maintain generates a maximally sampled global fern phylogeny, but it is at the species level. So how can we go from that to a family-level tree?
Basically, it involves the following steps:

1. Load a list of all species in the tree and the name of the family each belongs to.
2. Check that each family is monophyletic or monotypic (this must be true for the next step to work¹).
3. Select a single exemplar species for each family (this could be any species within the family, as long as the family is monophyletic or monotypic).
4. Trim the tree to only the exemplar species (one per family), then replace each tip label with its family name.
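To make the four steps concrete, here is a minimal sketch on a hypothetical six-species tree with three made-up families (`Aaceae`, `Bbaceae`, `Ccaceae`). To keep it self-contained it uses only ape and base R: `ape::is.monophyletic()` stands in for MonoPhy in step 2, and `duplicated()`/`match()` stand in for the dplyr verbs used in the FTOL code. The tree, family names, and variable names are illustrative assumptions, not FTOL data.

```r
library(ape)

# Hypothetical toy tree: six species in three made-up families.
# Every tip sits at depth 3, so the tree is ultrametric.
phy <- read.tree(text = "(((a1:1,a2:1):1,a3:2):1,((b1:1,b2:1):1.5,c1:2.5):0.5);")

# Step 1: species-to-family lookup table (hypothetical names)
taxonomy <- data.frame(
  species = c("a1", "a2", "a3", "b1", "b2", "c1"),
  family  = c("Aaceae", "Aaceae", "Aaceae", "Bbaceae", "Bbaceae", "Ccaceae"),
  stringsAsFactors = FALSE
)

# Step 2: every family must be monotypic or monophyletic
species_by_family <- split(taxonomy$species, taxonomy$family)
family_ok <- vapply(
  species_by_family,
  function(tips) length(tips) == 1 || is.monophyletic(phy, tips),
  logical(1)
)
stopifnot(all(family_ok))

# Step 3: pick one exemplar per family (simply the first species listed)
exemplars <- taxonomy[!duplicated(taxonomy$family), ]

# Step 4: trim the tree to the exemplars, then relabel tips with family names
phy_family <- keep.tip(phy, exemplars$species)
phy_family$tip.label <- exemplars$family[match(phy_family$tip.label, exemplars$species)]

print(phy_family$tip.label)
```

Because the toy tree is ultrametric, every tip of the trimmed tree stays at the same depth, and `Bbaceae` and `Ccaceae` remain sister families regardless of which exemplars are chosen.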
A few packages used here bear extra mention. The MonoPhy package does exactly what its name suggests: it checks for monophyly. I am a huge fan of the assertr package for making proactive assertions about data. In this case, the code will fail (issue an error) if the assumption of monophyletic or monotypic families does not hold. Finally, the ftolr package, by yours truly, provides the most recent fern tree and associated taxonomic data.
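As a quick illustration of how `assert()` behaves, the sketch below runs the same check used in this post (`in_set("Yes", "Monotypic")` on the `Monophyly` column) against a small hypothetical summary table in which one family is not monophyletic. The table and family names are made up, and `tryCatch()` is only there so the failure can be inspected instead of halting the script.

```r
library(assertr)

# Hypothetical monophyly summary in which one family fails the test
mono_summary <- data.frame(
  family    = c("Aaceae", "Bbaceae", "Ccaceae"),
  Monophyly = c("Yes", "No", "Monotypic"),
  stringsAsFactors = FALSE
)

# If every value passes, assert() returns the data so a pipeline can continue
passing <- assert(
  mono_summary[mono_summary$Monophyly != "No", ],
  in_set("Yes", "Monotypic"),
  Monophyly
)

# If any value fails, assert() prints a summary of the violations and stops
result <- tryCatch(
  assert(mono_summary, in_set("Yes", "Monotypic"), Monophyly),
  error = function(e) "assertion failed"
)
print(result)
```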
Of course, this approach should work for any tree as long as two requirements are met: the higher-level taxa are all monophyletic or monotypic, and the tree is ultrametric.
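The ultrametric requirement can be checked with `ape::is.ultrametric()`. The hypothetical tree below shows why it matters: two species from the same family sit at different depths, so the family's terminal branch in the trimmed tree has length 2 or 4 depending on which species is kept as the exemplar. The tree and the `tip_branch_length()` helper are illustrative assumptions, not part of ftolr or ape.

```r
library(ape)

# Hypothetical non-ultrametric tree: a1 and a2 belong to the same family,
# but their tips sit at different depths (2 vs. 4)
phy_nonultra <- read.tree(text = "((a1:1,a2:3):1,b1:2);")
print(is.ultrametric(phy_nonultra))  # FALSE

# Hypothetical helper: length of the terminal branch leading to a tip
tip_branch_length <- function(tree, tip) {
  tree$edge.length[tree$edge[, 2] == match(tip, tree$tip.label)]
}

# Build the "family-level" tree twice, with a different exemplar each time
via_a1 <- keep.tip(phy_nonultra, c("a1", "b1"))
via_a2 <- keep.tip(phy_nonultra, c("a2", "b1"))

tip_branch_length(via_a1, "a1")  # 2
tip_branch_length(via_a2, "a2")  # 4
```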
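```r
# Load packages
library(tidyverse)
library(ftolr)
library(ape)
library(MonoPhy)
library(assertr)

# Check FTOL version and cutoff date
ft_data_ver()

# Load the ultrametric species-level tree, excluding outgroups
phy <- ft_tree(branch_len = "ultra", drop_og = TRUE)

# Load fern taxonomy, keeping only species that are in the tree
taxonomy <- ftol_taxonomy %>%
  filter(species %in% phy$tip.label)
```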
```r
# Analyze monophyly of each family
family_mono_test <- AssessMonophyly(
  phy,
  as.data.frame(taxonomy[, c("species", "family")])
)

# Check that all families are monophyletic or monotypic
family_mono_summary <-
  family_mono_test$family$result %>%
  rownames_to_column("family") %>%
  as_tibble() %>%
  assert(in_set("Yes", "Monotypic"), Monophyly)

# Inspect:
family_mono_summary
```
```
# A tibble: 48 × 9
   family            Monophyly MRCA  `#Tips` `Delta-Tips` `#Intruders` Intruders `#Outliers` Outliers
   <chr>             <chr>     <chr> <chr>   <chr>        <chr>        <chr>     <chr>       <chr>
 1 Pteridaceae       Yes       5760  897     0            0            ""        NA          ""
 2 Polypodiaceae     Yes       6666  935     0            0            ""        NA          ""
 3 Davalliaceae      Yes       7600  42      0            0            ""        NA          ""
 4 Oleandraceae      Yes       7641  10      0            0            ""        NA          ""
 5 Tectariaceae      Yes       7650  138     0            0            ""        NA          ""
 6 Nephrolepidaceae  Yes       7787  19      0            0            ""        NA          ""
 7 Lomariopsidaceae  Yes       7805  44      0            0            ""        NA          ""
 8 Dryopteridaceae   Yes       7848  984     0            0            ""        NA          ""
 9 Didymochlaenaceae Yes       8831  9       0            0            ""        NA          ""
10 Hypodematiaceae   Yes       8839  26      0            0            ""        NA          ""
# ℹ 38 more rows
```
```r
# Get one exemplar tip (species) per family
rep_tips <- taxonomy %>%
  group_by(family) %>%
  slice(1) %>%
  ungroup()

# Subset phylogeny to one tip per family
phy_family <- ape::keep.tip(phy, rep_tips$species)

# Relabel with family names
new_tips <-
  tibble(species = phy_family$tip.label) %>%
  left_join(rep_tips, by = "species") %>%
  pull(family)

phy_family$tip.label <- new_tips

# Visualize tree
plot(ladderize(phy_family), no.margin = TRUE)
```
¹ Another condition of this approach is that the tree must be ultrametric (all tips at the same depth). If it is not, the choice of exemplar species would affect the branch lengths in the family-level tree.