Speciation in taildropper slugs

Taildropper slugs (genus Prophysaon) are endemic to the temperate rainforests of the Pacific Northwest (Figure 1A). There are nine described species, and the group appears to have a complex history in which geology, climate, and ecology may all have driven diversification. During my dissertation, I developed a novel approach (delimitR) to delimit species while considering population-level processes, and in several species I inferred a likely history of divergence in isolated refugia during glaciation, followed by expansion and gene flow between lineages upon secondary contact (Smith & Carstens 2020, Smith et al. in review A). In ongoing research, I am collecting genomic and transcriptomic data from this system. With these data, I hope to learn not only whether and when gene flow has occurred between species, but also which regions of the genome have introgressed across species and population boundaries.

Figure 1: Sister species P. andersoni and P. foliolatum have partially overlapping ranges and differ in ecologically important traits, like microhabitat, foot size, and dentition. A) Map of the Pacific Northwest showing glacial extent during the Last Glacial Maximum (blue), the range of P. andersoni (orange), and the range of P. foliolatum (purple). B) P. andersoni and C) P. foliolatum feeding on mushrooms. D) Comparisons of dentition between P. andersoni and P. foliolatum (Pilsbry & Vanatta 1898).

Machine learning in population genetics and phylogenetics

Machine learning approaches are increasingly being applied to answer interesting questions in population genetics and phylogenetics. During my dissertation, I developed delimitR, an approach to delimit species in the presence of population-level processes using machine learning and genomic data. In ongoing work, I am evaluating the power of machine learning approaches to incorporate complex processes, like background selection, that may mislead traditional population genetics approaches. I’m also interested in applying machine learning to questions in phylogenetics.

Expanding the data used in phylogenetics

Traditionally, orthologs, or genes related through speciation events, have been the gold standard for phylogenetic inference. This has led to stringent filtering to remove paralogs, or genes related through duplication events, but this can severely limit the amount of data available (Figure 2). To investigate the potential benefits and risks of using paralogs in phylogenetics, I have combined simulations (Yan et al. 2021), theoretical work (Smith & Hahn 2021), empirical investigation (Smith et al. 2021), and literature reviews (Smith & Hahn 2020). This work has highlighted the robustness of phylogenetic inference to the inclusion of paralogs and suggests steps towards including more data in phylogenetic analyses.

Figure 2: Gene duplication and loss can lead to gene tree heterogeneity. After duplication, two copies evolve (green and pink). Gene copies can be orthologous (i.e. share a common ancestor due to speciation), or paralogous (i.e. share a common ancestor due to duplication). The green copy in species A is orthologous to the green copy in species B. The pink copy in species A is paralogous to the green copy in species A.
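
The distinction between orthologs and paralogs can be made concrete with a small sketch. It assumes a toy gene tree in which every internal node is already labeled as a speciation or a duplication; the tree, the gene-copy names, and the helper functions are hypothetical and exist only for illustration. Two gene copies are classified by the event at their most recent common ancestor.

```python
# Hypothetical toy gene tree: a duplication that predates the split between
# species A and B leaves a green and a pink copy in each species.
# Internal nodes are (event, left, right); leaves are gene-copy names.
GENE_TREE = (
    "duplication",
    ("speciation", "A_green", "B_green"),
    ("speciation", "A_pink", "B_pink"),
)


def leaves(node):
    """Return the set of gene-copy names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def relationship(node, gene_x, gene_y):
    """Classify two distinct gene copies by the event at their most recent
    common ancestor. Assumes both copies are present in the tree."""
    event, left, right = node
    for child in (left, right):
        # Descend while a single child subtree still contains both copies.
        if not isinstance(child, str) and {gene_x, gene_y} <= leaves(child):
            return relationship(child, gene_x, gene_y)
    # The copies separate at this node, so it is their common ancestor.
    return "orthologs" if event == "speciation" else "paralogs"


for pair in [("A_green", "B_green"), ("A_pink", "A_green"), ("A_green", "B_pink")]:
    print(pair, relationship(GENE_TREE, *pair))
```

In this toy tree, the green copy in species A and the pink copy in species B are also paralogs, even though they sit in different species, because their common ancestor is the duplication node. Filters that retain only orthologs discard such pairs.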

Using predictive phylogeography to study community responses to environmental change

In addition to studying the factors that drive diversification within and between taxa, I am interested in better understanding the processes that drive community responses to environmental change. The field of comparative phylogeography aims to understand how communities of different species respond to geologic and climatic events. While inferring whether individual species have responded similarly or idiosyncratically is of interest, the ultimate goal is to better understand the factors that drive species responses to environmental change, and this requires an integration across genetic, ecological, and phenotypic data. During my PhD, I collaborated on the development of a framework for predictive phylogeography (Espíndola et al. 2016, Sullivan et al. 2019). Predictive phylogeography leverages data collected across taxa to make predictions about unstudied taxa and to identify factors predictive of species’ responses. We integrated across genetic, environmental, and phenotypic data using supervised machine learning and gained novel insights into potential factors driving species responses. As with any predictive model, our predictions are only as good as the data used to build the model. When data are sampled from a biased subset of taxa, the predictive power of our model will be limited. During my dissertation, I substantially expanded the number of taxa with data available from the Pacific Northwest, and used these data to iteratively update our predictive model (Smith et al. 2018). I contributed to projects in invertebrates (Smith et al. 2017, Smith et al. 2018, Smith & Carstens 2020, Smith et al. in review A, Rankin et al. 2019, Lado et al. 2020), plants (Ruffley et al. 2018, Ruffley et al. 2021), and vertebrates (Warwick et al. 2020) from the Pacific Northwest and other regions. I am interested in using leaf-litter sampling to expand taxonomic sampling to leaf-litter invertebrates, including microsnails, from the region (Figure 3). 
Given the success of barcoding leaf-litter invertebrates using mitochondrial markers (Smith et al. in review B), I am hopeful that environmental DNA (eDNA) could help to improve taxonomic sampling in the future.
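
The core idea of predictive phylogeography described above, training a supervised classifier on studied taxa and then predicting the response of an unstudied taxon, can be illustrated with a minimal sketch. It is not the published framework: a simple k-nearest-neighbor vote stands in for the classifier, and the predictor values, the response labels ("structured" and "unstructured"), and the function name are all made up for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical, made-up training table: each studied taxon has two scaled
# (0-1) predictors and an observed response label. None of the values are real.
STUDIED = [
    ((0.1, 0.8), "structured"),
    ((0.2, 0.9), "structured"),
    ((0.3, 0.7), "structured"),
    ((0.8, 0.2), "unstructured"),
    ((0.9, 0.3), "unstructured"),
    ((0.7, 0.1), "unstructured"),
]


def predict_response(features, studied=STUDIED, k=3):
    """Predict the response of an unstudied taxon by majority vote among
    the k studied taxa with the most similar predictor values."""
    nearest = sorted(studied, key=lambda row: dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# An unstudied taxon whose predictors resemble the first group of taxa.
print(predict_response((0.15, 0.85)))
```

Because every prediction is borrowed from the studied taxa, a training table drawn from a biased subset of taxa gives the classifier little to work with for taxa that fall outside that subset, which is why expanded taxonomic sampling matters.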

Figure 3: Sampling of leaf litter and invertebrates from the Pacific Northwest of North America in 2017. A) Map of sampling localities across the Pacific Northwest. NC: North Cascades, BW: Blue and Wallowa Mountains, SC: South Cascades, NRM: Northern Rocky Mountains, VI: Vancouver Island. B) Pie chart showing the classes of the sequenced invertebrates, based on BLASTn results. C) Photograph of a Punctum randolphii sample. D) Photograph of a Columella edentula sample.