Construction of Habitat-Specific Training Sets to Achieve Species-Level Assignment in 16S rRNA Gene Datasets

Isabel F. Escapa, Yanmei Huang, Tsute Chen, Maoxuan Li, Alexis Kokaras, Floyd E. Dewhirst, and Katherine P. Lemon
The low cost of 16S rRNA gene sequencing facilitates population-scale molecular epidemiological studies. Existing computational algorithms can resolve 16S rRNA gene sequences into high-resolution amplicon sequence variants (ASVs), which represent consistent labels comparable across studies. Assigning these ASVs to species-level taxonomy strengthens the ecological and/or clinical relevance of 16S rRNA gene-based microbiota studies and further facilitates data comparison across studies.

To achieve this, we developed a broadly applicable method for constructing high-resolution training sets based on the phylogenetic relationships among microbes found in a habitat of interest. When used with the naïve Bayesian Ribosomal Database Project (RDP) Classifier, this training set achieved species/supraspecies-level taxonomic assignment of 16S rRNA gene-derived ASVs. The key steps for generating such a training set are (1) constructing an accurate and comprehensive phylogeny-based, habitat-specific database; (2) compiling multiple 16S rRNA gene sequences to represent the natural sequence variability of each taxon in the database; (3) trimming the training set to match the sequenced regions, if necessary; and (4) placing species sharing closely related sequences into a training-set-specific supraspecies taxonomic level to preserve subgenus-level resolution. As proof of principle, we developed a V1–V3 region training set for the bacterial microbiota of the human aerodigestive tract using the full-length 16S rRNA gene reference sequences compiled in our expanded Human Oral Microbiome Database (eHOMD). We also overcame technical limitations to successfully use Illumina sequences for the 16S rRNA gene V1–V3 region, the most informative segment for classifying bacteria native to the human aerodigestive tract. Finally, we generated a full-length eHOMD 16S rRNA gene training set, which we used in conjunction with an independent PacBio single-molecule, real-time (SMRT)-sequenced sinonasal dataset to validate the representation of species in our training set. This also established the effectiveness of a full-length training set for assigning taxonomy to long-read 16S rRNA gene datasets.

Here, we present a systematic approach for constructing a phylogeny-based, high-resolution, habitat-specific training set that permits species/supraspecies-level taxonomic assignment to short- and long-read 16S rRNA gene-derived ASVs. This advancement enhances the ecological and/or clinical relevance of 16S rRNA gene-based microbiota studies.
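
Step 4 of the training-set construction described above (placing species that share closely related sequences into a supraspecies level) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, for simplicity, that "closely related" means sharing at least one identical region-trimmed sequence, whereas the published method may use a different criterion. The function name `assign_supraspecies`, the input format, and the `|`-joined label are hypothetical choices made for illustration.

```python
from collections import defaultdict


def assign_supraspecies(trimmed_seqs):
    """Return a dict mapping each species to a (supra)species label.

    trimmed_seqs: dict of species name -> list of region-trimmed sequences
    (hypothetical input format). Species that share at least one identical
    trimmed sequence, directly or through a chain of shared sequences,
    receive one merged label.
    """
    # Union-find structure: every species starts as its own group.
    parent = {sp: sp for sp in trimmed_seqs}

    def find(sp):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[sp] != sp:
            parent[sp] = parent[parent[sp]]
            sp = parent[sp]
        return sp

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the alphabetically smaller name as the root (deterministic).
            parent[max(ra, rb)] = min(ra, rb)

    # Record which species own each distinct sequence (case-insensitive).
    owners = defaultdict(set)
    for sp, seqs in trimmed_seqs.items():
        for seq in seqs:
            owners[seq.upper()].add(sp)

    # Merge all species that own the same sequence.
    for species in owners.values():
        ordered = sorted(species)
        for other in ordered[1:]:
            union(ordered[0], other)

    # Collect group members and build one label per group.
    groups = defaultdict(list)
    for sp in trimmed_seqs:
        groups[find(sp)].append(sp)

    labels = {}
    for members in groups.values():
        label = "|".join(sorted(members))
        for sp in members:
            labels[sp] = label
    return labels
```

With toy input such as `{"sp_A": ["ACGT", "ACGA"], "sp_B": ["acga"], "sp_C": ["TTTT"]}`, `sp_A` and `sp_B` share a sequence and are merged under one label, while `sp_C` keeps its own species-level label. Merging only the species that cannot be told apart in the sequenced region is what lets the remaining taxa keep species-level resolution.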

Epigenome-Metabolome-Microbiome Axis in Health and IBD

Hajera Amatullah and Kate Jeffrey
Environmental triggers in the context of genetic susceptibility drive phenotypes of complex immune disorders such as inflammatory bowel disease (IBD). One such trigger of IBD is perturbation of the enteric commensal bacteria, fungi, or viruses that shape both immune and neuronal states. The epigenome acts as an interface between microbiota and context-specific gene expression and is thus emerging as a third key contributor to IBD. Here we review evidence that the host epigenome plays a significant role in orchestrating the bidirectional crosstalk between mammals and their commensal microorganisms. We discuss disruption of chromatin regulatory regions and epigenetic enzyme mutants as a causative factor in IBD patients and mouse models of intestinal inflammation and consider the possible translation of this knowledge. Furthermore, we present emerging insights into the intricate connection between the microbiome and epigenetic enzyme activity via host or bacterial metabolites and how these interactions fine-tune the microorganism-host relationship.

Illuminating the Human Virome in Health and Disease

Fatemeh Adiliaghdam and Kate L Jeffrey
Although the microbiome is established as an important regulator of health and disease, the role of viruses that inhabit asymptomatic humans (collectively, the virome) is less defined. While we are still characterizing what constitutes a healthy or diseased virome, an exciting next step is to move beyond correlations and toward identification of specific viruses and their precise mechanisms of beneficial or harmful immunomodulation. Illuminating this will represent a first step toward developing virome-focused therapies.

Metapangenomics of the Oral Microbiome Provides Insights into Habitat Adaptation and Cultivar Diversity

Daniel R Utter, Gary G Borisy, A Murat Eren, Colleen M Cavanaugh, and Jessica L Mark Welch
Background: The increasing availability of microbial genomes and environmental shotgun metagenomes provides unprecedented access to the genomic differences within related bacteria. The human oral microbiome with its diverse habitats and abundant, relatively well-characterized microbial inhabitants presents an opportunity to investigate bacterial population structures at an ecosystem scale.

Results: Here, we employ a metapangenomic approach that combines public genomes with Human Microbiome Project (HMP) metagenomes to study the diversity of microbial residents of three oral habitats: tongue dorsum, buccal mucosa, and supragingival plaque. For two exemplar taxa, Haemophilus parainfluenzae and the genus Rothia, metapangenomes reveal distinct genomic groups based on shared genome content. H. parainfluenzae genomes separate into three distinct subgroups with differential abundance between oral habitats. Functional enrichment analyses identify an operon encoding oxaloacetate decarboxylase as diagnostic for the tongue-abundant subgroup. For the genus Rothia, grouping by shared genome content recapitulates species-level taxonomy and habitat preferences. However, while most R. mucilaginosa are restricted to the tongue as expected, two genomes represent a cryptic population of R. mucilaginosa in many buccal mucosa samples. For both H. parainfluenzae and the genus Rothia, we identify not only limitations in the ability of cultivated organisms to represent populations in their native environment, but also specifically which cultivar gene sequences are absent or ubiquitous.

Conclusions: Our findings provide insights into population structure and biogeography in the mouth and form specific hypotheses about habitat adaptation. These results illustrate the power of combining metagenomes and pangenomes to investigate the ecology and evolution of bacteria across analytical scales.

New Insights into Human Nostril Microbiome from the Expanded Human Oral Microbiome Database (eHOMD): A Resource for the Microbiome of the Human Aerodigestive Tract

Escapa IF, Chen T, Huang Y, Gajare P, Dewhirst FE, and Lemon KP
The expanded Human Oral Microbiome Database (eHOMD) is a comprehensive microbiome database for sites along the human aerodigestive tract that revealed new insights into the nostril microbiome. The eHOMD provides well-curated 16S rRNA gene reference sequences linked to available genomes and enables assignment of species-level taxonomy to most next-generation sequences derived from diverse aerodigestive tract sites, including the nasal passages, sinuses, throat, esophagus, and mouth. Using minimum entropy decomposition coupled with the RDP Classifier and our eHOMD V1-V3 training set, we reanalyzed 16S rRNA V1-V3 sequences from the nostrils of 210 Human Microbiome Project participants at the species level, revealing four key insights. First, we discovered that Lawsonella clevelandensis, a recently named bacterium, and Neisseriaceae [G-1] HMT-174, a previously unrecognized bacterium, are common in adult nostrils. Second, just 19 species accounted for 90% of the total sequences from all participants. Third, 1 of these 19 species belonged to a currently uncultivated genus. Fourth, for 94% of the participants, 2 to 10 species constituted 90% of their sequences, indicating that the nostril microbiome may be represented by limited consortia. These insights highlight the strengths of the nostril microbiome as a model system for studying interspecies interactions and microbiome function. Also, in this cohort, three common nasal species (Dolosigranulum pigrum and two Corynebacterium species) showed positive differential abundance when the pathobiont Staphylococcus aureus was absent, generating hypotheses regarding colonization resistance. By facilitating species-level taxonomic assignment to microbes from the human aerodigestive tract, the eHOMD is a vital resource enhancing clinical relevance of microbiome studies. 
IMPORTANCE The eHOMD is a valuable resource for researchers, from basic to clinical, who study the microbiomes and the individual microbes in body sites in the human aerodigestive tract, which includes the nasal passages, sinuses, throat, esophagus, and mouth, and the lower respiratory tract, in health and disease. The eHOMD is an actively curated, web-based, open-access resource. eHOMD provides the following: (i) species-level taxonomy based on grouping 16S rRNA gene sequences at 98.5% identity, (ii) a systematic naming scheme for unnamed and/or uncultivated microbial taxa, (iii) reference genomes to facilitate metagenomic, metatranscriptomic, and proteomic studies, and (iv) convenient cross-links to other databases (e.g., PubMed and Entrez). By facilitating the assignment of species names to sequences, the eHOMD is a vital resource for enhancing the clinical relevance of 16S rRNA gene-based microbiome studies, as well as metagenomic studies.
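
The coverage statistics reported in this abstract (for example, the number of species that together account for 90% of sequences) can be illustrated with a short Python sketch. It assumes per-species sequence counts are already available as a dictionary. The function name `species_for_coverage` and the toy counts in the usage note are hypothetical and are not taken from the study.

```python
def species_for_coverage(counts, fraction=0.9):
    """Return the smallest set of top-ranked species reaching a coverage target.

    counts: dict of species name -> number of sequences assigned to it.
    fraction: target share of total sequences (0.9 means 90%).
    Species are taken in order of decreasing abundance until their
    cumulative count reaches fraction * total.
    """
    total = sum(counts.values())
    if total == 0:
        return []

    # Rank species from most to least abundant.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)

    selected = []
    running = 0
    for name, n in ranked:
        selected.append(name)
        running += n
        if running >= fraction * total:
            break
    return selected
```

Applied to the pooled counts of a cohort, the length of the returned list corresponds to a figure like "19 species accounted for 90% of the total sequences". Applied to each participant separately, it gives the per-participant species count behind the "2 to 10 species" observation. For toy counts `{"a": 50, "b": 30, "c": 15, "d": 5}`, the three most abundant species are needed to reach 90%.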

Trained Immunity, Tolerance, Priming and Differentiation: Distinct Immunological Processes

Maziar Divangahi, Peter Aaby, Shabaana Abdul Khader, Luis B. Barreiro, Siroon Bekkering, Triantafyllos Chavakis, Reinout van Crevel, Nigel Curtis, Andrew R. DiNardo, Jorge Dominguez-Andres, Raphael Duivenwoorden, Stephanie Fanucchi, Zahi Fayad, Elaine Fuchs, Melanie Hamon, Kate L. Jeffrey, Nargis Khan, Leo A. B. Joosten, Eva Kaufmann, Eicke Latz, Giuseppe Matarese, Jos W. M. van der Meer, Musa Mhlanga, Simone J. C. F. M. Moorlag, Willem J. M. Mulder, Shruti Naik, Boris Novakovic, Luke O’Neill, Jordi Ochando, Keiko Ozato, Niels P. Riksen, Robert Sauerwein, Edward R. Sherwood, Andreas Schlitzer, Joachim L. Schultze, Michael H. Sieweke, Christine Stabell Benn, Henk Stunnenberg, Joseph Sun, Frank L. van de Veerdonk, Sebastian Weis, David L. Williams, Ramnik Xavier & Mihai G. Netea
The similarities and differences between trained immunity and other immune processes are the subject of intense interrogation. Therefore, a consensus on the definition of trained immunity in both in vitro and in vivo settings, as well as in experimental models and human subjects, is necessary for advancing this field of research. Here we aim to establish a common framework that describes the experimental standards for defining trained immunity.