Introduction
As biotechnology moves into another decade, our understanding of the microscopic world and how it interacts with the surrounding macroscopic environment continues to develop at a staggering rate. In the wake of next-generation sequencing (NGS) technologies, genetic sequencing continues to become more robust, affordable, and accessible. This accessibility opens many avenues for researchers to understand relationships within microbiomes and fuels a growing area of bioinformatics as terabytes of data are continually generated and analyzed.
The two most notable short-read NGS techniques within the scientific community for producing high-quality, comparable, and reproducible data are Targeted 16S rRNA gene sequencing and Shotgun Metagenomic sequencing. While each workflow possesses key features that distinguish it from the other, there is overlap between these techniques and the data that is produced. These features, both unique and shared, are important to consider for researchers interested in utilizing NGS technologies as they will ultimately determine both the cost and capabilities of a microbiome pipeline.
This article compares these two major sequencing techniques, taking key considerations into account such as resolution/depth and breadth of microbiome profiling and accessibility of each respective workflow, including cost and robustness. The purpose of this blog will be to inform scientists interested in pursuing these methods and to help readers further develop their understanding of the practical differences and similarities between Targeted 16S rRNA sequencing and Shotgun Metagenomic sequencing.
Sequencing Techniques: A Comparative Overview
1. Shotgun Metagenomic Sequencing
Shotgun metagenomic sequencing, sometimes referred to as whole genome sequencing (WGS), is the indiscriminate sequencing of all genetic material present in a sample. This may include sequence from all domains of life (bacteria, archaea, and eukaryotes), from viruses, and from the host if no selection is applied. There are multiple methods to prepare a shotgun library, but the principle is the same. The input DNA is sheared into fragments of an appropriate size, and adapters are ligated to the fragments. Barcode index sequences are then added to the adapter-ligated fragments by PCR. The end products are cleaned up and pooled into a single library that is ready for sequencing.
A typical workflow for taxonomic analysis of shotgun metagenomic data begins with trimming to remove technical sequence, such as adapters, and poor-quality reads. Additional filtering may be carried out to remove host-origin, contaminant, and low-complexity reads. Finally, the remaining reads are compared to a reference database comprising whole genomes (e.g. Kraken and Centrifuge) or selected marker genes (e.g. MetaPhlAn and mOTUs) to generate a taxonomic profile. This profile provides the compositional information needed to catalogue the organisms present in a given microbiome. Because the sequencing is untargeted, coverage in principle extends to all genetic material in the sample. This supports analyses beyond taxonomic identification, including insight into functional pathways and detection of antibiotic resistance genes.
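The trimming, filtering, and classification steps described above can be sketched in a few lines of Python. This is a toy illustration only, not how Kraken, Centrifuge, MetaPhlAn, or mOTUs work internally: the reference sequences, the function names, the thresholds, and the simple shared k-mer vote are all assumptions made for the example, and the adapter string should be replaced with the one for your library kit.

```python
from collections import Counter

# Assumption: a commonly used Illumina adapter prefix; substitute the sequence for your kit.
ADAPTER = "AGATCGGAAGAGC"

def trim_adapter(seq, qual, adapter=ADAPTER):
    """Cut the read (and its quality string) at the first adapter occurrence."""
    idx = seq.find(adapter)
    return (seq, qual) if idx == -1 else (seq[:idx], qual[:idx])

def mean_phred(qual):
    """Mean Phred score, assuming Phred+33 ASCII encoding."""
    return sum(ord(c) - 33 for c in qual) / len(qual) if qual else 0.0

def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify(seq, reference, k=8):
    """Assign the read to the reference that shares the most k-mers with it."""
    read_kmers = kmers(seq, k)
    scores = {name: len(read_kmers & kmers(ref, k)) for name, ref in reference.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

def taxonomic_profile(reads, reference, host_label="host", min_len=20, min_q=20):
    """Trim, quality-filter, classify, and drop host reads; return counts per taxon."""
    counts = Counter()
    for seq, qual in reads:
        seq, qual = trim_adapter(seq, qual)
        if len(seq) < min_len or mean_phred(qual) < min_q:
            continue  # too short after trimming, or poor quality
        taxon = classify(seq, reference)
        if taxon != host_label:
            counts[taxon] += 1
    return counts

# Hypothetical toy "genomes"; real references are whole genomes or marker genes.
reference = {
    "taxon_A": "ATGGCGTACGTTAGCCTAGGCTTAACGGATCCGTAAGCTTGCA",
    "taxon_B": "TTGACCGGTATCGCGATTACAGGCTCAAGTCCGGAATTCGTTA",
    "host": "GGGCCCAAATTTGGGCCCAAATTTGGCCAATTGGCCAATTGGCC",
}
a, b, h = reference["taxon_A"], reference["taxon_B"], reference["host"]
reads = [
    (a[:25] + ADAPTER + "ACGT", "I" * 42),  # adapter read-through, trimmed to 25 bases
    (a[10:40], "I" * 30),
    (b[5:35], "I" * 30),
    (h[:30], "I" * 30),                     # host read, removed
    (b[:30], "#" * 30),                     # low quality, removed
    (a[:10] + ADAPTER, "I" * 23),           # too short after trimming, removed
]
counts = taxonomic_profile(reads, reference)
print(dict(counts))
```

Production pipelines use dedicated tools for each stage and indexed reference databases rather than recomputing k-mers per read; the sketch only shows how the stages chain together.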
2. Targeted 16S rRNA Gene Sequencing
Targeted 16S rRNA gene sequencing amplifies and sequences specific regions of a highly conserved gene. The technique is named after its most common target, the 16S rRNA gene, which encodes the RNA component of the small (30S) ribosomal subunit found in prokaryotes (Figure 1). The 16S rRNA gene is around 1500 base pairs long and contains nine hypervariable regions interspersed with conserved regions [1]. By designing primers that target the conserved regions shared across phylogenetic groups, entire microbial communities can be characterized and genotyped by comparing the small differences within these hypervariable regions. The qPCR-amplified regions are cleaned up, barcoded, pooled into a library, and sequenced. Since its inception, gene amplicon sequencing has been a major technique for resolving taxonomic profiles of complex microbiomes.
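The idea of primers binding conserved flanks and capturing the variable region between them can be illustrated with a small in-silico PCR sketch. The primer and template sequences below are hypothetical toy strings, not real 16S primers, and `extract_amplicon` is a name made up for this example. Degenerate bases are handled with the standard IUPAC nucleotide codes.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)

def extract_amplicon(template, fwd_primer, rev_primer):
    """Return the region between the primer binding sites, or None if either primer fails to bind."""
    fwd = re.search(primer_regex(fwd_primer), template)
    if fwd is None:
        return None
    downstream = template[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(rev_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]

# Hypothetical primers and templates: shared conserved flanks, different variable region.
FWD = "ACGGNTAC"   # N tolerates one variable position in the conserved site
REV = "TGGAATCC"   # binds the reverse strand at GGATTCCA
template_a = "TTTT" + "ACGGATAC" + "AAACCCTTA" + "GGATTCCA" + "TTTT"
template_b = "TTTT" + "ACGGCTAC" + "AAATCCTCA" + "GGATTCCA" + "TTTT"

amplicon_a = extract_amplicon(template_a, FWD, REV)
amplicon_b = extract_amplicon(template_b, FWD, REV)
differences = sum(x != y for x, y in zip(amplicon_a, amplicon_b))
print(amplicon_a, amplicon_b, differences)
```

Both templates are amplified by the same primer pair, and the organisms are then told apart by the mismatches in the captured region, which is the principle behind amplicon-based profiling.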
Microbiome Profiling: Applications and Considerations
One of the most widely sought applications shared between shotgun metagenomic sequencing and targeted 16S sequencing is the characterization of microbial communities. Given that microbes exist in nearly every environment, the variety of samples that can be sequenced is nearly endless, and so is the potential data to be generated. Depending on the scope of the investigation, researchers may be interested in identifying which microbes are present in a sample, how diverse and abundant those microbes are, and what type of metabolic or other activity they are carrying out.
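Questions about which microbes are present and how diverse and abundant they are usually come down to a few summary statistics computed from per-taxon read counts. The sketch below computes relative abundance, richness, and the Shannon diversity index for two made-up communities; the taxon names and counts are hypothetical.

```python
import math

def relative_abundance(counts):
    """Fraction of all reads assigned to each taxon."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for n in counts.values() if n > 0)

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over observed taxa."""
    return -sum(p * math.log(p) for p in relative_abundance(counts).values() if p > 0)

# Hypothetical read counts per taxon for two samples.
even = {"taxon_A": 50, "taxon_B": 50}
skewed = {"taxon_A": 99, "taxon_B": 1}

for name, sample in (("even", even), ("skewed", skewed)):
    print(name, richness(sample), round(shannon_index(sample), 3))
```

Both samples have the same richness, but the evenly split community has the higher Shannon index, which is why abundance and diversity are reported alongside simple presence or absence.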
Targeted (16S) sequencing, by the nature of its specificity, can only identify microbes that carry the amplified target regions. Researchers may choose this technique when looking for something specific within a sample, or may prepare parallel libraries targeting different regions (e.g. the ITS region for fungal species and V3-V4 for bacteria and archaea) if broader microbial discovery is desired.
Shotgun sequencing is a comprehensive technique that indexes all genetic information within a sample, complex or otherwise. In this regard, it is superior to targeted gene amplicon sequencing as it allows for identification of all organisms present rather than being limited to organisms containing the targeted region. Additionally, the complete genomic decoding of an organism intrinsically provides greater resolution during taxonomic identification since the entire range of the genome may be referenced to a database [2]. With shotgun sequencing it is possible to generate strain level resolution, something that targeted 16S sequencing is incapable of.
The genomic data generated by shotgun metagenomic sequencing may also provide insight into the potential metabolic activity of a community, since all genes are sequenced. Without the ability to determine active expression, however, this only permits speculation rather than confirmation. Regardless, this is useful information, as it can establish grounds for further research using metatranscriptomic analysis to determine the expressed metabolic activity of the community.
Accessibility and Practical Considerations
Feature | 16S/ITS Sequencing | Shotgun Sequencing | Shallow Shotgun Sequencing |
---|---|---|---|
Bacterial/Fungal Coverage | High | Limited | Limited |
Cross-Domain Coverage | No | Yes | Yes |
False Positives | Low Risk | High Risk | High Risk |
Taxonomy Resolution | Genus-Species | Species-Strains | Species-Strains |
Host DNA Interference | No | Yes | Yes |
Minimum DNA Input | 10 copies of 16S | As low as 100 fg | As low as 100 fg |
Functional Profiling | No | Yes | Yes |
Resistome and Virulence Profiling | No | Yes | Yes |
Recommended Sample Type | All | Human Microbiome | Human Microbiome |
Cost per Sample | ~$60 | ~$145 | ~$125 |
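
To see how the approximate per-sample costs in the table scale over a study, the short sketch below multiplies them out for a batch of samples. The batch size of 96 is an arbitrary assumption for illustration, and the prices are the rounded figures from the table, not quotes.

```python
# Approximate per-sample costs in USD, taken from the table above.
COST_PER_SAMPLE = {
    "16S/ITS": 60,
    "Shotgun": 145,
    "Shallow shotgun": 125,
}

def study_cost(n_samples, cost_per_sample=COST_PER_SAMPLE):
    """Total cost of sequencing n_samples with each technique."""
    return {method: n_samples * cost for method, cost in cost_per_sample.items()}

# Assumed batch size: one 96-sample plate.
costs = study_cost(96)
for method, total in costs.items():
    print(f"{method}: ${total:,}")
```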
1. Robustness of Workflow
Sample input is a critical variable to weigh when choosing a sequencing technique. The aim of a research project determines which samples are selected, and the samples selected may in turn dictate the technique used. For low-biomass samples and samples with a heavy host DNA presence, WGS becomes the more costly option for evaluating the microbiome profile.
Historically, shotgun metagenomic sequencing has offered complete genomic information but has been somewhat limited in the range of samples it can accept. The low-cycle PCR amplification in library preparation requires sufficient input, typically at least 1 ng/µl of purified DNA. This narrows the options for viable sample inputs, though not drastically, since a yield of >1 ng/µl is within reach for most sample types. With recent improvements in sample processing, viable libraries can be generated from ultra-low inputs of as little as 100 femtograms.
The human microbiome is a heavily studied microbial community, but some sample types are saturated with host DNA, which is another factor to consider. Shotgun sequencing of host-rich samples can quickly become costly, because a much higher sequencing depth is required to compensate for the considerable portion of reads that map to the host genome. This is an acceptable expense if one is interested in data that only shotgun sequencing can provide, such as viromes, metabolic potential, antibiotic resistance genes, and lower-abundance microbial species. In studies interested only in defining the composition of microbial communities, however, this cost and breadth of information are not necessary, and targeted gene amplicon sequencing serves as an effective alternative.
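The effect of host DNA on required sequencing depth is simple arithmetic: if a fraction of reads is lost to the host genome, the total depth must be scaled up so that the remaining fraction still meets the microbial read target. The sketch below works this out; the target of 5 million microbial reads and the host fractions are assumed values for illustration, not recommendations.

```python
import math

def required_total_reads(target_microbial_reads, host_fraction):
    """Total reads needed so that the non-host share reaches the target."""
    if not 0 <= host_fraction < 1:
        raise ValueError("host_fraction must be in [0, 1)")
    return math.ceil(target_microbial_reads / (1 - host_fraction))

# Assumed target: 5 million microbial reads per sample.
target = 5_000_000
for host_fraction in (0.0, 0.5, 0.75, 0.9, 0.99):
    total = required_total_reads(target, host_fraction)
    print(f"host {host_fraction:.0%}: about {total / 1e6:.0f} M total reads")
```

Going from a half-host sample to a 99% host sample raises the required depth roughly fifty-fold, which is why host-rich samples drive up the cost of shotgun sequencing so quickly.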
Targeted 16S rRNA sequencing does not offer the extensive genetic information generated by shotgun metagenomic sequencing, but it has several advantages. The workflow for gene amplicon sequencing involves a high-cycle targeted amplification step that, in some protocols, reaches upwards of 40 cycles. With this many cycles, samples with concentrations as low as picograms per microliter can be successfully amplified and sequenced [3]. Coupled with highly specific 16S primers, gene amplicon sequencing is a sensible option for low-biomass and host-rich samples when only the microbial profile is sought.
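Why a high cycle count tolerates such low input follows from the exponential nature of PCR: under idealized conditions, each cycle multiplies the template by (1 + E), where E is the per-cycle efficiency. The sketch below evaluates this textbook model; the starting copy number and efficiencies are assumptions, and real reactions plateau well before these theoretical totals as reagents are consumed.

```python
def theoretical_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: N = N0 * (1 + E) ** cycles, ignoring the plateau phase."""
    return initial_copies * (1 + efficiency) ** cycles

# Assumed starting point: 10 template copies.
for cycles in (20, 30, 40):
    perfect = theoretical_copies(10, cycles)
    realistic = theoretical_copies(10, cycles, efficiency=0.9)
    print(f"{cycles} cycles: {perfect:.2e} copies at E=1.0, {realistic:.2e} at E=0.9")
```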
2. Operational Demands
One of the primary factors that turns researchers away from the WGS technique is the extensive operational demand of running it in-house. The startup costs alone can deter smaller research groups, with the latest Illumina sequencers often costing about a million dollars and reagents often costing over ten thousand dollars per run. For many, the experimental demand for sequencing does not justify the investment in such expensive systems.
Shotgun sequencing is capable of analyzing samples at great depth, with flow cells yielding 1-25 billion paired-end reads. As a consequence, the output files for a NovaSeq run frequently reach into the terabytes (TB) and quickly require substantial data infrastructure to transfer and store [4]. For some researchers, this level of information is a strong incentive in selecting their workflow, for the reasons given earlier in the Microbiome Profiling section. For scientists interested only in the microbial composition of their samples, however, the cost of storing and cataloging this much data becomes excessive when targeted 16S sequencing offers such a cost-friendly alternative.
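A rough back-of-the-envelope estimate shows how read counts of this size translate into terabytes. The sketch below assumes a read length of 150 bases and about 2 bytes per base for uncompressed FASTQ (one byte for the base call and one for its quality score, ignoring header lines); both figures are assumptions for illustration, and compressed files will be considerably smaller.

```python
def estimate_fastq_bytes(n_reads, read_length=150, bytes_per_base=2):
    """Approximate uncompressed FASTQ size: one base byte plus one quality byte per position."""
    return n_reads * read_length * bytes_per_base

# Lower and upper ends of the read range quoted above.
for n_reads in (1_000_000_000, 25_000_000_000):
    size_tb = estimate_fastq_bytes(n_reads) / 1e12
    print(f"{n_reads / 1e9:.0f} billion reads: about {size_tb:.1f} TB uncompressed")
```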
Targeted 16S rRNA sequencing produces high-quality data within a narrower range than shotgun sequencing and, as a result, is less expensive to perform. While the $128,000 price tag for an Illumina MiSeq is much more affordable, a similar concern persists. Research groups are not always able to bear the costs of establishing and maintaining the infrastructure for a sequencing system, whereas outsourcing these techniques costs a fraction as much.
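One way to frame the buy-versus-outsource decision is a break-even count: how many samples must be run before the instrument purchase is offset by the per-sample savings. The sketch below uses the $128,000 instrument price quoted above; the outsourced price of $60 and the in-house running cost of $20 per sample are hypothetical inputs, and the model deliberately ignores staff, maintenance, and bioinformatics costs unless they are folded into the in-house figure.

```python
import math

def break_even_samples(instrument_cost, outsourced_price, in_house_cost=0.0):
    """Samples needed before buying beats outsourcing; None if in-house never saves money."""
    saving_per_sample = outsourced_price - in_house_cost
    if saving_per_sample <= 0:
        return None
    return math.ceil(instrument_cost / saving_per_sample)

# Hypothetical per-sample figures; only the instrument price comes from the text.
print(break_even_samples(128_000, 60))       # in-house running costs ignored
print(break_even_samples(128_000, 60, 20))   # with an assumed $20 in-house cost per sample
```

Even under these generous assumptions the break-even point is in the thousands of samples, which is consistent with the point that smaller groups often cannot justify the purchase.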
Additional factors limit accessibility to NGS and play a major role in a microbiome company's long-term development plans. These factors include proper sample collection and storage, complete and unbiased DNA purification, competent library preparation, and a high-grade bioinformatics pipeline. For all sequencing, be it shotgun or gene amplicon sequencing, a team of bioinformaticians is required to build and maintain a pipeline through which the sequencing data is polished and converted into interpretable and accurate information.
With all of this taken into consideration, fully developing a sequencing workflow becomes a daunting endeavor, and these requirements can put an in-house workflow beyond a realistic budget. Microbiome groups in need of affordable sequencing may opt to outsource and avoid the high startup and maintenance costs. Companies like Zymo Research offer a complete, high-throughput sequencing service that includes bioinformatics analysis on a price-per-sample basis, all within a turnaround time of a few weeks. This greatly increases ease of access for scientists to gather high-quality data on the microbial communities of their samples.
Conclusion
Selecting an appropriate microbiome methodology is critical to increasing the odds of success in microbiome studies. While the data provided by targeted (16S) sequencing is limited to microbial identification, often with resolution limited to genus or species, this method provides an excellent way to initially characterize a microbial population and begin making inferences about the capabilities of that population. Additionally, the ease of library preparation, the tolerance of low-input samples and samples contaminated with host genetic material, and the ability to select for a single class of organism allow targeted methods to work where shotgun (WGS) methods would likely either fail or prove extremely inefficient and costly.
For laboratories just beginning microbiome analysis or with limited headcount or equipment, selecting a microbiome service provider that is capable of running the full pipeline can enable them to reap the benefits of a more experienced and equipped team on a per-sample fee basis. Zymo Research is one such organization, with team members who have expertise in all relevant aspects of microbiome pipelines from sample collection and extraction to advanced microbiology, and even bioinformatic processing of results and analytic pipeline development.
As genomics technology continues to evolve and spread around the globe, we hope this article helps researchers who are beginning to study the microbiome to better identify their own needs and the best ways to meet them. As technology changes, this article will be updated to reflect any major shifts in short-read sequencing-based microbiome analysis.
References
- Bharti, R., & Grimm, D. G. (2021). Current challenges and best-practice protocols for microbiome analysis. Briefings in Bioinformatics, 22(1), 178–193. https://doi.org/10.1093/bib/bbz155
- Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A., & Holmes, S. P. (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581–583.
- Brandt, J., & Albertsen, M. (2018). Investigation of detection limits and the influence of DNA extraction and primer choice on the observed microbial communities in drinking water samples using 16S rRNA gene amplicon sequencing. Frontiers in Microbiology, 9. https://www.frontiersin.org/articles/10.3389/fmicb.2018.02140
- Tanjo, T., Kawai, Y., Tokunaga, K., et al. (2021). Practical guide for managing large-scale human genome data in research. Journal of Human Genetics, 66, 39–52. https://doi.org/10.1038/s10038-020-00862-1