Metagenomics for Microbiology (eBook)

Jacques Izard, Maria Rivera (Editors)

2014 | 1st edition
188 pages
Elsevier Science (publisher)
978-0-12-410508-9 (ISBN)

Concisely discussing the application of high-throughput analysis to advance our understanding of microbial principles, Metagenomics for Microbiology provides a solid base for the design and analysis of omics studies for the characterization of microbial consortia. The intended audience includes clinical and environmental microbiologists, molecular biologists, infectious disease experts, statisticians, biostatisticians, and public health scientists. This book focuses on the technological underpinnings of metagenomic approaches and their conceptual and practical applications. With the next-generation genomic sequencing revolution increasingly permitting researchers to decipher the coding information of the microbes living with us, we now have a unique capacity to compare multiple sites within individuals at higher resolution and greater throughput than hitherto possible. The recent articulation of this paradigm points to unique possibilities for investigating our dynamic relationship with these cellular communities and, excitingly, for probing their therapeutic potential in future disease prevention or treatment.

- Expertly describes the latest metagenomic methodologies and best practices, from sample collection to data analysis for taxonomic, whole shotgun metagenomic, and metatranscriptomic studies
- Includes clear-headed pointers and quick starts to direct research efforts and increase study efficacy, eschewing ponderous prose
- Presented topics include sample collection and preparation, data generation and quality control, third-generation sequencing, advances in computational analyses of shotgun metagenomic sequence data, taxonomic profiling of shotgun data, hypothesis testing, and mathematical and computational analysis of longitudinal data and time series. Past examples and prospects are provided to contextualize the applications.

Chapter 2

Long-Read, Single Molecule, Real-Time (SMRT) DNA Sequencing for Metagenomic Applications

Brett Bowman

Mincheol Kim

Yong-Joon Cho

Jonas Korlach

Abstract

In this chapter, we describe applications of single molecule, real-time (SMRT) DNA sequencing to metagenomic research. The long sequence reads, combined with a lack of bias with respect to DNA sequence context or GC content, facilitate a more comprehensive analysis of the genomic constitution of microbial communities. Full-length 16S rRNA gene sequencing at high (>99%) accuracy allows for species-level characterization of community members concomitant with the determination of community structure. The application of SMRT sequencing to whole-community shotgun microbial metagenomics is also discussed.

Keywords

Real-time DNA sequencing

microbiome composition

soil microbiota

water microbiome

long read length

Elucidating the Earth’s microecology remains one of the foremost challenges in biology, with profound implications for human health, agriculture, chemistry, energy, and other areas. We have thus far only captured a very small fraction of the Earth’s microbial diversity, with estimates of the number of bacterial and archaeal “species” reaching into the millions.1 However, our understanding of microbial communities has been dramatically improving through the use of high-throughput DNA sequencing technologies.

The sequencing of ribosomal RNA (rRNA) genes, in particular those of the small subunit (SSU), has been widely used for over 30 years to study microbial community structure, despite limitations imposed by DNA sequencing technologies.2 For years, the only method available was to painstakingly clone each individual gene of interest, tile over it with multiple Sanger sequencing reactions, and manually stitch the results together.3 As recently as 2008, Sanger sequencing was still the most common approach, as contemporary next-generation sequencers with read lengths of 100 base pairs or less were unable to significantly differentiate taxa.4

This changed rapidly starting around 2009 with the introduction of the Titanium sequencing chemistry for 454 pyrosequencing, which provided read lengths of greater than 300 bases for hundreds of thousands of reads at a time.5 Simultaneously, the development of specialized software tools such as Mothur,6 the RDP classifier,7 and QIIME8 allowed the analysis of such large datasets. More recently, this trend has continued with the adoption of approximately 200 base pair assembled paired-end Illumina reads for some metagenomics applications,9 allowing millions of reads to be sequenced in a single experiment, albeit at the cost of reduced read lengths compared with other sequencing technologies. The adoption of next-generation sequencing for metagenomics thus led to an exponential increase in the amount of data that could be generated from uncultured samples, providing the foundational method for projects such as Metagenomics and Microbial Ecology10 and the Human Microbiome Project.11

However, efforts to obtain clear pictures of metagenomes in this fashion have been complicated by the short read lengths that limit the resolving power of rDNA sequences, as well as by inherent biases from both the polymerase chain reaction (PCR) and the next-generation sequencing technologies.12 The largest source of bias in community 16S sequencing is the initial PCR step.13 Careful primer selection is important both because different variable regions of the 16S gene show differing capacities to differentiate taxa14 and because no primer sites in the gene are perfectly conserved across all phyla.15 Read lengths between 500 and 700 bp are sufficient to differentiate most phyla,16,17 but which regions are required varies, and no region has the resolution of the full-length gene.

In addition, biases inherent in the next-generation sequencing technologies can affect data interpretation.18 For 454 sequencing, this led to the development of PyroNoise, which attempts to reduce the effect of context-specific errors on the analysis of amplicons.19 To our knowledge, no similar tools have been developed for Illumina-based sequence data, despite the platform also having known context-specific errors.20 GC-content bias can also affect the quality of second-generation sequencing data,21 and this effect has been directly studied on the 454 platform for 16S sequencing.22,23

Here, we describe the application of long sequence reads provided by Pacific Biosciences’ single molecule, real-time (SMRT) DNA sequencing to decode the entire 16S rRNA gene. SMRT sequencing is based on monitoring individual DNA polymerase molecules and detecting their successive nucleotide incorporations in real time.24,25 Compared with other sequencing methods, it exhibits much longer read lengths (8500 bp on average with the latest (P5-C3) sequencing chemistry), the least sequence context bias,26 and a high consensus accuracy due to the random nature of sequencing errors.27 Exploiting these characteristics and moving from shorter amplicons to the full-length gene significantly reduces the primer bias in 16S community profiling. The selection of specific variable regions, important when read length is a limiting factor, is no longer required, as the entire sequence is obtained in a single read. Although biases inherent in primer designs are unavoidable, the sites flanking the terminal V1 and V9 regions are among the most conserved: primers targeting those sites, commonly referred to as either 27F/1492R or GM3/GM4, are among the most extensively used and optimized 16S primers,28 capturing approximately 87% of known sequences with fewer than two mismatches.15 In addition, following the initial PCR, there are no additional amplification steps during library preparation and sequencing, which avoids any further amplification bias. SMRT sequencing has been shown to display very little bias with respect to GC content and sequence context,29 resulting in higher sequence quality across the entire 16S rRNA gene and reduced bias in community structure.
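The notion of a primer "capturing" a sequence with fewer than two mismatches can be illustrated with a short Python sketch. It is not taken from the chapter: the primer and target sites below are made-up toy strings (not the actual 27F/1492R sequences), the function names are hypothetical, and the comparison assumes the primer and site are already aligned and of equal length. Degenerate primer positions are interpreted with the standard IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide codes: each primer symbol maps to the bases it matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer symbol.

    Assumes primer and site are pre-aligned and have the same length.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        1 for p, s in zip(primer.upper(), site.upper()) if s not in IUPAC[p]
    )


def fraction_captured(primer, sites, max_mismatches=1):
    """Fraction of sites matched with at most max_mismatches mismatches.

    max_mismatches=1 corresponds to "fewer than two mismatches".
    """
    hits = sum(1 for s in sites if count_mismatches(primer, s) <= max_mismatches)
    return hits / len(sites)


# Toy data (hypothetical): a 6-base degenerate primer and four candidate sites.
toy_primer = "ACMTGY"
toy_sites = ["ACATGC", "ACCTGT", "ACGTGC", "TCGTGA"]
captured = fraction_captured(toy_primer, toy_sites)
```

With real data, the sites would be extracted from an aligned reference database at the primer-binding position; that extraction step is outside the scope of this sketch.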

Full-length 16S rRNA gene sequencing

To demonstrate the application of SMRT sequencing to surveying metagenomic amplicons, we sequenced a metagenomic mock community consisting of an equimolar mixture of 20 known, full-length 16S rRNA gene sequences from 12 distinct bacterial lineages. We PCR-amplified the full-length 16S rRNA genes using 27F/1492R primers and prepared sequencing libraries from the amplicons according to the standard library preparation protocol.30 Sequencing was performed in triplicate by running three barcoded technical PCR replicates on each SMRT Cell. To generate high-quality, full-length 16S sequence reads, we employed circular consensus sequencing (CCS),30 which allows for the repeated sequencing of the same DNA molecule to generate a high-quality intramolecular consensus (Figure 2.1A). The median read length for sequenced molecules was 5560 bp, or approximately 3.5 passes over the ∼1500 bp template sequence. Each SMRT Cell produced 31,000–43,000 raw sequence reads, of which 17,000–24,000 reads contained sufficient coverage of the template to generate CCS sequences. It is worth noting that the samples were somewhat underloaded, suggesting that even greater throughput could be achieved by optimizing loading conditions. The predicted CCS read accuracy, calculated from the per-base Phred quality scores, showed excellent concordance with the accuracy observed against the known reference sequences (Figure 2.1B), with a median predicted accuracy of 99.7% over all reads (Figure 2.1C).
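As an illustration of how a predicted read accuracy can be derived from per-base Phred quality scores, the following Python sketch uses the standard Phred definition, in which a score Q corresponds to an error probability of 10^(-Q/10). The function names are hypothetical, the Phred+33 FASTQ encoding is an assumption, and this is not necessarily how the authors' pipeline computes the value.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the base is wrong."""
    return 10 ** (-q / 10)


def predicted_accuracy(quals):
    """Predicted read accuracy: 1 minus the mean per-base error probability."""
    if not quals:
        raise ValueError("quality list must not be empty")
    expected_errors = sum(phred_to_error_prob(q) for q in quals)
    return 1 - expected_errors / len(quals)


def decode_fastq_quality(qual_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes the common Phred+33 ASCII encoding unless another offset is given.
    """
    return [ord(c) - offset for c in qual_string]


# A read in which every base has Q20 (1% error) has a predicted accuracy of 99%.
example_accuracy = predicted_accuracy([20] * 10)
```

Averaging error probabilities rather than averaging the Q values themselves matters here: because Phred scores are logarithmic, a few low-quality bases lower the predicted accuracy much more than a mean of the scores would suggest.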

Fig. 2.1 SMRT sequencing of full-length 16S rRNA genes from a mock community of 20 known sequences. (A) Schematic of generating high-accuracy 16S reads through circular consensus sequencing (CCS). (B) Concordance of predicted CCS accuracy versus observed accuracy against the mock community reference. (C) Histogram of predicted concordance with the reference for full-length 16S CCS sequences.

The sequences were analyzed with a combination of standard tools available in Mothur6 and custom Python scripts to accommodate the unique needs of single-molecule sequencing data, collectively available for public use on GitHub as rDnaTools.31 Sequences from different replicates were demultiplexed if at least one barcode sequence could be identified with HMMER,32 which recovered 99.5% of all CCS sequences. Truncated sequences under 500 bp and concatenated products over 2000 bp were discarded. Demultiplexed sequences were then aligned to the SILVA reference alignment of bacterial ribosomal SSU sequences. Despite a range of sequence lengths (1483 ± 169 bp), 98.5% of all demultiplexed sequences covered the entire canonical alignment (Figure 2.2). The differences in length reflect the biological variation of 16S rRNA gene lengths, mainly caused by greater variability in the loop regions.
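The length filter described above (discarding truncated sequences under 500 bp and concatenated products over 2000 bp) can be sketched in a few lines of Python. This is a minimal illustration, not the rDnaTools implementation: the function names and the dictionary-based input are assumptions, and sequences of exactly 500 or 2000 bp are kept.

```python
MIN_LEN = 500   # sequences shorter than this are treated as truncated
MAX_LEN = 2000  # sequences longer than this are treated as concatenated products


def classify_by_length(seq, min_len=MIN_LEN, max_len=MAX_LEN):
    """Label a sequence as 'truncated', 'concatenated', or 'keep' by its length."""
    n = len(seq)
    if n < min_len:
        return "truncated"
    if n > max_len:
        return "concatenated"
    return "keep"


def filter_by_length(records, min_len=MIN_LEN, max_len=MAX_LEN):
    """Return only the records whose sequence length lies within the bounds.

    records is assumed to be a dict mapping read names to sequence strings.
    """
    return {
        name: seq
        for name, seq in records.items()
        if classify_by_length(seq, min_len, max_len) == "keep"
    }


# Toy reads (hypothetical): only the ~1500 bp read passes the filter.
toy_reads = {
    "read_short": "A" * 320,
    "read_full": "A" * 1483,
    "read_concat": "A" * 2950,
}
kept_reads = filter_by_length(toy_reads)
```

In practice the records would come from a FASTA or FASTQ parser; a plain dictionary is used here only to keep the example self-contained.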

Fig. 2.2 Sequence lengths of 16S rRNA gene SMRT sequencing CCS reads from a mock community of 20 known sequences. (A) Sequence lengths after barcode trimming and chimera filtering. (B)...

Publication date (per publisher)	7 November 2014
Language	English
Subject areas	Medicine / Pharmacy ► Health professions
	Medicine / Pharmacy ► Medical specialties ► Microbiology / Infectious diseases / Travel medicine
	Natural sciences ► Biology ► Genetics / Molecular biology
	Natural sciences ► Biology ► Microbiology / Immunology
	Engineering
ISBN-10	0-12-410508-4 / 0124105084
ISBN-13	978-0-12-410508-9 / 9780124105089
