Clinical Applications for Next-Generation Sequencing (eBook)

Urszula Demkow, Rafal Ploski (Editors)

2015 | 1st edition
334 pages
Elsevier Science (publisher)
978-0-12-801841-5 (ISBN)

Clinical Applications for Next Generation Sequencing provides readers with an outstanding postgraduate resource to learn about the translational use of NGS in clinical environments. Rooted in both medical genetics and clinical medicine, the book fills the gap between state-of-the-art technology and evidence-based practice, providing an educational opportunity for users to advance patient care by transferring NGS to the needs of real-world patients. The book builds an interface between genetic laboratory staff and clinical health workers to not only improve communication, but also strengthen cooperation. Users will find valuable tactics they can use to build a systematic framework for understanding the role of NGS testing in both common and rare diseases and conditions, from prenatal care, like chromosomal abnormalities, up to advanced age problems like dementia.

- Fills the gap between state-of-the-art technology and evidence-based practice
- Provides an educational opportunity which advances patient care through the transfer of NGS to real-world patient assessment
- Promotes a practical tool that clinicians can apply directly to patient care
- Includes a systematic framework for understanding the role of NGS testing in many common and rare diseases
- Presents evidence regarding the important role of NGS in current diagnostic strategies

Chapter 2

Basic Bioinformatic Analyses of NGS Data

Piotr Stawinski (1), Ravi Sachidanandam (2), Izabela Chojnicka (3), and Rafał Płoski (4)

(1) Department of Immunology, Medical University of Warsaw, Warsaw, Poland
(2) Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, NY, USA
(3) Faculty of Psychology, University of Warsaw, Warsaw, Poland
(4) Department of Medical Genetics, Centre of Biostructure, Medical University of Warsaw, Warsaw, Poland

Abstract

The bottleneck in developing clinical applications of next generation sequencing is the storage and analysis of the large volumes of data that are generated. The applications are diverse, but their common themes are computationally and analytically challenging. We give here a broad overview of the issues involved in handling such data, the concerns that need to be addressed at each step of data processing, and the presentation of results. We outline the principles and highlight tools and approaches, without being too specific, to give guidance to a clinician starting out in the field.

Keywords

Bioinformatics; Deep sequencing; NGS; Sequence analysis

Chapter Outline

Software Tools

Input Sequence Preprocessing

Mapping

Processing and Interpreting Mapping

Insertions and Deletions Realignment

Base Quality Recalibration

Variant Calling

Major Approaches to the Variant Calling

Variant Calling Format

Variant Annotation

Software and Hardware Issues

Computational Architecture

Storage Architecture

References

Since 2005, the technological progress in medical genetics, particularly next generation sequencing (NGS) technology, has revolutionized the use of genomics in clinical applications. NGS enables rapid (in a few days), inexpensive (a few thousand dollars) analyses of the whole genome. NGS has allowed the use of a variety of molecular biological techniques in diagnostics and patient care, such as i) whole-genome sequencing (WGS), which involves sequencing the whole genome to study mutations and rearrangements; ii) mRNA-seq to study changes in expression profiles; iii) small RNA-seq to study the role of microRNAs and other small noncoding RNAs; iv) methyl-seq to study DNA modifications such as methylation; v) chromatin immunoprecipitation (ChIP) to study chromatin modifications such as histone marks and to map protein–DNA interactions; vi) targeted sequencing to study select regions of the genome (mitochondria, cancer panels, etc.); and vii) noncoding transcript profiling. These techniques can be used to identify differences in disease states and identify biomarkers, as well as to help identify therapeutic targets and help clinicians decide on the course of treatment [1–3].

NGS experiments require the cooperation of experts from various fields including molecular biology, clinical work, technology, instrumentation, and bioinformatics. After collecting genetic material and clinical information about the subject, preparing that material, and performing consecutive steps of the NGS experiment, the NGS sequencer produces large volumes of data that are impossible to interpret without bioinformatic analysis.

The main goal of bioinformatic analyses is to identify differences between the disease state and the normal state. In the case of genetic disorders, the aim is to identify the functional differences between the reference and the subject genomes. Of the many differences that show up in any differential analysis, the aim is to identify the specific variants that either partially or fully explain the observed clinical phenotype. Such analyses are often not definitive, providing only a statistical association, owing to the influence of a variety of factors on disease etiology.

Current NGS technologies are limited to sequencing small fragments of DNA (up to several hundred bases) [4]. To get around this limitation, the input material (DNA or RNA) is fragmented and then processed for sequencing. This results in several million fragments that then need to be reassembled to reconstruct the source sequence. This is a difficult problem to solve ab initio, but the existence of the reference genome helps with this process immensely. However, it also means that large-scale structural changes, such as rearrangements of large sections of the genome, are difficult to detect using NGS (other complementary techniques such as fluorescence in situ hybridization, involving visualizing hybridization of tagged probes, are better suited to this).

Most sequencing technologies work by synthesizing the second strand using a single strand of a DNA fragment as a template [5]. This process often introduces errors, which are overcome in NGS by generating a large number of reads (>20) covering each position. Reliable variant calling remains an open problem, requiring an understanding of the nature of the errors introduced by each instrument and by the sample preparation techniques.

In this chapter we will focus on the detection of single nucleotide variants (SNVs) and deletions/insertions (called indels) that are significantly smaller than the read size. There are other types of variants, including structural variants, such as inversions, tandem duplications, long indels, and translocations. Most of the techniques and algorithms presented in this chapter are focused on targeted sequencing experiments but may be applied to the whole genome as well.

We will describe in detail various components of bioinformatic analysis, beginning with the data input, reads mapping, and processing of the mapping products, up to variant calling, with special attention paid to issues with annotations. We will close with a discussion of hardware-related matters, including requirements for data analysis and secure storage, with a description of various architectures of computing systems for NGS data processing.

Software Tools

There are several software tools available to call SNVs and small indels from deep-sequencing data. Most tools have several analysis steps in common and create outputs in standard formats [6]. The common data flow and standard procedures are presented in Figure 1.

Figure 1 General NGS data processing pipeline [7].

Input Sequence Preprocessing

Sequencing instruments from Illumina generate data files in the FASTQ format, which uses four lines per read to hold its name, sequence, and quality scores (Figure 2). This has become the de facto standard for NGS data, owing to the ubiquity of Illumina instruments. The read quality at each base is a phred score that can range from 0 to 60 on a logarithmic scale: a score of 30 represents a 1 in 1000 chance of error, while a score of 10 represents a 1 in 10 chance of error. To keep the files compact, the quality at each position is encoded as a single ASCII character (1 byte, or 8 bits, per position); a common encoding is Phred+33, in which the character's ASCII code is the phred score plus 33 (Figure 3) [8].
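The phred scale and the Phred+33 encoding described above can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for this example; the formula P = 10^(-Q/10) is the standard definition of the phred score.

```python
from typing import List

PHRED_OFFSET = 33  # Phred+33: ASCII code = phred score + 33


def phred_to_error_probability(score: int) -> float:
    """Convert a phred score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-score / 10)


def decode_phred33(quality_line: str) -> List[int]:
    """Decode the quality line of a FASTQ record into per-base phred scores."""
    return [ord(character) - PHRED_OFFSET for character in quality_line]


# 'I' has ASCII code 73, '5' is 53, '+' is 43, and '!' is 33
print(decode_phred33("I5+!"))  # [40, 20, 10, 0]
```

A score of 30 decoded this way corresponds to an error probability of about 0.001, matching the 1 in 1000 figure quoted in the text.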

Several NGS sequencers can read sequences from both ends of a single DNA fragment, creating "paired-end reads" (Figure 4). In such cases, the sequences are placed in paired FASTQ files; each record holds one end of the molecule, with the corresponding paired end at the same position in the paired file. FASTQ files are usually compressed using gzip, and most programs will accept these compressed files, usually with a .fastq.gz or .fq.gz extension, as input.
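As a minimal sketch of how paired, gzip-compressed FASTQ files can be walked in lockstep, the following Python example uses only the standard library. The function names are hypothetical, and the sketch assumes well-formed files with exactly four lines per record and the same record order in both files.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (identifier, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (identifier, sequence, quality) for every four-line FASTQ record."""
    iterator = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(iterator, 4)]
        if len(record) < 4:
            return  # end of input (an incomplete trailing record is ignored)
        identifier, sequence, _separator, quality = record
        yield identifier, sequence, quality


def read_pairs(path_r1: str, path_r2: str) -> Iterator[Tuple[Read, Read]]:
    """Pair record N of the first file with record N of the second file."""
    with gzip.open(path_r1, "rt") as first, gzip.open(path_r2, "rt") as second:
        yield from zip(parse_fastq(first), parse_fastq(second))
```

Because `zip` stops at the shorter input, a production pipeline would also want to check that both files contain the same number of records and that the identifiers of each pair agree.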

FASTQ files are always preprocessed to apply various quality controls and to remove any adapters that remain from the sample preparation process (although with long inserts, adapter trimming is less of an issue). Preprocessing is critical for removing potential sources of error that can propagate through to variant calling [9]. The quality of bases is not evenly distributed: it decreases with increasing position in the read (Figure 5(a)). In addition, the first few bases can be of lower quality owing to the method by which clusters are recognized on several sequencing platforms. Reads are therefore usually trimmed to remove low-quality bases, either by removing a fixed number of bases from both ends or by phred score-based trimming, using a score of 20 as the cutoff. Reads can also contain contamination, such as adapter sequences, when the inserts are shorter than the read length (Figure 4(c)). In such cases, the adapter is located by aligning its sequence to the read and is then trimmed off.
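The two trimming steps can be sketched in Python as follows. This is a simplified illustration, not the method of any particular tool: the function names are hypothetical, quality trimming simply strips bases below the cutoff from both ends, and adapter detection uses exact string matching in place of the mismatch-tolerant alignment that real trimmers apply. Qualities are assumed to be Phred+33 encoded.

```python
from typing import Tuple

PHRED_OFFSET = 33  # assumes Phred+33 encoded quality strings


def trim_low_quality_ends(sequence: str, quality: str, cutoff: int = 20) -> Tuple[str, str]:
    """Strip bases whose phred score is below `cutoff` from both ends of a read."""
    scores = [ord(character) - PHRED_OFFSET for character in quality]
    start, end = 0, len(scores)
    while start < end and scores[start] < cutoff:
        start += 1
    while end > start and scores[end - 1] < cutoff:
        end -= 1
    return sequence[start:end], quality[start:end]


def trim_adapter(sequence: str, quality: str, adapter: str, min_overlap: int = 3) -> Tuple[str, str]:
    """Cut the read where the adapter begins (exact matching only).

    A full adapter copy is searched first; failing that, the longest adapter
    prefix of at least `min_overlap` bases that ends the read is removed.
    """
    position = sequence.find(adapter)
    if position == -1:
        longest = min(len(adapter) - 1, len(sequence))
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                position = len(sequence) - overlap
                break
    if position == -1:
        return sequence, quality  # no adapter found
    return sequence[:position], quality[:position]
```

The `min_overlap` parameter is a design choice of this sketch: very short matches at the end of a read occur by chance, so requiring a few matching bases trades a little residual adapter for fewer falsely trimmed reads.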

Figure 2 A read from a FASTQ file produced by Casava 1.8 software. In the FASTQ file each read sequence is described using four lines. 1. The sequence identifier and optional description: a) required @ symbol; b) the instrument name; c) run ID; d) flow cell ID; e) flow cell lane; f) tile number within the flow cell lane; g, h) the x and y coordinates of the cluster within the tile; i) member of a pair; j) Y for reads filtered out, N for reads passing filter; k) 0 if the read is not identified as a control; l) index sequence. 2. The sequenced read: m) N denotes unidentified bases. 3. Line begins with +, optionally followed by a sequence identifier. 4. Encoded quality values.
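The identifier fields listed in the caption can be extracted with a few lines of Python. This is a sketch under the assumption that the identifier follows the Casava 1.8 layout exactly (seven colon-separated fields, a space, then four colon-separated fields); the class name, function name, and example identifier are illustrative only.

```python
from typing import NamedTuple


class ReadHeader(NamedTuple):
    instrument: str
    run_id: str
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    pair_member: int
    filtered_out: bool
    control_bits: int
    index_sequence: str


def parse_casava18_header(line: str) -> ReadHeader:
    """Split a Casava 1.8 style FASTQ identifier line into its named fields."""
    if not line.startswith("@"):
        raise ValueError("a FASTQ identifier line must start with '@'")
    location, description = line[1:].strip().split(" ", 1)
    instrument, run_id, flowcell_id, lane, tile, x, y = location.split(":")
    pair_member, is_filtered, control_bits, index_sequence = description.split(":")
    return ReadHeader(
        instrument, run_id, flowcell_id, int(lane), int(tile), int(x), int(y),
        int(pair_member),
        is_filtered == "Y",  # Y marks a read that was filtered out
        int(control_bits),   # 0 when the read is not a control
        index_sequence,
    )


header = parse_casava18_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")
print(header.lane, header.tile, header.filtered_out)  # 2 2104 True
```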

Figure 3 Phred...

Publication date (per publisher)	10 September 2015
Language	English
Subject areas	Medicine / Pharmacy ► Medical Specialties ► Laboratory Medicine
	Studies ► Second Stage of Studies (Clinical) ► Medical History / Physical Examination
	Studies ► Second Stage of Studies (Clinical) ► Human Genetics
	Natural Sciences ► Biology ► Genetics / Molecular Biology
ISBN-10	0-12-801841-0 / 0128018410
ISBN-13	978-0-12-801841-5 / 9780128018415
