Modern Genome Annotation (eBook)
XVIII, 490 pages
Springer Wien (publisher)
978-3-211-75123-7 (ISBN)
This book presents current scientific developments in bioinformatics and its computational implementation, drawing on research from the BioSapiens Network of Excellence. Bioinformatics is essential for annotating the structure and function of genes and proteins, for analysing complete genomes, and for molecular biology and biochemistry.
The book includes an overview of bioinformatics and the full spectrum of genome annotation approaches: genome analysis and gene prediction; gene regulation analysis and expression; genome variation and QTL analysis; large-scale protein annotation of function and structure; annotation and prediction of protein interactions; and the organization and annotation of molecular networks and biochemical pathways. Also covered are a technical framework for organizing and representing genome data using the DAS technology, and work on the annotation of two large genomic sets: HIV/HCV viral genomes and the splicing alternatives potentially encoded in 1% of the human genome.
CONTENTS 6
INTRODUCTION BIOSAPIENS: A European Network of Excellence to develop genome annotation resources 18
SECTION 1 Gene definition 21
CHAPTER 1.1 State of the art in eukaryotic gene prediction 22
1 Introduction 22
2 Classes of information 25
3 Frameworks for integration of information 32
4 Training 41
5 Evaluation of gene prediction methods 42
6 Discussion 47
CHAPTER 1.2 Quality control of gene predictions 55
1 Introduction 55
2 Quality control of gene predictions 56
3 Results 58
4 Alternative interpretations of the results of MisPred analyses 64
5 Conclusions 66
SECTION 2 Gene regulation and expression 67
CHAPTER 2.1 Evaluating the prediction of cis-acting regulatory elements in genome sequences 68
1 Introduction 68
2 Transcription factor binding sites and motifs 71
3 Scanning a sequence with a position-specific scoring matrix 72
4 Evaluating pattern matching results 77
5 Discovering motifs in promoter sequences 82
6 Methodological issues for evaluating pattern discovery 96
7 Good practices for evaluating predictive tools 97
8 What has not been covered in this chapter 99
9 Materials 100
Abbreviations 100
CHAPTER 2.2 A biophysical approach to large-scale protein-DNA binding data 103
1 Binding site predictions 104
2 Affinity model 107
3 Affinity statistics 111
4 Applications 113
5 Summary 114
CHAPTER 2.3 From gene expression profiling to gene regulation 116
1 Introduction 116
2 Generating sets of co-expressed genes 117
3 Finding putative regulatory regions using comparative genomics 120
4 Detecting common transcription factors for co-expressed gene sets 122
5 Combining transcription factor information 125
6 “De novo” prediction of transcription factor binding motifs 126
SECTION 3 Annotation and genetics 131
CHAPTER 3 Annotation, genetics and transcriptomics 132
1 Introduction 132
2 Genetics and gene function 134
3 Use of animal models 137
4 Transcriptomics: gene expression microarrays 139
5 Gene annotation 141
SECTION 4 Functional annotation of proteins 146
CHAPTER 4.1 Resources for functional annotation 147
1 Introduction 147
2 Resources for functional annotation – protein sequence databases 148
3 UniProt – The Universal Protein Resource 149
4 The UniProt Knowledgebase (UniProtKB) 150
5 Protein family classification for functional annotation 160
6 From genes and proteins to genomes and proteomes 168
7 Summary 169
CHAPTER 4.2 Annotating bacterial genomes 173
1 Background 173
2 Global sequence properties 178
3 Identifying genomic objects 180
4 Functional annotation 182
5 A recursive view of genome annotation 184
6 Improving annotation: parallel analysis and comparison of multiple bacterial genomes 186
7 Perspectives: new developments for the construction of genome databases, metagenome analyses and user-friendly platforms 188
8 Annex: databases and platforms for annotating bacterial genomes 190
CHAPTER 4.3 Data mining in genome annotation 199
1 Introduction 199
2 An overview of large biological databases 201
3 Data mining in genome annotation 208
4 Applying association rule mining to the Swiss-Prot database 213
5 Applying association rule mining to the PEDANT database 215
6 Conclusion 218
CHAPTER 4.4 Modern genome annotation: the BioSapiens network 221
1 Homologous and non-homologous sequence methods for assigning protein functions 221
CHAPTER 4.5 Structure to function 247
1 Introduction to protein structure and function 247
2 FireDB and firestar – the prediction of functionally important residues 249
3 Modelling local function conservation in sequence and structure space for predicting molecular function 254
4 Structural templates for functional characterization 257
5 An integrated pipeline for functional prediction 260
CHAPTER 4.6 Harvesting the information from a family of proteins 271
1 Introduction 271
2 Molecular class-specific information systems 273
3 Extracting information from sequences 275
4 Correlation studies on GPCRs 277
5 Discussion 282
SECTION 5 Protein structure prediction 288
CHAPTER 5.1 Structure prediction of globular proteins 289
1 The folding problem 289
2 The evolution of protein structures and its implications for protein structure prediction 292
3 Template based modelling 293
4 Template-free protein structure prediction 299
5 Automated structure prediction 306
6 Conclusions and future outlook 310
CHAPTER 5.2 The state of the art of membrane protein structure prediction: from sequence to 3D structure 314
1 Why membrane proteins? 314
2 Many functions 316
3 Bioinformatics and membrane proteins: is it feasible to predict the 3D structure of a membrane protein? 316
4 Predicting the topology of membrane proteins 317
5 How many methods to predict membrane protein topology? 319
6 Benchmarking the predictors of transmembrane topology 321
7 How many membrane proteins in the Human genome? 324
8 Membrane proteins and genetic diseases: PhD-SNP at work 325
9 Last but not least: 3D MODELLING of membrane proteins 327
10 What can currently be done in practice? 328
11 Can we improve? 329
SECTION 6 Protein–protein complexes, pathways and networks 332
CHAPTER 6.1 Computational analysis of metabolic networks 333
1 Introduction 333
2 Computational resources on metabolism 335
3 Basic notions of graph theory 339
4 Topological analysis of metabolic networks 340
5 Assessing reconstructed metabolic networks against physiological data 346
Conclusion 352
CHAPTER 6.2 Protein–protein interactions: analysis and prediction 356
1 Introduction 356
2 Experimental methods 357
3 Protein interaction databases 359
4 Data standards for molecular interactions 359
5 The IntAct molecular interaction database 363
6 Interaction networks 365
7 Visualization software for molecular networks 368
8 Estimates of the number of protein interactions 374
9 Multi-protein complexes 375
10 Network modules 376
11 Diseases and protein interaction networks 379
12 Sequence-based prediction of protein interactions 383
13 Integration of experimentally determined and predicted interactions 388
14 Domain–domain interactions 392
15 Biomolecular docking 398
SECTION 7 Infrastructure for distributed protein annotation 414
CHAPTER 7 Infrastructure for distributed protein annotation 415
1 Introduction 415
2 The Distributed Annotation System (DAS) 417
3 DAS infrastructure 417
4 The protein feature ontology 424
5 Conclusion 427
SECTION 8 Applications 429
CHAPTER 8.1 Viral bioinformatics 430
1 Introduction 430
2 Viral evolution in the human population 431
3 Interaction between the virus and the human immune system 435
4 Viral evolution in the human host 443
5 Perspectives 451
CHAPTER 8.2 Alternative splicing in the ENCODE protein complement 454
1 Introduction 454
2 Prediction of variant location 456
3 Prediction of variant function – analysis of the role of alternative splicing in changing function by modulation of functional residues 459
4 Prediction of variant structure 464
5 Summary of effects of alternative splicing 468
6 Prediction of principal isoforms 473
7 The ENCODE pipeline – an automated workflow for analysis of human splice isoforms 478
CONTRIBUTORS 486
CHAPTER 3 Annotation, genetics and transcriptomics (p. 123-124)
R. Mott
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
1 Introduction
This chapter discusses how to combine genome annotations of the type described elsewhere in this book with genetic and functional genomics data to find the genes associated with a phenotype, and in particular with a complex disease. This problem is of fundamental importance: the promise that understanding the molecular basis of common diseases would lead to effective treatments helped motivate and fund the human genome project.
Complex diseases such as cancer, diabetes, cardiovascular disease and depression are defined as conditions with multiple causes, both genetic (due to mutations in the genome) and environmental (everything else). By contrast, a Mendelian disease is caused by mutations in a single gene, with minimal environmental contribution. With a few exceptions such as cystic fibrosis in Caucasians and sickle-cell anaemia in parts of equatorial Africa, most Mendelian diseases are rare and do not impose a major health care burden on society. Most common diseases are complex, the exceptions being caused by infectious agents such as HIV and tuberculosis, and even in these cases there is a genetic contribution to resistance to infection.
In general, most complex diseases have a significant genetic component which we can estimate by examining the co-prevalence of a disease in genetically identical (monozygotic) twins compared to non-identical (dizygotic) twins, who only share 50% of their DNA by descent. Because the average effect due to shared environment should be the same in the two groups, any excess in co-prevalence is likely to be genetic. Thus it is possible to estimate the extent of the genetic contribution to a disease without identifying the causative genes and polymorphisms (Mather and Jinks 1982).
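As a rough illustration of the twin comparison described above, the sketch below uses Falconer's formula, a standard textbook approximation that is not named in this excerpt and is assumed here. It takes the trait correlation observed in monozygotic pairs and in dizygotic pairs and, assuming both kinds of twins experience equally similar environments, attributes the excess similarity of monozygotic pairs to genetics. The function names and the input values are hypothetical and chosen only to show the arithmetic; they are not results from the chapter.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Estimate heritability as h^2 = 2 * (r_MZ - r_DZ).

    r_mz and r_dz are trait correlations within monozygotic and
    dizygotic twin pairs. Assumes additive genetic effects and an
    equal shared environment in both groups (an assumption, not a
    statement from the source text).
    """
    for name, r in (("r_mz", r_mz), ("r_dz", r_dz)):
        if not -1.0 <= r <= 1.0:
            raise ValueError(f"{name} must be a correlation in [-1, 1]")
    return 2.0 * (r_mz - r_dz)


def shared_environment(r_mz: float, r_dz: float) -> float:
    """Estimate the shared-environment share as c^2 = 2 * r_DZ - r_MZ."""
    return 2.0 * r_dz - r_mz


if __name__ == "__main__":
    # Hypothetical correlations, for illustration only.
    r_mz, r_dz = 0.75, 0.5
    print("heritability:", falconer_heritability(r_mz, r_dz))
    print("shared environment:", shared_environment(r_mz, r_dz))
```

The factor of two reflects the fact that dizygotic twins share, on average, half of their DNA by descent, so the difference between the two correlations captures only half of the additive genetic contribution.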
The ultimate aim of gene annotation is to describe the function of every segment of the genome, including protein coding genes as well as micro-RNAs, transcription-factor binding sites and other cryptic functional elements. In addition we want to annotate the functional consequence of every polymorphism observed in a population. If we had a perfectly annotated genome then we could predict which genes are relevant to each disease, and there would be no need for further work. However, in fact we have only begun to scratch the surface of the annotation problem, and we will need to be able to integrate data from multiple sources in order to make progress.
Before going further it is important to clarify what is meant by the phrase “gene function”. This turns out to be a surprisingly difficult concept, depending on the context in which the question is being asked. Gene function may be defined at a number of levels. For example, for protein-coding genes, it is important to know in which tissues and at which developmental stages the protein is expressed, and in which splice variants or isoforms. Next, the interactants of the protein are important, as they define the pathways in which the protein functions. Finally we wish to understand the consequences of perturbations to the gene's DNA sequence, as these may give rise to genetic disease.
Publication date (per publisher) | 2 October 2009 |
---|---|
Additional info | XVIII, 490 p. 135 illus., 110 illus. in color. |
Place of publication | Vienna |
Language | English |
Subject areas | Mathematics / Computer Science ► Computer Science; Natural Sciences ► Biology; Natural Sciences ► Chemistry; Technology |
Keywords | Annotation • biochemistry • Bioinformatics • Biology • Biosapiens • Data Mining • Development • ENCODE • genes • Genetics • Genome • Molecular Biology • Protein • Protein complexes • Protein Structure • Protein Structure Prediction • Proteomics |
ISBN-10 | 3-211-75123-8 / 3211751238 |
ISBN-13 | 978-3-211-75123-7 / 9783211751237 |
Size: 8.6 MB
DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables and figures. A PDF can be displayed on almost all devices, but is of only limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.