Computational and Statistical Methods for Protein Quantification by Mass Spectrometry
John Wiley & Sons Inc (Publisher)
978-1-119-96400-1 (ISBN)
The definitive introduction to data analysis in quantitative proteomics
This book provides the necessary background in mass spectrometry-based proteomics methods and in computational and statistical approaches for planning, designing and analysing quantitative proteomics experiments. The authors' carefully constructed approach eases the transition into the field of quantitative proteomics. Through detailed descriptions of wet-lab methods, computational approaches and statistical tools, the book covers the full scope of a quantitative experiment, letting newcomers acquire new knowledge while serving as a useful reference work for more advanced readers.
Computational and Statistical Methods for Protein Quantification by Mass Spectrometry:
Introduces the use of mass spectrometry in protein quantification and shows how the bioinformatics challenges in this field can be solved using statistical methods and various software programs.
Is illustrated by a large number of figures and examples as well as numerous exercises.
Provides both clear and rigorous descriptions of methods and approaches.
Is thoroughly indexed and cross-referenced, combining the strengths of a textbook with the utility of a reference work.
Features detailed discussions of both wet-lab approaches and statistical and computational methods.
With clear and thorough descriptions of the various methods and approaches, this book is accessible to biologists, informaticians, and statisticians alike and is aimed at readers across the academic spectrum, from advanced undergraduate students to postdoctoral researchers entering the field.
Ingvar Eidhammer, Department of Informatics, University of Bergen, Norway
Harald Barsnes, Department of Biomedicine, University of Bergen, Norway
Geir Egil Eide, Centre for Clinical Research, Haukeland University, Norway
Lennart Martens, Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Belgium
Preface
Terminology
Acknowledgements
1 Introduction
1.1 The composition of an organism
1.1.1 A simple model of an organism
1.1.2 Composition of cells
1.2 Homeostasis, physiology, and pathology
1.3 Protein synthesis
1.4 Site, sample, state, and environment
1.5 Abundance and expression – protein and proteome profiles
1.5.1 The protein dynamic range
1.6 The importance of exact specification of sites and states
1.6.1 Biological features
1.6.2 Physiological and pathological features
1.6.3 Input features
1.6.4 External features
1.6.5 Activity features
1.6.6 The cell cycle
1.7 Relative and absolute quantification
1.7.1 Relative quantification
1.7.2 Absolute quantification
1.8 In vivo and in vitro experiments
1.9 Goals for quantitative protein experiments
1.10 Exercises
2 Correlations of mRNA and protein abundances
2.1 Investigating the correlation
2.2 Codon bias
2.3 Main results from experiments
2.4 The ideal case for mRNA-protein comparison
2.5 Exploring correlation across genes
2.6 Exploring correlation within one gene
2.7 Correlation across subsets
2.8 Comparing mRNA and protein abundances across genes from two situations
2.9 Exercises
2.10 Bibliographic notes
3 Protein level quantification
3.1 Two-dimensional gels
3.1.1 Comparing results from different experiments – DIGE
3.2 Protein arrays
3.2.1 Forward arrays
3.2.2 Reverse arrays
3.2.3 Detection of binding molecules
3.2.4 Analysis of protein array readouts
3.3 Western blotting
3.4 ELISA – Enzyme-Linked Immunosorbent Assay
3.5 Bibliographic notes
4 Mass spectrometry and protein identification
4.1 Mass spectrometry
4.1.1 Peptide mass fingerprinting (PMF)
4.1.2 MS/MS – tandem MS
4.1.3 Mass spectrometers
4.2 Isotope composition of peptides
4.2.1 Predicting the isotope intensity distribution
4.2.2 Estimating the charge
4.2.3 Revealing isotope patterns
4.3 Presenting the intensities – the spectra
4.4 Peak intensity calculation
4.5 Peptide identification by MS/MS spectra
4.5.1 Spectral comparison
4.5.2 Sequential comparison
4.5.3 Scoring
4.5.4 Statistical significance
4.6 The protein inference problem
4.6.1 Determining maximal explanatory sets
4.6.2 Determining minimal explanatory sets
4.7 False discovery rate for the identifications
4.7.1 Constructing the decoy database
4.7.2 Separate or composite search
4.8 Exercises
4.9 Bibliographic notes
5 Protein quantification by mass spectrometry
5.1 Situations, protein, and peptide variants
5.1.1 Situation
5.1.2 Protein variants – peptide variants
5.2 Replicates
5.3 Run – experiment – project
5.3.1 LC-MS/MS run
5.3.2 Quantification run
5.3.3 Quantification experiment
5.3.4 Quantification project
5.3.5 Planning quantification experiments
5.4 Comparing quantification approaches/methods
5.4.1 Accuracy
5.4.2 Precision
5.4.3 Repeatability and reproducibility
5.4.4 Dynamic range and linear dynamic range
5.4.5 Limit of blank – LOB
5.4.6 Limit of detection – LOD
5.4.7 Limit of quantification – LOQ
5.4.8 Sensitivity
5.4.9 Selectivity
5.5 Classification of approaches for quantification using LC-MS/MS
5.5.1 Discovery or targeted protein quantification
5.5.2 Label based vs. label free quantification
5.5.3 Abundance determination – ion current vs. peptide identification
5.5.4 Classification
5.6 The peptide (occurrence) space
5.7 Ion chromatograms
5.8 From peptides to protein abundances
5.8.1 Combined single abundance from single abundances
5.8.2 Relative abundance from single abundances
5.8.3 Combined relative abundance from relative abundances
5.9 Protein inference and protein abundance calculation
5.9.1 Use of the peptides in protein abundance calculation
5.9.2 Classifying the proteins
5.9.3 Can shared peptides be used for quantification?
5.10 Peptide tables
5.11 Assumptions for relative quantification
5.12 Analysis for differentially abundant proteins
5.13 Normalization of data
5.14 Exercises
5.15 Bibliographic notes
6 Statistical normalization
6.1 Some illustrative examples
6.2 Non-normally distributed populations
6.2.1 Skewed distributions
6.2.2 Measures of skewness
6.2.3 Steepness of the peak – kurtosis
6.3 Testing for normality
6.3.1 Normal probability plot
6.3.2 Some test statistics for normality testing
6.4 Outliers
6.4.1 Test statistics for the identification of a single outlier
6.4.2 Testing for more than one outlier
6.4.3 Robust statistics for mean and standard deviation
6.4.4 Outliers in regression
6.5 Variance inequality
6.6 Normalization and logarithmic transformation
6.6.1 The logarithmic function
6.6.2 Choosing the base
6.6.3 Logarithmic normalization of peptide/protein ratios
6.6.4 Pitfalls of logarithmic transformations
6.6.5 Variance stabilization by logarithmic transformation
6.6.6 Logarithmic scale for presentation
6.7 Exercises
6.8 Bibliographic notes
7 Experimental normalization
7.1 Sources of variation and level of normalization
7.2 Spectral normalization
7.2.1 Scale based normalization
7.2.2 Rank based normalization
7.2.3 Combining scale based and rank based normalization
7.2.4 Reproducibility of the normalization methods
7.3 Normalization at the peptide and protein level
7.4 Normalizing using sum, mean, and median
7.5 MA-plot for normalization
7.5.1 Global intensity normalization
7.5.2 Linear regression normalization
7.6 Local regression normalization – LOWESS
7.7 Quantile normalization
7.8 Overfitting
7.9 Exercises
7.10 Bibliographic notes
8 Statistical analysis
8.1 Use of replicates for statistical analysis
8.2 Using a set of proteins for statistical analysis
8.2.1 Z-variable
8.2.2 G-statistic
8.2.3 Fisher–Irwin exact test
8.3 Missing values
8.3.1 Reasons for missing values
8.3.2 Handling missing values
8.4 Prediction and hypothesis testing
8.4.1 Prediction errors
8.4.2 Hypothesis testing
8.5 Statistical significance for multiple testing
8.5.1 False positive rate control
8.5.2 False discovery rate control
8.6 Exercises
8.7 Bibliographic notes
9 Label based quantification
9.1 Labeling techniques for label based quantification
9.2 Label requirements
9.3 Labels and labeling properties
9.3.1 Quantification level
9.3.2 Label incorporation
9.3.3 Incorporation level
9.3.4 Number of compared samples
9.3.5 Common labels
9.4 Experimental requirements
9.5 Recognizing corresponding peptide variants
9.5.1 Recognizing peptide variants in MS spectra
9.5.2 Recognizing peptide variants in MS/MS spectra
9.6 Reference free vs. reference based
9.6.1 Reference free quantification
9.6.2 Reference based quantification
9.7 Labeling considerations
9.8 Exercises
9.9 Bibliographic notes
10 Reporter based MS/MS quantification
10.1 Isobaric labels
10.2 iTRAQ
10.2.1 Fragmentation
10.2.2 Reporter ion intensities
10.2.3 iTRAQ 8-plex
10.3 TMT – Tandem Mass Tag
10.4 Reporter based quantification runs
10.5 Identification and quantification
10.6 Peptide table
10.7 Reporter based quantification experiments
10.7.1 Normalization across LC-MS/MS runs – use of a reference sample
10.7.2 Normalizing within an LC-MS/MS run
10.7.3 From reporter intensities to protein abundances
10.7.4 Finding differentially abundant proteins
10.7.5 Distributing the replicates on the quantification runs
10.7.6 Protocols
10.8 Exercises
10.9 Bibliographic notes
11 Fragment based MS/MS quantification
11.1 The label masses
11.2 Identification
11.3 Peptide and protein quantification
11.4 Exercises
11.5 Bibliographic notes
12 Label based quantification by MS spectra
12.1 Different labeling techniques
12.1.1 Metabolic labeling – SILAC
12.1.2 Chemical labeling
12.1.3 Enzymatic labeling – 18O
12.2 Experimental setup
12.3 MaxQuant as a model
12.3.1 HL-pairs
12.3.2 Reliability of HL-pairs
12.3.3 Reliable protein results
12.4 The MaxQuant procedure
12.4.1 Recognize HL-pairs
12.4.2 Estimate HL-ratios
12.4.3 Identify HL-pairs by database search
12.4.4 Infer protein data
12.5 Exercises
12.6 Bibliographic notes
13 Label free quantification by MS spectra
13.1 An ideal case – two protein samples
13.2 The real world
13.2.1 Multiple samples
13.3 Experimental setup
13.4 Forms
13.5 The quantification process
13.6 Form detection
13.7 Pair-wise retention time correction
13.7.1 Determining potentially corresponding forms
13.7.2 Linear corrections
13.7.3 Nonlinear corrections
13.8 Approaches for form tuple detection
13.9 Pair-wise alignment
13.9.1 Distance between forms
13.9.2 Finding an optimal alignment
13.10 Using a reference run for alignment
13.11 Complete pair-wise alignment
13.12 Hierarchical progressive alignment
13.12.1 Measuring the similarity or the distance of two runs
13.12.2 Constructing static guide trees
13.12.3 Constructing dynamic guide trees
13.12.4 Aligning subalignments
13.12.5 SuperHirn
13.13 Simultaneous iterative alignment
13.13.1 Constructing the initial alignment in XCMS
13.13.2 Changing the initial alignment
13.14 The end result and further analysis
13.15 Exercises
13.16 Bibliographic notes
14 Label free quantification by MS/MS spectra
14.1 Abundance measurements
14.2 Normalization
14.3 Proposed methods
14.4 Methods for single abundance calculation
14.4.1 emPAI
14.4.2 PMSS
14.4.3 NSAF
14.4.4 SI
14.5 Methods for relative abundance calculation
14.5.1 PASC
14.5.2 RIBAR
14.5.3 xRIBAR
14.6 Comparing methods
14.6.1 An analysis by Griffin
14.6.2 An analysis by Colaert
14.7 Improving the reliability of spectral count quantification
14.8 Handling shared peptides
14.9 Statistical analysis
14.10 Exercises
14.11 Bibliographic notes
15 Targeted quantification – Selected Reaction Monitoring
15.1 Selected Reaction Monitoring – the concept
15.2 A suitable instrument
15.3 The LC-MS/MS run
15.3.1 Sensitivity and accuracy
15.4 Label free and label based quantification
15.4.1 Label free SRM based quantification
15.4.2 Label based SRM based quantification
15.5 Requirements for SRM transitions
15.5.1 Requirements for the peptides
15.5.2 Requirements for the fragment ions
15.6 Finding optimal transitions
15.7 Validating transitions
15.7.1 Testing linearity
15.7.2 Determining retention time
15.7.3 Limit of detection/quantification
15.7.4 Dealing with low abundant proteins
15.7.5 Checking for interference
15.8 Assay development
15.9 Exercises
15.10 Bibliographic notes
16 Absolute quantification
16.1 Performing absolute quantification
16.1.1 Linear dependency between the calculated and the real abundances
16.2 Label based absolute quantification
16.2.1 Stable isotope-labeled peptide standards
16.2.2 Stable isotope-labeled concatenated peptide standards
16.2.3 Stable isotope-labeled intact protein standards
16.3 Label free absolute quantification
16.3.1 Quantification by MS spectra
16.3.2 Quantification by the number of MS/MS spectra
16.4 Exercises
16.5 Bibliographic notes
17 Quantification of post-translational modifications
17.1 PTM and mass spectrometry
17.2 Modification degree
17.3 Absolute modification degree
17.3.1 Reversing the modification
17.3.2 Use of two standards
17.3.3 Label free modification degree analysis
17.4 Relative modification degree
17.5 Discovery based modification stoichiometry
17.5.1 Separate LC-MS/MS experiments for modified and unmodified peptides
17.5.2 Common LC-MS/MS experiment for modified and unmodified peptides
17.5.3 Reliable results and significant differences
17.6 Exercises
17.7 Bibliographic notes
18 Biomarkers
18.1 Evaluation of potential biomarkers
18.1.1 Taking disease prevalence into account
18.2 Evaluating threshold values for biomarkers
18.3 Exercises
18.4 Bibliographic notes
19 Standards and databases
19.1 Standard data formats for (quantitative) proteomics
19.1.1 Controlled vocabularies (CVs)
19.1.2 Benefits of using CV terms to annotate metadata
19.1.3 A standard for quantitative proteomics data
19.1.4 HUPO PSI
19.2 Databases for proteomics data
19.3 Bibliographic notes
20 Appendix A: Statistics
20.1 Samples, populations, and statistics
20.2 Population parameter estimation
20.2.1 Estimating the mean of a population
20.3 Hypothesis testing
20.3.1 Two types of errors
20.4 Performing the test – test statistics and p-values
20.4.1 Parametric test statistics
20.4.2 Nonparametric test statistics
20.4.3 Confidence intervals and hypothesis testing
20.5 Comparing means of populations
20.5.1 Analyzing the mean of a single population
20.5.2 Comparing the means from two populations
20.5.3 Comparing means of paired populations
20.5.4 Multiple populations
20.5.5 Multiple testing
20.6 Comparing variances
20.6.1 Testing the variance of a single population
20.6.2 Testing the variances of two populations
20.7 Percentiles and quantiles
20.7.1 A straightforward method for estimating the percentiles
20.7.2 Quantiles
20.7.3 Box plots
20.8 Correlation
20.8.1 Pearson’s product-moment correlation coefficient
20.8.2 Spearman’s rank correlation coefficient
20.8.3 Correlation line
20.9 Regression analysis
20.9.1 Regression line
20.9.2 Relation between Pearson’s correlation coefficient and the regression parameters
20.10 Types of values and variables
21 Appendix B: Clustering and discriminant analysis
21.1 Clustering
21.1.1 Distances and similarities
21.1.2 Distance measures
21.1.3 Similarity measures
21.1.4 Distances between an object and a class
21.1.5 Distances between two classes
21.1.6 Missing data
21.1.7 Clustering approaches
21.1.8 Sequential clustering
21.1.9 Hierarchical clustering
21.2 Discriminant analysis
21.2.1 Step-wise feature selection
21.2.2 Linear discriminant analysis using original features
21.2.3 Canonical discriminant analysis
21.3 Bibliographic notes
Bibliography
Index
Place of publication | New York |
---|---|
Language | English |
Dimensions | 160 x 239 mm |
Weight | 576 g |
Subject area | Mathematics / Computer Science ► Mathematics ► Statistics |
| Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
| Natural Sciences ► Biology ► Biochemistry |
| Natural Sciences ► Chemistry ► Analytical Chemistry |
ISBN-10 | 1-119-96400-8 / 1119964008 |
ISBN-13 | 978-1-119-96400-1 / 9781119964001 |
Condition | New |