Computational and Statistical Methods for Protein Quantification by Mass Spectrometry

Ingvar Eidhammer, Harald Barsnes, Geir Egil Eide, Lennart Martens (Authors)

Book | Hardcover

360 pages

2013
John Wiley & Sons Inc (Publisher)
978-1-119-96400-1 (ISBN)


The definitive introduction to data analysis in quantitative proteomics

This book provides all the knowledge of mass spectrometry-based proteomics methods, and of the computational and statistical approaches, needed to plan, design, and analyze quantitative proteomics experiments. The authors' carefully constructed approach allows readers to make the transition into the field of quantitative proteomics with ease. Through detailed descriptions of wet-lab methods, computational approaches, and statistical tools, this book covers the full scope of a quantitative experiment, allowing readers to acquire new knowledge while also serving as a useful reference work for more advanced readers.

Computational and Statistical Methods for Protein Quantification by Mass Spectrometry:

Introduces the use of mass spectrometry in protein quantification and explains how the bioinformatics challenges in this field can be solved using statistical methods and various software programs.
Is illustrated by a large number of figures and examples as well as numerous exercises.
Provides both clear and rigorous descriptions of methods and approaches.
Is thoroughly indexed and cross-referenced, combining the strengths of a textbook with the utility of a reference work.
Features detailed discussions of both wet-lab approaches and statistical and computational methods.

With clear and thorough descriptions of the various methods and approaches, this book is accessible to biologists, informaticians, and statisticians alike, and is aimed at readers across the academic spectrum, from advanced undergraduate students to postdoctoral researchers entering the field.

Ingvar Eidhammer, Department of Informatics, University of Bergen, Norway
Harald Barsnes, Department of Biomedicine, University of Bergen, Norway
Geir Egil Eide, Centre for Clinical Research, Haukeland University, Norway
Lennart Martens, Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Belgium

Preface xv

Terminology xvii

Acknowledgements xix

1 Introduction 1

1.1 The composition of an organism 1

1.1.1 A simple model of an organism 1

1.1.2 Composition of cells 3

1.2 Homeostasis, physiology, and pathology 4

1.3 Protein synthesis 4

1.4 Site, sample, state, and environment 4

1.5 Abundance and expression – protein and proteome profiles 5

1.5.1 The protein dynamic range 6

1.6 The importance of exact specification of sites and states 6

1.6.1 Biological features 7

1.6.2 Physiological and pathological features 7

1.6.3 Input features 7

1.6.4 External features 7

1.6.5 Activity features 7

1.6.6 The cell cycle 8

1.7 Relative and absolute quantification 8

1.7.1 Relative quantification 8

1.7.2 Absolute quantification 9

1.8 In vivo and in vitro experiments 9

1.9 Goals for quantitative protein experiments 10

1.10 Exercises 10

2 Correlations of mRNA and protein abundances 12

2.1 Investigating the correlation 12

2.2 Codon bias 14

2.3 Main results from experiments 15

2.4 The ideal case for mRNA-protein comparison 16

2.5 Exploring correlation across genes 17

2.6 Exploring correlation within one gene 18

2.7 Correlation across subsets 18

2.8 Comparing mRNA and protein abundances across genes from two situations 19

2.9 Exercises 20

2.10 Bibliographic notes 21

3 Protein level quantification 22

3.1 Two-dimensional gels 22

3.1.1 Comparing results from different experiments – DIGE 23

3.2 Protein arrays 23

3.2.1 Forward arrays 24

3.2.2 Reverse arrays 25

3.2.3 Detection of binding molecules 25

3.2.4 Analysis of protein array readouts 25

3.3 Western blotting 25

3.4 ELISA – Enzyme-Linked Immunosorbent Assay 26

3.5 Bibliographic notes 26

4 Mass spectrometry and protein identification 27

4.1 Mass spectrometry 27

4.1.1 Peptide mass fingerprinting (PMF) 28

4.1.2 MS/MS – tandem MS 29

4.1.3 Mass spectrometers 29

4.2 Isotope composition of peptides 32

4.2.1 Predicting the isotope intensity distribution 34

4.2.2 Estimating the charge 34

4.2.3 Revealing isotope patterns 34

4.3 Presenting the intensities – the spectra 36

4.4 Peak intensity calculation 38

4.5 Peptide identification by MS/MS spectra 38

4.5.1 Spectral comparison 41

4.5.2 Sequential comparison 41

4.5.3 Scoring 42

4.5.4 Statistical significance 42

4.6 The protein inference problem 42

4.6.1 Determining maximal explanatory sets 44

4.6.2 Determining minimal explanatory sets 44

4.7 False discovery rate for the identifications 44

4.7.1 Constructing the decoy database 45

4.7.2 Separate or composite search 46

4.8 Exercises 46

4.9 Bibliographic notes 47

5 Protein quantification by mass spectrometry 48

5.1 Situations, protein, and peptide variants 48

5.1.1 Situation 48

5.1.2 Protein variants – peptide variants 48

5.2 Replicates 49

5.3 Run – experiment – project 50

5.3.1 LC-MS/MS run 50

5.3.2 Quantification run 51

5.3.3 Quantification experiment 52

5.3.4 Quantification project 52

5.3.5 Planning quantification experiments 52

5.4 Comparing quantification approaches/methods 54

5.4.1 Accuracy 54

5.4.2 Precision 55

5.4.3 Repeatability and reproducibility 56

5.4.4 Dynamic range and linear dynamic range 56

5.4.5 Limit of blank – LOB 56

5.4.6 Limit of detection – LOD 57

5.4.7 Limit of quantification – LOQ 57

5.4.8 Sensitivity 57

5.4.9 Selectivity 57

5.5 Classification of approaches for quantification using LC-MS/MS 57

5.5.1 Discovery or targeted protein quantification 58

5.5.2 Label based vs. label free quantification 59

5.5.3 Abundance determination – ion current vs. peptide identification 60

5.5.4 Classification 60

5.6 The peptide (occurrence) space 60

5.7 Ion chromatograms 62

5.8 From peptides to protein abundances 62

5.8.1 Combined single abundance from single abundances 64

5.8.2 Relative abundance from single abundances 65

5.8.3 Combined relative abundance from relative abundances 66

5.9 Protein inference and protein abundance calculation 67

5.9.1 Use of the peptides in protein abundance calculation 67

5.9.2 Classifying the proteins 68

5.9.3 Can shared peptides be used for quantification? 68

5.10 Peptide tables 70

5.11 Assumptions for relative quantification 70

5.12 Analysis for differentially abundant proteins 71

5.13 Normalization of data 71

5.14 Exercises 72

5.15 Bibliographic notes 74

6 Statistical normalization 75

6.1 Some illustrative examples 75

6.2 Non-normally distributed populations 76

6.2.1 Skewed distributions 76

6.2.2 Measures of skewness 76

6.2.3 Steepness of the peak – kurtosis 77

6.3 Testing for normality 78

6.3.1 Normal probability plot 79

6.3.2 Some test statistics for normality testing 81

6.4 Outliers 82

6.4.1 Test statistics for the identification of a single outlier 83

6.4.2 Testing for more than one outlier 86

6.4.3 Robust statistics for mean and standard deviation 88

6.4.4 Outliers in regression 89

6.5 Variance inequality 90

6.6 Normalization and logarithmic transformation 90

6.6.1 The logarithmic function 90

6.6.2 Choosing the base 91

6.6.3 Logarithmic normalization of peptide/protein ratios 91

6.6.4 Pitfalls of logarithmic transformations 92

6.6.5 Variance stabilization by logarithmic transformation 92

6.6.6 Logarithmic scale for presentation 93

6.7 Exercises 94

6.8 Bibliographic notes 95

7 Experimental normalization 96

7.1 Sources of variation and level of normalization 96

7.2 Spectral normalization 98

7.2.1 Scale based normalization 99

7.2.2 Rank based normalization 101

7.2.3 Combining scale based and rank based normalization 101

7.2.4 Reproducibility of the normalization methods 102

7.3 Normalization at the peptide and protein level 103

7.4 Normalizing using sum, mean, and median 104

7.5 MA-plot for normalization 104

7.5.1 Global intensity normalization 105

7.5.2 Linear regression normalization 106

7.6 Local regression normalization – LOWESS 106

7.7 Quantile normalization 107

7.8 Overfitting 108

7.9 Exercises 109

7.10 Bibliographic notes 109

8 Statistical analysis 110

8.1 Use of replicates for statistical analysis 110

8.2 Using a set of proteins for statistical analysis 111

8.2.1 Z-variable 111

8.2.2 G-statistic 112

8.2.3 Fisher–Irwin exact test 115

8.3 Missing values 116

8.3.1 Reasons for missing values 116

8.3.2 Handling missing values 118

8.4 Prediction and hypothesis testing 118

8.4.1 Prediction errors 119

8.4.2 Hypothesis testing 120

8.5 Statistical significance for multiple testing 121

8.5.1 False positive rate control 122

8.5.2 False discovery rate control 123

8.6 Exercises 127

8.7 Bibliographic notes 128

9 Label based quantification 129

9.1 Labeling techniques for label based quantification 129

9.2 Label requirements 130

9.3 Labels and labeling properties 130

9.3.1 Quantification level 130

9.3.2 Label incorporation 131

9.3.3 Incorporation level 131

9.3.4 Number of compared samples 132

9.3.5 Common labels 132

9.4 Experimental requirements 132

9.5 Recognizing corresponding peptide variants 133

9.5.1 Recognizing peptide variants in MS spectra 133

9.5.2 Recognizing peptide variants in MS/MS spectra 134

9.6 Reference free vs. reference based 135

9.6.1 Reference free quantification 135

9.6.2 Reference based quantification 135

9.7 Labeling considerations 136

9.8 Exercises 136

9.9 Bibliographic notes 137

10 Reporter based MS/MS quantification 138

10.1 Isobaric labels 138

10.2 iTRAQ 140

10.2.1 Fragmentation 141

10.2.2 Reporter ion intensities 143

10.2.3 iTRAQ 8-plex 144

10.3 TMT – Tandem Mass Tag 145

10.4 Reporter based quantification runs 145

10.5 Identification and quantification 145

10.6 Peptide table 147

10.7 Reporter based quantification experiments 147

10.7.1 Normalization across LC-MS/MS runs – use of a reference sample 147

10.7.2 Normalizing within an LC-MS/MS run 149

10.7.3 From reporter intensities to protein abundances 149

10.7.4 Finding differentially abundant proteins 150

10.7.5 Distributing the replicates on the quantification runs 151

10.7.6 Protocols 152

10.8 Exercises 152

10.9 Bibliographic notes 153

11 Fragment based MS/MS quantification 155

11.1 The label masses 155

11.2 Identification 157

11.3 Peptide and protein quantification 158

11.4 Exercises 158

11.5 Bibliographic notes 159

12 Label based quantification by MS spectra 160

12.1 Different labeling techniques 160

12.1.1 Metabolic labeling – SILAC 160

12.1.2 Chemical labeling 162

12.1.3 Enzymatic labeling – 18O 165

12.2 Experimental setup 166

12.3 MaxQuant as a model 167

12.3.1 HL-pairs 167

12.3.2 Reliability of HL-pairs 169

12.3.3 Reliable protein results 169

12.4 The MaxQuant procedure 169

12.4.1 Recognize HL-pairs 169

12.4.2 Estimate HL-ratios 176

12.4.3 Identify HL-pairs by database search 177

12.4.4 Infer protein data 181

12.5 Exercises 183

12.6 Bibliographic notes 184

13 Label free quantification by MS spectra 185

13.1 An ideal case – two protein samples 185

13.2 The real world 186

13.2.1 Multiple samples 187

13.3 Experimental setup 187

13.4 Forms 187

13.5 The quantification process 188

13.6 Form detection 189

13.7 Pair-wise retention time correction 191

13.7.1 Determining potentially corresponding forms 191

13.7.2 Linear corrections 192

13.7.3 Nonlinear corrections 192

13.8 Approaches for form tuple detection 193

13.9 Pair-wise alignment 193

13.9.1 Distance between forms 194

13.9.2 Finding an optimal alignment 195

13.10 Using a reference run for alignment 196

13.11 Complete pair-wise alignment 197

13.12 Hierarchical progressive alignment 197

13.12.1 Measuring the similarity or the distance of two runs 198

13.12.2 Constructing static guide trees 198

13.12.3 Constructing dynamic guide trees 199

13.12.4 Aligning subalignments 199

13.12.5 SuperHirn 199

13.13 Simultaneous iterative alignment 200

13.13.1 Constructing the initial alignment in XCMS 200

13.13.2 Changing the initial alignment 201

13.14 The end result and further analysis 202

13.15 Exercises 202

13.16 Bibliographic notes 204

14 Label free quantification by MS/MS spectra 205

14.1 Abundance measurements 205

14.2 Normalization 207

14.3 Proposed methods 207

14.4 Methods for single abundance calculation 207

14.4.1 emPAI 208

14.4.2 PMSS 208

14.4.3 NSAF 209

14.4.4 SI 209

14.5 Methods for relative abundance calculation 210

14.5.1 PASC 210

14.5.2 RIBAR 210

14.5.3 xRIBAR 211

14.6 Comparing methods 212

14.6.1 An analysis by Griffin 212

14.6.2 An analysis by Colaert 213

14.7 Improving the reliability of spectral count quantification 213

14.8 Handling shared peptides 214

14.9 Statistical analysis 215

14.10 Exercises 215

14.11 Bibliographic notes 216

15 Targeted quantification – Selected Reaction Monitoring 218

15.1 Selected Reaction Monitoring – the concept 218

15.2 A suitable instrument 219

15.3 The LC-MS/MS run 220

15.3.1 Sensitivity and accuracy 222

15.4 Label free and label based quantification 224

15.4.1 Label free SRM based quantification 224

15.4.2 Label based SRM based quantification 225

15.5 Requirements for SRM transitions 227

15.5.1 Requirements for the peptides 227

15.5.2 Requirements for the fragment ions 228

15.6 Finding optimal transitions 229

15.7 Validating transitions 230

15.7.1 Testing linearity 230

15.7.2 Determining retention time 231

15.7.3 Limit of detection/quantification 231

15.7.4 Dealing with low abundant proteins 231

15.7.5 Checking for interference 232

15.8 Assay development 232

15.9 Exercises 233

15.10 Bibliographic notes 234

16 Absolute quantification 235

16.1 Performing absolute quantification 235

16.1.1 Linear dependency between the calculated and the real abundances 236

16.2 Label based absolute quantification 236

16.2.1 Stable isotope-labeled peptide standards 237

16.2.2 Stable isotope-labeled concatenated peptide standards 238

16.2.3 Stable isotope-labeled intact protein standards 239

16.3 Label free absolute quantification 239

16.3.1 Quantification by MS spectra 239

16.3.2 Quantification by the number of MS/MS spectra 241

16.4 Exercises 242

16.5 Bibliographic notes 242

17 Quantification of post-translational modifications 244

17.1 PTM and mass spectrometry 244

17.2 Modification degree 245

17.3 Absolute modification degree 246

17.3.1 Reversing the modification 246

17.3.2 Use of two standards 248

17.3.3 Label free modification degree analysis 249

17.4 Relative modification degree 250

17.5 Discovery based modification stoichiometry 251

17.5.1 Separate LC-MS/MS experiments for modified and unmodified peptides 251

17.5.2 Common LC-MS/MS experiment for modified and unmodified peptides 252

17.5.3 Reliable results and significant differences 252

17.6 Exercises 253

17.7 Bibliographic notes 253

18 Biomarkers 254

18.1 Evaluation of potential biomarkers 254

18.1.1 Taking disease prevalence into account 255

18.2 Evaluating threshold values for biomarkers 257

18.3 Exercises 258

18.4 Bibliographic notes 258

19 Standards and databases 259

19.1 Standard data formats for (quantitative) proteomics 259

19.1.1 Controlled vocabularies (CVs) 260

19.1.2 Benefits of using CV terms to annotate metadata 260

19.1.3 A standard for quantitative proteomics data 261

19.1.4 HUPO PSI 262

19.2 Databases for proteomics data 262

19.3 Bibliographic notes 263

20 Appendix A: Statistics 264

20.1 Samples, populations, and statistics 264

20.2 Population parameter estimation 265

20.2.1 Estimating the mean of a population 266

20.3 Hypothesis testing 267

20.3.1 Two types of errors 268

20.4 Performing the test – test statistics and p-values 268

20.4.1 Parametric test statistics 269

20.4.2 Nonparametric test statistics 269

20.4.3 Confidence intervals and hypothesis testing 270

20.5 Comparing means of populations 271

20.5.1 Analyzing the mean of a single population 271

20.5.2 Comparing the means from two populations 272

20.5.3 Comparing means of paired populations 275

20.5.4 Multiple populations 275

20.5.5 Multiple testing 276

20.6 Comparing variances 276

20.6.1 Testing the variance of a single population 276

20.6.2 Testing the variances of two populations 277

20.7 Percentiles and quantiles 278

20.7.1 A straightforward method for estimating the percentiles 279

20.7.2 Quantiles 279

20.7.3 Box plots 280

20.8 Correlation 280

20.8.1 Pearson’s product-moment correlation coefficient 283

20.8.2 Spearman’s rank correlation coefficient 285

20.8.3 Correlation line 286

20.9 Regression analysis 287

20.9.1 Regression line 288

20.9.2 Relation between Pearson’s correlation coefficient and the regression parameters 289

20.10 Types of values and variables 290

21 Appendix B: Clustering and discriminant analysis 292

21.1 Clustering 292

21.1.1 Distances and similarities 293

21.1.2 Distance measures 294

21.1.3 Similarity measures 295

21.1.4 Distances between an object and a class 295

21.1.5 Distances between two classes 296

21.1.6 Missing data 297

21.1.7 Clustering approaches 297

21.1.8 Sequential clustering 298

21.1.9 Hierarchical clustering 300

21.2 Discriminant analysis 303

21.2.1 Step-wise feature selection 304

21.2.2 Linear discriminant analysis using original features 307

21.2.3 Canonical discriminant analysis 309

21.3 Bibliographic notes 312

Bibliography 313

Index 327

Place of publication	New York
Language	English
Dimensions	160 x 239 mm
Weight	576 g
Subject areas	Mathematics / Computer Science ► Mathematics ► Statistics
	Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics
	Natural Sciences ► Biology ► Biochemistry
	Natural Sciences ► Chemistry ► Analytical Chemistry
ISBN-10	1-119-96400-8 / 1119964008
ISBN-13	978-1-119-96400-1 / 9781119964001
Condition	New