New Frontiers of Biostatistics and Bioinformatics (eBook)
XXIV, 463 pages
Springer International Publishing (Publisher)
978-3-319-99389-8 (ISBN)
This book comprises presentations delivered at the 5th Workshop on Biostatistics and Bioinformatics, held in Atlanta on May 5-7, 2017. Featuring twenty-two selected papers from the workshop, it showcases the most current advances in the field, presenting new methods, theories, and case applications at the frontiers of biostatistics, bioinformatics, and interdisciplinary areas.
Biostatistics and bioinformatics have played a key role in statistics and other scientific research fields in recent years. The goal of the 5th Workshop on Biostatistics and Bioinformatics was to stimulate research, foster interaction among researchers in the field, and offer opportunities for learning and for facilitating research collaborations in the era of big data. The resulting volume offers timely insights for researchers, students, and industry practitioners.
Yichuan Zhao is a Professor of Statistics at Georgia State University in Atlanta. He has a joint appointment as Associate Member of the Neuroscience Institute and is also an affiliated faculty member of the School of Public Health at Georgia State University. His current research interests focus on Survival Analysis, Empirical Likelihood Method, Nonparametric Statistics, Analysis of ROC Curves, Bioinformatics, Monte Carlo Methods, and Statistical Modeling of Fuzzy Systems. He has published over 80 research articles in statistics, has co-edited two books on statistics, biostatistics & data science, and has been invited to deliver more than 170 research talks nationally and internationally. Dr. Zhao has organized the Workshop Series on Biostatistics and Bioinformatics since its initiation in 2012. He also successfully organized the 25th ICSA Applied Statistics Symposium in Atlanta as chair of the organizing committee. He is currently serving as editor, or on the editorial board, for several statistical journals. Dr. Zhao is an elected member of the International Statistical Institute.
Ding-Geng (Din) Chen is a Fellow of the American Statistical Association and is currently the Wallace Kuralt distinguished professor at the University of North Carolina at Chapel Hill. He was a professor at the University of Rochester and the Karl E. Peace endowed eminent scholar chair in biostatistics at Georgia Southern University. He is also a senior statistics consultant for biopharmaceuticals and government agencies with extensive expertise in Monte-Carlo simulations, clinical trial biostatistics and public health statistics. Professor Chen has more than 100 refereed professional publications, has co-authored and co-edited six books on clinical trial methodology, meta-analysis and public health applications, and has been invited nationally and internationally to give speeches on his research. Professor Chen was honored with the 'Award of Recognition' in 2014 by the Deming Conference Committee for highly successful advanced biostatistics workshop tutorials with his books.
Preface 6
Part I: Review and Theoretical Framework in Biostatistics (Chaps. 1–4) 7
Part II: Wavelet-Based Approach for Complex Data (Chaps. 5–8) 7
Part III: Clinical Trials and Statistical Modeling (Chaps. 9–14) 8
Part IV: High-Dimensional Gene Expression Data Analysis (Chaps. 15–18) 9
Part V: Survival Analysis (Chaps. 19–22) 10
Contents 12
List of Contributors 15
List of Chapter Reviewers 19
About the Editors 22
Part I Review of Theoretical Framework in Biostatistics 24
1 Optimal Weighted Wilcoxon–Mann–Whitney Test for Prioritized Outcomes 25
1.1 Introduction 26
1.2 Wilcoxon–Mann–Whitney Test for Prioritized Endpoints 31
1.2.1 Notation 31
1.2.2 Wilcoxon–Mann–Whitney Test 32
1.2.3 Weighted Wilcoxon–Mann–Whitney Test 34
1.2.3.1 Prespecified Weights 35
1.2.3.2 Optimal Weights 36
1.3 Simulation Studies 39
1.4 Application to a Stroke Clinical Trial 40
1.5 Discussion 43
Appendix 43
Appendix 1: Proof of Theorem 1.1 48
Appendix 2: Mean and Variance of the Weighted U-Statistic 52
Appendix 3: Optimal Weights 54
Appendix 4: Conditional Probabilities 56
Exponential Distribution 56
Normal Distribution 57
References 57
2 A Selective Overview of Semiparametric Mixture of Regression Models 63
2.1 Introduction 63
2.2 Mixture of Regression Models with Varying Proportions 65
2.2.1 Continuous Response, p=1 65
2.2.2 Continuous Response, p> 1
2.2.3 Discrete Response 68
2.3 Nonparametric Errors 70
2.3.1 Semiparametric EM Algorithm with Kernel Density Error 70
2.3.2 Log-Concave Density Error 71
2.3.3 Mixtures of Quantile Regressions 73
2.4 Semiparametric Mixture of Nonparametric Regressions 74
2.4.1 Nonparametric Mixture of Regressions 74
2.4.2 Nonparametric Component Regression Functions 75
2.4.3 Mixture of Regressions with Single-Index 77
2.5 Semiparametric Regression Models for Longitudinal/Functional Data 80
2.5.1 Mixture of Time-Varying Effects for Intensive Longitudinal Data 80
2.5.2 Mixtures of Gaussian Processes 81
2.5.3 Mixture of Functional Linear Models 82
2.6 Some Additional Topics 85
2.7 Discussion 85
References 86
3 Rank-Based Empirical Likelihood for Regression Models with Responses Missing at Random 88
3.1 Introduction 88
3.2 Imputation 90
3.2.1 Imputation Under MAR 91
3.2.2 Empirical Likelihood Method 93
3.3 Simulation Study 96
3.3.1 Simulation Settings 96
3.3.2 Real Data 100
3.4 Conclusion 102
Appendix 103
Assumptions 103
References 106
4 Bayesian Nonparametric Spatially Smoothed Density Estimation 108
4.1 Introduction 108
4.2 The Predictive Model 110
4.2.1 Markov Chain Monte Carlo 114
4.2.2 Censored Data 115
4.2.3 Direct Estimation and a Permutation Test p-Value 115
4.3 Examples 117
4.3.1 IgG Distribution Evolving with Age 117
4.3.2 Time to Infection in Amphibian Populations 119
4.3.3 Simulated Data 122
4.4 Conclusion 124
References 125
Part II Wavelet-Based Approach for Complex Data 127
5 Mammogram Diagnostics Using Robust Wavelet-Based Estimator of Hurst Exponent 128
5.1 Introduction 128
5.2 Background 131
5.2.1 Non-decimated Wavelet Transforms 131
5.2.2 The fBm: Wavelet Coefficients and Spectra 132
5.3 General Trimean Estimators 133
5.3.1 Tukey's Trimean Estimator 135
5.3.2 Gastwirth Estimator 135
5.4 Methods 136
5.4.1 General Trimean of the Mid-energy (GTME) Method 137
5.4.2 General Trimean of the Logarithm of Mid-energy (GTLME) Method 139
5.4.3 Special Cases: Tukey's Trimean and Gastwirth Estimators 141
5.5 Simulation 143
5.6 Application 145
5.7 Conclusions 148
Appendix 149
Proof of Theorem 5.1 149
Proof of Theorem 5.2 151
Proof of Lemma 5.2 153
Proof of Lemma 5.3 155
References 157
6 Wavelet-Based Profile Monitoring Using Order-Thresholding Recursive CUSUM Schemes 160
6.1 Introduction 160
6.2 Problem Formulation and Wavelet Background 164
6.3 Our Proposed Method 165
6.3.1 In-Control Estimation 166
6.3.2 Out-of-Control Estimation and Local Statistics 167
6.3.3 Global Online Monitoring Procedure 170
6.3.4 Parameter Settings 171
6.4 Case Study 173
6.5 Simulation Study 174
6.6 Conclusions 176
References 177
7 Estimating the Confidence Interval of Evolutionary Stochastic Process Mean from Wavelet Based Bootstrapping 179
7.1 Introduction 179
7.2 Resampling Time Series 180
7.2.1 Bootstrap Based on Wavelets 182
7.3 Proposed Methods 184
7.4 Results 185
7.5 Final Considerations 191
References 192
8 A New Wavelet-Based Approach for Mass Spectrometry Data Classification 193
8.1 Introduction 193
8.2 The Proposed Approach 196
8.2.1 Wavelets Analysis 197
8.2.2 Principal Component Analysis and Hotelling T² Statistic 200
8.2.3 Support Vector Machines 201
8.3 Experiments and Results 202
8.3.1 Results and Performance 203
8.4 Conclusion 206
References 206
Part III Clinical Trials and Statistical Modeling 208
9 Statistical Power and Bayesian Assurance in Clinical Trial Design 209
9.1 Introduction 209
9.2 A Paradigm Change from Statistical Power to Bayesian Assurance 210
9.2.1 Conventional Statistical Power and Its Limitations 210
9.2.2 Bayesian Assurance in Clinical Trials 212
9.3 Computational Implementation on Bayesian Assurance 213
9.4 Discussion 216
References 216
10 Equivalence Tests in Subgroup Analyses 217
10.1 Introduction 217
10.1.1 Why Are Subgroup Analyses Important Within the Framework of Evidence-Based Medicine? 217
10.1.2 Approaches to Perform and Interpret Subgroup Analyses 219
10.1.3 Objectives and Organisation of This Chapter 220
10.2 The Concept of Testing Equivalence of Subgroup Outcomes 221
10.2.1 Generalised Linear Model 221
10.2.2 Equivalence Tests 223
10.2.3 Outline of the Simulations 226
10.3 An Equivalence Test for Consistency of Subgroup Effects with a Quantitative Endpoint 227
10.3.1 Definition of the Test 227
10.3.2 Performance of the Consistency Test with Respect to the Equivalence Margins 228
10.3.3 Implications of the Variance Scaling 233
10.3.4 Example for Planning the Study Design 235
10.4 An Equivalence Test for the Consistency of Subgroup Effects with a Binary Endpoint 237
10.4.1 Definition of the Test 237
10.4.2 Quantification of the Interaction 239
10.4.3 Simulation Setup 240
10.4.4 Power of the Consistency Test for Binary Endpoints 241
10.4.5 Discussion 244
10.4.6 Example for Planning the Study Design 247
10.5 Discussion 247
10.5.1 Subgroup-by-Treatment Interaction in the General Linear Model 247
10.5.2 Selection of the Equivalence Margin 248
10.5.3 Considerations for Improvement of the Consistency Test 249
10.5.4 Future Developments 251
10.5.5 Conclusion 252
References 252
11 Predicting Confidence Interval for the Proportion at the Time of Study Planning in Small Clinical Trials 255
11.1 Introduction 255
11.2 Predictions of Future Exact Confidence Intervals 257
11.2.1 Approach 1: Simple Plugging In 257
11.2.2 Approach 2: Hypothesis Testing Approach 258
11.2.2.1 Approach 2-1: Discrete Hypothesis Testing Approach 259
11.2.2.2 Approach 2-2: Continuous Hypothesis Testing Approach 259
11.2.3 Approach 3: Expected Confidence Interval 260
11.3 Sample Size Calculation Based on the Future Exact Confidence Interval Prediction 260
11.4 Prediction of the Arcsine Confidence Interval 263
11.5 Applications 265
11.6 Discussion 266
A.1 Appendix 267
References 270
12 Importance of Adjusting for Multi-stage Design When Analyzing Data from Complex Surveys 272
12.1 Introduction 272
12.1.1 Use of National Surveys in Behavioral Research 272
12.1.2 Variance Estimation Using BRR 273
12.1.3 Application of BRR for the TUS-CPS Data Analysis 274
12.1.4 Three Analytical Methods 275
12.2 Examples 277
12.3 Discussion 280
References 281
13 Analysis of the High School Longitudinal Study to Evaluate Associations Among Mathematics Achievement, Mentorship and Student Participation in STEM Programs 284
13.1 Introduction 284
13.2 Methods 286
13.2.1 Data and Sample 286
13.2.2 Study Variables 287
13.2.3 Statistical Methods 289
13.3 Results 289
13.3.1 Study Participants 289
13.3.2 Assessment of Student Mathematics Achievement 291
13.3.3 Assessment of Student Enrollment in a STEM Major/Career 296
13.4 Conclusions 299
Appendix 303
References 304
14 Statistical Modeling for the Heart Disease Diagnosis via Multiple Imputation 306
14.1 Introduction 306
14.2 Data Analysis 307
14.2.1 Descriptive Analysis 307
14.2.2 Multiple Imputation 309
14.2.3 Model Building 312
14.3 Discussion 316
14.4 Conclusion 316
References 317
Part IV High-Dimensional Gene Expression Data Analysis 318
15 Learning Gene Regulatory Networks with High-Dimensional Heterogeneous Data 319
15.1 Introduction 319
15.2 Mixture Gaussian Graphical Models 321
15.2.1 Algorithms for Homogeneous Data 321
15.2.2 The Mixture Gaussian Graphical Model Method 324
15.3 Simulation Studies 327
15.3.1 Example 1 328
15.3.2 Example 2 333
15.3.3 Identification of Cluster Numbers 333
15.4 A Real Data Example 337
15.5 Discussion 340
References 340
16 Performance Evaluation of Normalization Approaches for Metagenomic Compositional Data on Differential Abundance Analysis 342
16.1 Introduction 342
16.2 Motivating Example 344
16.3 Data Notation and Methods 345
16.4 Simulation Study 347
16.4.1 Parameters and Data Characteristics 347
16.4.2 Data Simulation 348
16.4.3 Normalization Performance 349
16.4.4 Impact of Normalization on Differential Abundance Analysis 349
16.5 Discussion 352
16.5.1 TMM and RLE with Metagenomic Compositional Dataset 352
16.5.2 Simulation Benchmark 352
16.5.3 Novel Normalization Methods Are Needed 352
A.1 Appendix 353
A.1.1 Supplementary Data Distribution 353
A.1.2 Supplementary Illustration of TMM and RLE with Compositional Dataset 354
A.1.3 Supplementary Example 355
References 356
17 Identification of Pathway-Modulating Genes Using the Biomedical Literature Mining 358
17.1 Introduction 358
17.2 Methods 359
17.2.1 Text Mining of Biomedical Literature 359
17.2.2 Database and Web Interface for Biomedical Literature Mining 361
17.2.3 bayesGO: Bayesian Hierarchical Model to Identify Pathway-Modulating Genes 362
17.3 Results 364
17.3.1 Summary and Preprocessing of Literature Mining Results 364
17.3.2 bayesGO Analysis 365
17.4 Conclusion 372
Appendix 373
References 376
18 Discriminant Analysis and Normalization Methods for Next-Generation Sequencing Data 377
18.1 Introduction 377
18.2 Discriminant Analysis for Microarray Data 379
18.2.1 Linear Discriminant Analysis 379
18.2.2 Diagonal Linear Discriminant Analysis 380
18.3 Discriminant Analysis for Next-Generation Sequencing Data 381
18.3.1 Poisson Linear Discriminant Analysis 381
18.3.2 Zero-Inflated Poisson Logistic Discriminant Analysis 382
18.3.3 Negative Binomial Linear Discriminant Analysis 383
18.4 Normalization Methods for Next-Generation Sequencing Data 384
18.4.1 Normalization for Same Species 384
18.4.1.1 The Trimmed Mean of M-Values Normalization Method 385
18.4.1.2 A Hypothesis Testing Based Normalization Scaling Factor Method 386
18.4.2 Normalization for Different Species 387
18.5 Simulation Studies 388
18.5.1 Simulation Design 389
18.5.2 Simulation Results 390
18.6 Real Data Analysis 391
18.7 Discussion 393
References 394
Part V Survival Analysis 397
19 On the Landmark Survival Model for Dynamic Prediction of Event Occurrence Using Longitudinal Data 398
19.1 Introduction 398
19.2 Joint Distribution of the Longitudinal and Time-to-Event Data for the Landmark Cox Model 404
19.3 Extension to the Landmark Linear Transformation Model 407
19.4 Simulation 408
19.5 Discussion 409
References 411
20 Nonparametric Estimation of a Cumulative Hazard Function with Right Truncated Data 413
20.1 Introduction 413
20.2 Nonparametric Inference for the Reverse-Time Hazard Function 415
20.3 Nonparametric Inference for the Cumulative Hazard Function 417
20.3.1 Estimation of the Cumulative Hazard Function 417
20.3.2 One-Sample Log-Rank Test 418
20.4 Two-Sample Weighted Tests 419
20.5 Simulation Studies 420
20.5.1 Study I 420
20.5.2 Study II 421
20.6 The Blood Transfusion Infected AIDS Data 423
20.7 Discussion 423
Appendix: Asymptotic Distribution of U(t) 426
References 429
21 Empirical Study on High-Dimensional Variable Selection and Prediction Under Competing Risks 431
21.1 Introduction 431
21.2 Competing Risk Models 432
21.2.1 The PCSH Model 433
21.2.2 The PSDH Model 434
21.3 Regularization 434
21.3.1 LASSO 435
21.3.2 Boosting 436
21.4 Simulations 436
21.4.1 Setup 436
21.4.2 Results 438
21.5 Discussion 444
References 449
22 Nonparametric Estimation of a Hazard Rate Function with Right Truncated Data 451
22.1 Introduction 451
22.2 Nonparametric Inference of Reverse-Time Hazard Rate Function 452
22.3 Nonparametric Inference of Hazard Rate Function 455
22.4 Simulation Study 457
22.5 The Blood Transfusion Infected AIDS Data 459
22.6 Discussion 464
Appendix 464
References 466
Index 467
Publication date (per publisher) | 5 December 2018 |
---|---|
Series | ICSA Book Series in Statistics |
Additional info | XXIV, 463 p. 138 illus., 62 illus. in color. |
Place of publication | Cham |
Language | English |
Subject area | Mathematics / Computer Science ► Mathematics ► Statistics |
Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics | |
Medicine / Pharmacy ► General / Reference Works | |
Economics | |
Keywords | adaptive design • Biostatistical Procedures • Competing Risk Data Analysis • Complex Data Analysis • Data Mining • Density Estimation • fMRI data analysis • Functional Data • Gene expression analysis • high dimensional statistical method • image data analysis • longitudinal data • Mixture Model Analysis • Multivariate Survival Data Analysis • network analysis • Variable selection • Wavelet |
ISBN-10 | 3-319-99389-5 / 3319993895 |
ISBN-13 | 978-3-319-99389-8 / 9783319993898 |
Size: 8.3 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to specialist books with columns, tables, and figures. A PDF can be displayed on almost any device, but is of only limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.