Tutorials in Chemoinformatics
John Wiley & Sons Inc (publisher)
978-1-119-13796-2 (ISBN)
30 tutorials and more than 100 exercises in chemoinformatics, supported by online software and data sets
Chemoinformatics is widely used in both academic and industrial chemical and biochemical research worldwide. Yet, until this unique guide, there were no books offering practical exercises in chemoinformatics methods. Tutorials in Chemoinformatics contains more than 100 exercises in 30 tutorials exploring key topics and methods in the field. It takes an applied approach to the subject with a strong emphasis on problem-solving and computational methodologies.
Each tutorial is self-contained and contains exercises for students to work through using a variety of software packages. The majority of the tutorials are divided into three sections devoted to theoretical background, algorithm description and software applications, respectively, with the last section providing step-by-step software instructions. Throughout, three types of software tools are used: in-house programs developed by the authors, open-source programs and commercial programs that are available for free or at a modest cost to academics. The in-house software and data sets are available on a dedicated companion website.
Key topics and methods covered in Tutorials in Chemoinformatics include:
Data curation and standardization
Development and use of chemical databases
Structure encoding by molecular descriptors, text strings and binary fingerprints
The design of diverse and focused libraries
Chemical data analysis and visualization
Structure-property/activity modeling (QSAR/QSPR)
Ensemble modeling approaches, including bagging, boosting, stacking and random subspaces
3D pharmacophore modeling and pharmacological profiling using shape analysis
Protein-ligand docking
Implementation of algorithms in a high-level programming language
Tutorials in Chemoinformatics is an ideal supplementary text for advanced undergraduate and graduate courses in chemoinformatics, bioinformatics, computational chemistry, computational biology, medicinal chemistry and biochemistry. It is also a valuable working resource for medicinal chemists, academic researchers and industrial chemists looking to enhance their chemoinformatics skills.
Alexandre Varnek, PhD, the editor, is a professor of theoretical chemistry at the University of Strasbourg, France, where he heads the Laboratory of Chemoinformatics and directs two MSc programs: Chemoinformatics and In Silico Drug Design. Professor Varnek's research focuses on developing new approaches and tools for virtual screening and in silico design of new compounds and chemical reactions.
List of Contributors xv
Preface xvii
About the Companion Website xix
Part 1 Chemical Databases 1
1 Data Curation 3
Gilles Marcou and Alexandre Varnek
Theoretical Background 3
Software 5
Step‐by‐Step Instructions 7
Conclusion 34
References 36
2 Relational Chemical Databases: Creation, Management, and Usage 37
Gilles Marcou and Alexandre Varnek
Theoretical Background 37
Step‐by‐Step Instructions 41
Conclusion 65
References 65
3 Handling of Markush Structures 67
Timur Madzhidov, Ramil Nugmanov, and Alexandre Varnek
Theoretical Background 67
Step‐by‐Step Instructions 68
Conclusion 73
References 73
4 Processing of SMILES, InChI, and Hashed Fingerprints 75
João Montargil Aires de Sousa
Theoretical Background 75
Algorithms 76
Step‐by‐Step Instructions 78
Conclusion 80
References 81
Part 2 Library Design 83
5 Design of Diverse and Focused Compound Libraries 85
Antonio de la Vega de Leon, Eugen Lounkine, Martin Vogt, and Jürgen Bajorath
Introduction 85
Data Acquisition 86
Implementation 86
Compound Library Creation 87
Compound Library Analysis 90
Normalization of Descriptor Values 91
Visualizing Descriptor Distributions 92
Decorrelation and Dimension Reduction 94
Partitioning and Diverse Subset Calculation 95
Partitioning 95
Diverse Subset Selection 97
Combinatorial Libraries 98
Combinatorial Enumeration of Compounds 98
Retrosynthetic Approaches to Library Design 99
References 101
Part 3 Data Analysis and Visualization 103
6 Hierarchical Clustering in R 105
Martin Vogt and Jürgen Bajorath
Theoretical Background 105
Algorithms 106
Instructions 107
Hierarchical Clustering Using Fingerprints 108
Hierarchical Clustering Using Descriptors 111
Visualization of the Data Sets 113
Alternative Clustering Methods 116
Conclusion 117
References 118
7 Data Visualization and Analysis Using Kohonen Self‐Organizing Maps 119
João Montargil Aires de Sousa
Theoretical Background 119
Algorithms 120
Instructions 121
Conclusion 126
References 126
Part 4 Obtaining and Validation QSAR/QSPR Models 127
8 Descriptors Generation Using the CDK Toolkit and Web Services 129
João Montargil Aires de Sousa
Theoretical Background 129
Algorithms 130
Step‐by‐Step Instructions 131
Conclusion 133
References 134
9 QSPR Models on Fragment Descriptors 135
Vitaly Solov’ev and Alexandre Varnek
Abbreviations 135
Data 136
ISIDA_QSPR Input 137
Data Split Into Training and Test Sets 139
Substructure Molecular Fragment (SMF) Descriptors 139
Regression Equations 142
Forward and Backward Stepwise Variable Selection 142
Parameters of Internal Model Validation 143
Applicability Domain (AD) of the Model 143
Storage and Retrieval Modeling Results 144
Analysis of Modeling Results 144
Root‐Mean Squared Error (RMSE) Estimation 148
Setting the Parameters 151
Analysis of n‐Fold Cross‐Validation Results 151
Loading Structure‐Data File 153
Descriptors and Fitting Equation 154
Variables Selection 155
Consensus Model 155
Model Applicability Domain 155
n‐Fold External Cross‐Validation 155
Saving and Loading of the Consensus Modeling Results 155
Statistical Parameters of the Consensus Model 156
Consensus Model Performance as a Function of Individual Models Acceptance Threshold 157
Building Consensus Model on the Entire Data Set 158
Loading Input Data 159
Loading Selected Models and Choosing their Applicability Domain 160
Reporting Predicted Values 160
Analysis of the Fragments Contributions 161
References 161
10 Cross‐Validation and the Variable Selection Bias 163
Igor I. Baskin, Gilles Marcou, Dragos Horvath, and Alexandre Varnek
Theoretical Background 163
Step‐by‐Step Instructions 165
Conclusion 172
References 173
11 Classification Models 175
Igor I. Baskin, Gilles Marcou, Dragos Horvath, and Alexandre Varnek
Theoretical Background 176
Algorithms 178
Step‐by‐Step Instructions 180
Conclusion 191
References 192
12 Regression Models 193
Igor I. Baskin, Gilles Marcou, Dragos Horvath, and Alexandre Varnek
Theoretical Background 194
Step‐by‐Step Instructions 197
Conclusion 207
References 208
13 Benchmarking Machine‐Learning Methods 209
Igor I. Baskin, Gilles Marcou, Dragos Horvath, and Alexandre Varnek
Theoretical Background 209
Step‐by‐Step Instructions 210
Conclusion 222
References 222
14 Compound Classification Using the scikit‐learn Library 223
Jenny Balfer, Jürgen Bajorath, and Martin Vogt
Theoretical Background 224
Algorithms 225
Step‐by‐Step Instructions 230
Naïve Bayes 230
Decision Tree 231
Support Vector Machine 234
Notes on Provided Code 237
Conclusion 238
References 239
Part 5 Ensemble Modeling 241
15 Bagging and Boosting of Classification Models 243
Igor I. Baskin, Gilles Marcou, Dragos Horvath, and Alexandre Varnek
Theoretical Background 243
Algorithm 244
Step‐by‐Step Instructions 245
Conclusion 247
References 247
16 Bagging and Boosting of Regression Models 249
Igor I. Baskin, Gilles Marcou, Dragos Horvath, and Alexandre Varnek
Theoretical Background 249
Algorithm 249
Step‐by‐Step Instructions 250
Conclusion 255
References 255
17 Instability of Interpretable Rules 257
Igor I. Baskin, Gilles Marcou, Dragos Horvath, and Alexandre Varnek
Theoretical Background 257
Algorithm 258
Step‐by‐Step Instructions 258
Conclusion 261
References 261
18 Random Subspaces and Random Forest 263
Igor I. Baskin, Gilles Marcou, Dragos Horvath, and Alexandre Varnek
Theoretical Background 264
Algorithm 264
Step‐by‐Step Instructions 265
Conclusion 269
References 269
19 Stacking 271
Igor I. Baskin, Gilles Marcou, Dragos Horvath, and Alexandre Varnek
Theoretical Background 271
Algorithm 272
Step‐by‐Step Instructions 273
Conclusion 277
References 278
Part 6 3D Pharmacophore Modeling 279
20 3D Pharmacophore Modeling Techniques in Computer‐Aided Molecular Design Using LigandScout 281
Thomas Seidel, Sharon D. Bryant, Gökhan Ibis, Giulio Poli, and Thierry Langer
Introduction 281
Theory: 3D Pharmacophores 283
Representation of Pharmacophore Models 283
Hydrogen‐Bonding Interactions 285
Hydrophobic Interactions 285
Aromatic and Cation‐π Interactions 286
Ionic Interactions 286
Metal Complexation 286
Ligand Shape Constraints 287
Pharmacophore Modeling 288
Manual Pharmacophore Construction 288
Structure‐Based Pharmacophore Models 289
Ligand‐Based Pharmacophore Models 289
3D Pharmacophore‐Based Virtual Screening 291
3D Pharmacophore Creation 291
Annotated Database Creation 291
Virtual Screening‐Database Searching 292
Hit‐List Analysis 292
Tutorial: Creating 3D‐Pharmacophore Models Using LigandScout 294
Creating Structure‐Based Pharmacophores From a Ligand‐Protein Complex 294
Description: Create a Structure‐Based Pharmacophore Model 296
Create a Shared Feature Pharmacophore Model From Multiple Ligand‐Protein Complexes 296
Description: Create a Shared Feature Pharmacophore and Align it to Ligands 297
Create Ligand‐Based Pharmacophore Models 298
Description: Ligand‐Based Pharmacophore Model Creation 300
Tutorial: Pharmacophore‐Based Virtual Screening Using LigandScout 301
Virtual Screening, Model Editing, and Viewing Hits in the Target Active Site 301
Description: Virtual Screening and Pharmacophore Model Editing 302
Analyzing Screening Results with Respect to the Binding Site 303
Description: Analyzing Hits in the Active Site Using LigandScout 305
Parallel Virtual Screening of Multiple Databases Using LigandScout 305
Virtual Screening in the Screening Perspective of LigandScout 306
Description: Virtual Screening Using LigandScout 306
Conclusions 307
Acknowledgments 307
References 307
Part 7 The Protein 3D‐Structures in Virtual Screening 311
21 The Protein 3D‐Structures in Virtual Screening 313
Inna Slynko and Esther Kellenberger
Introduction 313
Description of the Example Case 314
Thrombin and Blood Coagulation 314
Active Thrombin and Inactive Prothrombin 314
Thrombin as a Drug Target 314
Thrombin Three‐Dimensional Structure: The 1OYT PDB File 315
Modeling Suite 315
Overall Description of the Input Data Available on the Editor Website 315
Exercise 1: Protein Analysis and Preparation 316
Step 1: Identification of Molecules Described in the 1OYT PDB File 316
Step 2: Protein Quality Analysis of the Thrombin/Inhibitor PDB Complex Using MOE Geometry Utility 320
Step 3: Preparation of the Protein for Drug Design Applications 321
Step 4: Description of the Protein‐Ligand Binding Mode 325
Step 5: Detection of Protein Cavities 328
Exercise 2: Retrospective Virtual Screening Using the Pharmacophore Approach 330
Step 1: Description of the Test Library 332
Step 2.1: Pharmacophore Design, Overview 333
Step 2.2: Pharmacophore Design, Flexible Alignment of Three Thrombin Inhibitors 334
Step 2.3: Pharmacophore Design, Query Generation 335
Step 3: Pharmacophore Search 337
Exercise 3: Retrospective Virtual Screening Using the Docking Approach 341
Step 1: Description of the Test Library 341
Step 2: Preparation of the Input 341
Step 3: Re‐Docking of the Crystallographic Ligand 341
Step 4: Virtual Screening of a Database 345
General Conclusion 350
References 351
Part 8 Protein‐Ligand Docking 353
22 Protein‐Ligand Docking 355
Inna Slynko, Didier Rognan, and Esther Kellenberger
Introduction 355
Description of the Example Case 356
Methods 356
Ligand Preparation 359
Protein Preparation 359
Docking Parameters 360
Description of Input Data Available on the Editor Website 360
Exercises 362
A Quick Start with LeadIT 362
Re‐Docking of Tacrine into AChE 362
Preparation of AChE From 1ACJ PDB File 362
Docking of Neutral Tacrine, then of Positively Charged Tacrine 363
Docking of Positively Charged Tacrine in AChE in Presence of Water 365
Cross‐Docking of Tacrine‐Pyridone and Donepezil Into AChE 366
Preparation of AChE From 1ACJ PDB File 366
Cross‐Docking of Tacrine‐Pyridone Inhibitor and Donepezil in AChE in Presence of Water 367
Re‐Docking of Donepezil in AChE in Presence of Water 370
General Conclusions 372
Annex: Screen Captures of LeadIT Graphical Interface 372
References 375
Part 9 Pharmacophorical Profiling Using Shape Analysis 377
23 Pharmacophorical Profiling Using Shape Analysis 379
Jérémy Desaphy, Guillaume Bret, Inna Slynko, Didier Rognan, and Esther Kellenberger
Introduction 379
Description of the Example Case 380
Aim and Context 380
Description of the Searched Data Set 381
Description of the Query 381
Methods 381
ROCS 381
VolSite and Shaper 384
Other Programs for Shape Comparison 384
Description of Input Data Available on the Editor Website 385
Exercises 387
Preamble: Practical Considerations 387
Ligand Shape Analysis 387
What are ROCS Output Files? 387
Binding Site Comparison 388
Conclusions 390
References 391
Part 10 Algorithmic Chemoinformatics 393
24 Algorithmic Chemoinformatics 395
Martin Vogt, Antonio de la Vega de Leon, and Jürgen Bajorath
Introduction 395
Similarity Searching Using Data Fusion Techniques 396
Introduction to Virtual Screening 396
The Three Pillars of Virtual Screening 397
Molecular Representation 397
Similarity Function 397
Search Strategy (Data Fusion) 397
Fingerprints 397
Count Fingerprints 397
Fingerprint Representations 399
Bit Strings 399
Feature Lists 399
Generation of Fingerprints 399
Similarity Metrics 402
Search Strategy 404
Completed Virtual Screening Program 405
Benchmarking VS Performance 406
Scoring the Scorers 407
How to Score 407
Multiple Runs and Reproducibility 408
Adjusting the VS Program for Benchmarking 408
Analyzing Benchmark Results 410
Conclusion 414
Introduction to Chemoinformatics Toolkits 415
Theoretical Background 415
A Note on Graph Theory 416
Basic Usage: Creating and Manipulating Molecules in RDKit 417
Creation of Molecule Objects 417
Molecule Methods 418
Atom Methods 418
Bond Methods 419
An Example: Hill Notation for Molecules 419
Canonical SMILES: The Canon Algorithm 420
Theoretical Background 420
Recap of SMILES Notation 420
Canonical SMILES 421
Building a SMILES String 422
Canonicalization of SMILES 425
The Initial Invariant 427
The Iteration Step 428
Summary 431
Substructure Searching: The Ullmann Algorithm 432
Theoretical Background 432
Backtracking 433
A Note on Atom Order 436
The Ullmann Algorithm 436
Sample Runs 440
Summary 441
Atom Environment Fingerprints 441
Theoretical Background 441
Implementation 443
The Hashing Function 443
The Initial Atom Invariant 444
The Algorithm 444
Summary 447
References 447
Index 449
Publication date | 28 September 2017 |
---|---|
Place of publication | New York |
Language | English |
Dimensions | 174 x 246 mm |
Weight | 1111 g |
Subject areas | Mathematics / Computer Science ► Computer Science ► Theory / Studies |
 | Natural Sciences ► Biology |
 | Natural Sciences ► Chemistry ► Industrial Chemistry |
 | Technology |
ISBN-10 | 1-119-13796-9 / 1119137969 |
ISBN-13 | 978-1-119-13796-2 / 9781119137962 |
Condition | New |