Informatics and Machine Learning
Wiley-Blackwell (publisher)
978-1-119-71674-7 (ISBN)
»Informatics and Machine Learning: From Martingales to Metaheuristics« delivers an interdisciplinary presentation on how to analyze any data captured in digital form. The book describes how readers can conduct analyses of text, general sequential data, experimental observations over time, stock market and econometric histories, or symbolic data, like genomes. It contains large amounts of sample code to demonstrate the concepts contained within and assist with various levels of project work.
The book offers a complete presentation of the mathematical underpinnings of a wide variety of forms of data analysis and provides extensive examples of programming implementations. It is based on two decades' worth of the distinguished author's teaching and industry experience.
- A thorough introduction to probabilistic reasoning and bioinformatics, including Python shell scripting to obtain data counts, frequencies, probabilities, and anomalous statistics, for use with Bayes' rule
- An exploration of information entropy and statistical measures, including Shannon entropy, relative entropy, maximum entropy (maxent), and mutual information
- A practical discussion of ad hoc, ab initio, and bootstrap signal acquisition methods, with examples from genome analytics and signal analytics
Perfect for undergraduate and graduate students in machine learning and data analytics programs, »Informatics and Machine Learning: From Martingales to Metaheuristics« will also earn a place in the libraries of mathematicians, engineers, computer scientists, and life scientists with an interest in those subjects.
Stephen Winters-Hilt, PhD, is Sole Proprietor at Meta Logos Systems, Albuquerque, NM, USA, which specializes in Machine Learning, Signal Analysis, Financial Analytics, and Bioinformatics. He received his doctorate in Theoretical Physics from the University of Wisconsin, as well as a PhD in Computer Science and Bioinformatics from the University of California, Santa Cruz.
1 Introduction 1
1.1 Data Science: Statistics, Probability, Calculus ... Python (or Perl) and Linux 2
1.2 Informatics and Data Analytics 3
1.3 FSA-Based Signal Acquisition and Bioinformatics 4
1.4 Feature Extraction and Language Analytics 7
1.5 Feature Extraction and Gene Structure Identification 8
1.6 Theoretical Foundations for Learning 13
1.7 Classification and Clustering 13
1.8 Search 14
1.9 Stochastic Sequential Analysis (SSA) Protocol (Deep Learning Without NNs) 15
1.10 Deep Learning using Neural Nets 20
1.11 Mathematical Specifics and Computational Implementations 21
2 Probabilistic Reasoning and Bioinformatics 23
2.1 Python Shell Scripting 23
2.2 Counting, the Enumeration Problem, and Statistics 34
2.3 From Counts to Frequencies to Probabilities 35
2.4 Identifying Emergent/Convergent Statistics and Anomalous Statistics 35
2.5 Statistics, Conditional Probability, and Bayes' Rule 37
2.6 Emergent Distributions and Series 39
2.7 Exercises 42
3 Information Entropy and Statistical Measures 47
3.1 Shannon Entropy, Relative Entropy, Maxent, Mutual Information 48
3.2 Codon Discovery from Mutual Information Anomaly 58
3.3 ORF Discovery from Long-Tail Distribution Anomaly 66
3.4 Sequential Processes and Markov Models 72
3.5 Exercises 75
4 Ad Hoc, Ab Initio, and Bootstrap Signal Acquisition Methods 77
4.1 Signal Acquisition, or Scanning, at Linear Order Time-Complexity 77
4.2 Genome Analytics: The Gene-Finder 80
4.3 Objective Performance Evaluation: Sensitivity and Specificity 93
4.4 Signal Analytics: The Time-Domain Finite State Automaton (tFSA) 93
4.5 Signal Statistics (Fast): Mean, Variance, and Boxcar Filter 107
4.6 Signal Spectrum: Nyquist Criterion, Gabor Limit, Power Spectrum 110
4.7 Exercises 112
5 Text Analytics 125
5.1 Words 125
5.2 Phrases - Short (Three Words) 145
5.3 Phrases - Long (A Line or Sentence) 150
5.4 Exercises 153
6 Analysis of Sequential Data Using HMMs 155
6.1 Hidden Markov Models (HMMs) 155
6.2 Graphical Models for Markov Models and Hidden Markov Models 162
6.3 Standard HMM Weaknesses and their GHMM Fixes 168
6.4 Generalized HMMs (GHMMs - "Gems"): Minor Viterbi Variants 171
6.5 HMM Implementation for Viterbi (in C and Perl) 179
6.6 Exercises 206
7 Generalized HMMs (GHMMs): Major Viterbi Variants 207
7.1 GHMMs: Maximal Clique for Viterbi and Baum-Welch 207
7.2 GHMMs: Full Duration Model 216
7.3 GHMMs: Linear Memory Baum-Welch Algorithm 228
7.4 GHMMs: Distributable Viterbi and Baum-Welch Algorithms 230
7.5 Martingales and the Feasibility of Statistical Learning (further details in Appendix) 232
7.6 Exercises 234
8 Neuromanifolds and the Uniqueness of Relative Entropy 235
8.1 Overview 235
8.2 Review of Differential Geometry 236
8.3 Amari's Dually Flat Formulation 243
8.4 Neuromanifolds 247
8.5 Exercises 250
9 Neural Net Learning and Loss Bounds Analysis 253
9.1 Brief Introduction to Neural Nets (NNs) 254
9.2 Variational Learning Formalism and Use in Loss Bounds Analysis 261
9.3 The "sinh⁻¹( )" link algorithm (SA) 266
9.4 The Loss Bounds Analysis for sinh⁻¹( ) 269
9.5 Exercises 277
10 Classification and Clustering 279
10.1 The SVM Classifier - An Overview 281
10.2 Introduction to Classification and Clustering 282
10.3 Lagrangian Optimization and Structural Risk Minimization (SRM) 296
10.4 SVM Binary Classifier Implementation 318
10.5 Kernel Selection and Tuning Metaheuristics 346
10.6 SVM Multiclass from Decision Tree with SVM Binary Classifiers 356
10.7 SVM Multiclass Classifier Derivation (Multiple Decision Surface) 359
10.8 SVM Clustering 364
10.9 Exercises 385
11 Search Metaheuristics 389
11.1 Trajectory-Based Search Metaheuristics 389
11.2 Population-Based Search Metaheuristics 399
11.3 Exercises 404
12 Stochastic Sequential Analysis (SSA) 407
12.1 HMM and FSA-Based Methods for Signal Acquisition and Feature Extraction 408
12.2 The Stochastic Sequential Analysis (SSA) Protocol 410
12.3 Channel Current Cheminformatics (CCC) Implementation of the Stochastic Sequential Analysis (SSA) Protocol 420
12.4 SCW for Detector Sensitivity Boosting 423
12.5 SSA for Deep Learning 430
12.6 Exercises 431
13 Deep Learning Tools - TensorFlow 433
13.1 Neural Nets Review 433
13.2 TensorFlow from Google 435
13.3 Exercises 444
14 Nanopore Detection - A Case Study 445
14.1 Standard Apparatus 447
14.2 Controlling Nanopore Noise Sources and Choice of Aperture 449
14.3 Length Resolution of Individual DNA Hairpins 451
14.4 Detection of Single Nucleotide Differences (Large Changes in Structure) 454
14.5 Blockade Mechanism for 9bphp 455
14.6 Conformational Kinetics on Model Biomolecules 459
14.7 Channel Current Cheminformatics 460
14.8 Channel-Based Detection Mechanisms 467
14.9 The NTD Nanoscope 474
14.10 NTD Biosensing Methods 495
14.10.1 Model Biosensor Based on Streptavidin and Biotin 495
14.10.2 Model System Based on DNA Annealing 501
14.10.3 Y-Aptamer with Use of Chaotropes to Improve Signal Resolution 506
14.10.4 Pathogen Detection, miRNA Detection, and miRNA Haplotyping 508
14.10.5 SNP Detection 510
14.10.6 Aptamer-Based Detection 512
14.10.7 Antibody-Based Detection 512
14.11 Exercises 516
Appendix A: Python and Perl System Programming in Linux 519
A.1 Getting Linux and Python in a Flash (Drive) 519
A.2 Linux and the Command Shell 520
A.3 Perl Review: I/O, Primitives, String Handling, Regex 521
Appendix B: Physics 529
B.1 The Calculus of Variations 529
Appendix C: Math 531
C.1 Martingales 531
C.2 Hoeffding Inequality 537
References
Publication date | 6 January 2022 |
---|---|
Place of publication | Hoboken |
Language | English |
Dimensions | 163 x 243 mm |
Weight | 964 g |
Binding | hardcover |
Subject area | Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics; Computer Science ► Other Topics ► Bioinformatics; Mathematics / Computer Science ► Mathematics; Natural Sciences ► Biology |
ISBN-10 | 1-119-71674-8 / 1119716748 |
ISBN-13 | 978-1-119-71674-7 / 9781119716747 |
Condition | new |