
Unsupervised Learning Algorithms (eBook)

M. Emre Celebi, Kemal Aydin (Editors)

eBook Download: PDF
2016 | 1st ed. 2016
X, 558 pages
Springer International Publishing (Publisher)
978-3-319-24211-8 (ISBN)

€106.99 incl. VAT
(CHF 104.50)
eBooks are sold by Lehmanns Media GmbH (Berlin) at the price in euros incl. VAT.
  • Download available immediately

This book summarizes the state of the art in unsupervised learning. With the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms, which can automatically discover interesting and useful patterns in such data, have gained popularity among researchers and practitioners. These algorithms have found numerous applications, including pattern recognition, market basket analysis, web mining, social network analysis, information retrieval, recommender systems, market research, intrusion detection, and fraud detection. The contributors discuss how the difficulty of developing theoretically sound approaches that are amenable to objective evaluation has resulted in the proposal of numerous unsupervised learning algorithms over the past half-century. The intended audience includes researchers and practitioners who are increasingly using unsupervised learning algorithms to analyze their data. Topics of interest include anomaly detection, clustering, feature extraction, and applications of unsupervised learning. Each chapter is contributed by a leading expert in the field.

Preface 6
Contents 10
Anomaly Detection for Data with Spatial Attributes 12
1 Introduction 12
2 Problem Definition and Taxonomy of Techniques 15
2.1 Problem Definition 15
2.2 Taxonomy of Techniques 16
3 Object Anomalies: Techniques for Outlier Detection 18
3.1 General Outlier Detection 19
3.1.1 General Framework for Outlier Detection 19
3.1.2 LOF 20
3.1.3 Adapting for Spatial Outlier Detection 21
3.2 Spatial Outlier Detection 22
3.2.1 SLOM 22
4 Region Anomalies: Global 23
4.1 Statistical Approaches: Spatial Scan Statistics 23
4.1.1 ULS Scan Statistic 27
4.1.2 Other Extensions 29
4.2 Mining Approaches 29
4.2.1 Bump Hunting 29
5 Region Anomalies: Local 30
5.1 Localized Homogeneous Anomalies 30
5.2 Image Segmentation 32
6 Region Anomalies: Grouping 34
6.1 Clustering for Spatial Data 35
6.1.1 HAC-A 35
6.1.2 Clustering Ensuring Spatial Convexity 35
6.2 Clustering-Based Anomaly Detection 36
6.2.1 Adapting for Anomaly Detection on Spatial Data 38
7 Discussion 38
8 Directions for Future Work 39
9 Conclusions 41
References 42
Anomaly Ranking in a High Dimensional Space: The Unsupervised TreeRank Algorithm 44
1 Introduction 45
2 Anomaly Ranking: Background and Preliminaries 46
2.1 A Scoring Approach to Anomaly Ranking 46
2.2 Measuring Scoring Accuracy: The Mass-Volume Curve 47
3 Turning Anomaly Ranking into Bipartite Ranking 50
3.1 Bipartite Ranking and ROC Analysis 50
3.2 A Bipartite View of Anomaly Ranking 52
3.3 Extending Bipartite Methods via Uniform Sampling 53
4 The Unsupervised TreeRank Algorithm 54
4.1 Anomaly Ranking Trees 54
4.2 The Algorithm: Growing the Anomaly Ranking Tree 56
4.3 Pruning the Anomaly Ranking Tree: Model Selection 60
5 Numerical Experiments 61
6 Conclusion 64
References 64
Genetic Algorithms for Subset Selection in Model-Based Clustering 66
1 Introduction 66
2 Model-Based Clustering 67
2.1 Finite Mixture Modelling 67
2.2 BIC as a Criterion for Model Selection 68
3 Subset Selection in Model-Based Clustering 69
3.1 The Proposed Approach 70
3.2 Models for No Clustering 71
4 Genetic Algorithms 71
4.1 GAs for Subset Selection in Model-Based Clustering 72
4.1.1 Genetic Coding Scheme 72
4.1.2 Generation of a Population of Models 72
4.1.3 Fitness Function to Evaluate the Model Clustering 73
4.1.4 Genetic Operators 73
4.2 Computational Issues 73
4.3 Random-Key GAs to Select a Fixed Size Subset 74
5 Data Examples 74
5.1 Birds, Planes and Cars 75
5.2 Italian Wines 75
6 Conclusions 79
References 80
Clustering Evaluation in High-Dimensional Data 82
1 Introduction 82
2 Basic Notation 83
3 Problems in Analyzing High-Dimensional Data 84
3.1 Distance Concentration 85
3.2 Hubness: The Long Tail of Relevance and the Central Tendencies of Hubs 86
4 Clustering Techniques for High-Dimensional Data 89
5 Clustering Quality Indexes: An Overview 90
6 Clustering Quality Indexes: Existing Surveys 95
7 Clustering Evaluation in Many Dimensions 96
7.1 Experimental Protocol 97
7.2 Sensitivity to Increasing Dimensionality 98
7.2.1 Sensitivity of the Average Quality Assessment 98
7.2.2 Stability in Quality Assessment 108
7.3 Quantifying the Influence of Hubs 109
8 Perspectives and Future Directions 112
References 113
Combinatorial Optimization Approaches for Data Clustering 119
1 Introduction 119
2 Applications 120
3 Problem Definition and Distance Measures Definition 121
3.1 Euclidean Distance 122
3.2 Pearson's Correlation Coefficient 122
3.3 City-Block or Manhattan 123
3.4 Cosine or Uncentered Correlation 123
4 Mathematical Formulations of the Problem 123
4.1 Minimize (Maximize) the Within (Between)-Clusters Sum of Squares 124
4.1.1 Cardinality of Each Cluster A Priori Known 124
4.1.2 Bipartition of the Patterns 126
4.2 Optimizing the Within Clusters Distance 129
5 A Review of the Most Popular Clustering Techniques 131
5.1 Hierarchical Clustering Algorithms 132
5.2 Partitioning Clustering Algorithms 133
5.2.1 Squared Error Algorithms and the k-Means/k-Medoid Algorithms 133
5.2.2 Graph-Theoretic Algorithms 135
5.2.3 Mixture-Resolving Algorithms 136
5.3 Efficient Metaheuristic Approaches 136
5.4 Encoding 139
5.5 Decoding 140
6 Concluding Remarks 140
References 141
Kernel Spectral Clustering and Applications 145
1 Introduction 145
2 Notation 147
3 Kernel Spectral Clustering (KSC) 147
3.1 Mathematical Formulation 147
3.1.1 Training Problem 147
3.1.2 Generalization 149
3.1.3 Model Selection 150
3.2 Soft Kernel Spectral Clustering 151
3.3 Hierarchical Clustering 153
3.3.1 Approach 1 154
3.3.2 Approach 2 154
3.4 Sparse Clustering Models 155
3.4.1 Incomplete Cholesky Decomposition 155
3.4.2 Using Additional Penalty Terms 158
4 Applications 159
4.1 Image Segmentation 159
4.2 Scientific Journal Clustering 161
4.3 Power Load Clustering 165
4.4 Big Data 165
5 Conclusions 168
References 169
Uni- and Multi-Dimensional Clustering Via Bayesian Networks 172
1 Introduction 172
2 Uni-Dimensional Clustering 175
2.1 Known Structure 176
2.1.1 Naive Bayes 176
2.1.2 Expectation Model Averaging 178
2.1.3 Expectation Model Averaging: Tree Augmented Naive Bayes 179
2.2 Unknown Structure 180
2.2.1 Extended Naive Bayes 180
2.2.2 Recursive Bayesian Multinets 181
3 Multi-Dimensional Clustering 185
3.1 Latent Tree Models 185
3.1.1 Known Structure 186
3.1.2 Unknown Structure 186
3.2 Cluster Variables Novelty 189
4 Our Approach 191
5 Preliminary Results 196
6 Conclusion and Summary 198
References 199
A Radial Basis Function Neural Network Training Mechanism for Pattern Classification Tasks 202
1 Introduction 202
2 RBF Neural Network 203
3 Particle Swarm Optimization 204
4 RBF Network Training Algorithm 205
4.1 Extraction of the Multidimensional Fuzzy Subspaces 206
4.2 Estimation of the Network's Basis Function Parameters 208
4.3 Discriminant Analysis and PSO Implementation 209
5 Evaluation Experiments 210
5.1 WDBC Data Set 211
5.2 Wine Data Set 212
5.3 Pima Indians Diabetes Data Set 212
6 Conclusion 213
References 214
A Survey of Constrained Clustering 216
1 Introduction 216
2 Unsupervised Clustering 217
2.1 Minimum Sum-of-Squares Clustering 218
2.1.1 K-Means Algorithm 219
2.2 Agglomerative Hierarchical Clustering 220
2.3 COBWEB 220
3 Constrained Clustering 221
3.1 Constrained Clustering with Labeled Data 223
3.1.1 Search Based Methods 223
3.1.2 Distance Based Methods 224
3.2 Constrained Clustering with Instance-Level Constraints 225
3.2.1 Search Based Methods 225
3.2.2 Distance Based Methods 230
3.2.3 Search and Distance Based Methods 233
3.3 Constrained Clustering with Cluster-Level Constraints 235
3.4 Feasibility Issues 239
3.5 Related Studies 240
4 Conclusion 240
References 241
An Overview of the Use of Clustering for Data Privacy 245
1 Introduction 245
2 Clustering to Define Masking Methods 247
2.1 Clustering in Microaggregation 247
2.2 Clustering for Graphs: Microaggregation and k-Anonymity 249
2.3 Attacks on Microaggregation 251
2.4 Fuzzy Clustering for Microaggregation 252
2.5 Clustering for Masking Data Streams 253
2.6 Masking Very Large Data Sets 253
2.7 Masking Through Semantic Clustering 254
2.8 Clustering in Other Masking Methods 254
3 Clustering to Measure Information Loss 255
4 Conclusion 256
References 257
Nonlinear Clustering: Methods and Applications 260
1 Introduction 260
2 COLL for Kernel-Based Clustering 262
2.1 Problem Formulation 263
2.2 Batch Kernel k-Means and Issues 264
2.3 Conscience On-Line Learning 265
2.3.1 The COLL Model 265
2.3.2 The Computation of COLL 267
2.3.3 Computational Complexity 271
2.4 Experiments and Applications 272
3 Multi-exemplar Affinity Propagation 274
3.1 Affinity Propagation 275
3.2 Multi-exemplar Affinity Propagation 276
3.2.1 The Model 276
3.2.2 Optimization 278
3.3 Experiments and Applications 283
4 Graph-Based Multi-prototype Competitive Learning 284
4.1 Graph-Based Initial Clustering 284
4.2 Multi-prototype Competitive Learning 286
4.3 Fast GMPCL 288
4.3.1 Inner Product Based Computation 289
4.3.2 FGMPCL in High Dimension 291
4.4 Experiments and Applications 293
5 Position Regularized Support Vector Clustering 293
5.1 Background 295
5.1.1 Support Vector Domain Description 295
5.1.2 Support Vector Clustering 297
5.2 Position Regularized Support Vector Clustering 297
5.3 Experiments and Applications 302
6 Conclusion and Discussion 306
References 307
Swarm Intelligence-Based Clustering Algorithms: A Survey 310
1 Introduction 310
2 The Clustering Problem 313
3 Overview of the Swarm Intelligence-Based Approaches 315
3.1 Particle Swarm Optimization 315
3.2 Ant Colony Optimization 316
3.3 Ant-Based Sorting 318
3.4 Other Swarm Intelligence-Based Metaheuristics 319
4 Classification of the Swarm Intelligence-Based Algorithms for Clustering 319
4.1 Data Point-to-Cluster Assignment 319
4.2 Cluster Representatives 321
4.3 Direct Point-Agent Matching 325
4.4 Search Agent 327
5 Discussion 329
5.1 Agent Representation Versus SI-Based Clustering Algorithms 329
5.2 Agent Representation Versus Challenging Issues in Clustering 330
6 Conclusion 332
Appendix 333
References 345
Extending Kmeans-Type Algorithms by Integrating Intra-cluster Compactness and Inter-cluster Separation 349
1 Introduction 349
2 Related Work 351
2.1 No Weighting Kmeans-Type Algorithm 351
2.1.1 No Weighting Kmeans-Type Algorithm Without Inter-cluster Separation 351
2.1.2 No Weighting Kmeans-Type Algorithm with Inter-cluster Separation 353
2.2 Vector Weighting Kmeans-Type Algorithm 354
2.2.1 Vector Weighting Kmeans-Type Algorithm Without Inter-cluster Separation 354
2.2.2 Vector Weighting Kmeans-Type Algorithm with Inter-cluster Separation 355
2.3 Matrix Weighting Kmeans-Type Algorithm 355
2.3.1 Matrix Weighting Kmeans-Type Algorithm Without Inter-cluster Separation 355
2.3.2 Matrix Weighting Kmeans-Type Algorithm with Inter-cluster Separation 357
2.4 Summary of the Existing Kmeans-Type Algorithms 358
2.5 Characteristics of Our Extending Kmeans-Type Algorithms 359
3 The Extending Model of Kmeans-Type Algorithm 359
3.1 Motivation 359
3.2 Extension of Basic Kmeans (E-kmeans) 362
3.3 Extension of Wkmeans (E-Wkmeans) 364
3.4 Extension of AWA (E-AWA) 366
3.5 Relationship Among Algorithms 368
3.6 Computational Complexity 368
4 Experiments 369
4.1 Experimental Setup 369
4.2 Synthetic Data Set 370
4.2.1 Parametric Study 370
4.2.2 Results and Analysis 373
4.2.3 Feature Selection 374
4.2.4 Convergence Speed 376
4.3 Real-Life Data Set 377
4.3.1 Parametric Study 377
4.3.2 Results and Analysis 381
4.3.3 Convergence Speed 385
5 Discussion 386
6 Conclusion and Future Work 387
References 388
A Fuzzy-Soft Competitive Learning Approach for Grayscale Image Compression 391
1 Introduction 391
2 Related Work 393
2.1 The Batch Learning Vector Quantization 393
2.2 The Fuzzy Learning Vector Quantization Algorithm 395
3 The Proposed Vector Quantization Approach 397
3.1 Fuzzy-Set-Based Competitive Learning 398
3.2 Codeword Migration Process 401
4 Experimental Study 403
4.1 Study of the Behavior of the Distortion Measure and the PSNR 404
4.2 Computational Demands 406
4.3 Study of the Migration Strategy 406
4.4 Literature Comparison 407
5 Conclusions 408
References 409
Unsupervised Learning in Genome Informatics 411
1 Introduction 412
2 Unsupervised Learning for DNA 412
2.1 DNA Motif Discovery and Search 414
2.1.1 Representation (DNA Motif Model) 414
2.1.2 Learning (Motif Discovery) 417
2.1.3 Prediction (Motif Search) 419
2.2 Genome-Wide DNA-Binding Pattern Discovery 423
3 Unsupervised Learning for Inferring microRNA Regulatory Network 424
3.1 PicTar 426
3.2 A Probabilistic Approach to Explore Human miRNA Target Repertoire by Integrating miRNA-Overexpression Data and Sequence Information 430
3.2.1 Bayesian Mixture Model 431
3.2.2 Variational Bayesian Expectation Maximization 432
3.2.3 TargetScore 432
3.3 Network-Based Methods to Detect miRNA Regulatory Modules 433
3.4 GroupMiR: Inferring miRNA and mRNA Group Memberships with Indian Buffet Process 434
3.5 SNMNMF: Sparse Network-Regularized Multiple Nonnegative Matrix Factorization 440
3.6 Mirsynergy: Detecting Synergistic miRNA Regulatory Modules by Overlapping Neighborhood Expansion 442
3.6.1 Two-Stage Clustering 442
References 445
The Application of LSA to the Evaluation of Questionnaire Responses 455
1 Introduction 456
2 Essays for Evaluation 456
2.1 Open-Ended Responses Provide Unique Evaluation Leverage 457
2.2 The Problem with Essays: Human Raters Don't Scale 457
2.2.1 Expense 458
2.2.2 Language Dependencies 458
2.2.3 Consistency Issues 459
2.3 Automated Scoring Is Needed 460
2.3.1 Creating Automated Methods Requires Learning Systems 460
3 LSA as an Unsupervised Learning System 460
3.1 Brief History of LSA 461
3.2 Mathematical Background 462
3.2.1 Parsing: Turning Words into Numbers 462
3.2.2 Singular Value Decomposition 464
3.2.3 Query and Analysis Processing 466
3.3 LSA Learns Language 468
3.3.1 Unsupervised Learning 469
3.3.2 The LSA Model of Learning 470
3.3.3 Evidence of the Model 472
3.4 LSA Applications 475
4 Methodology 477
4.1 Objective 477
4.2 The Base Interpretive Space 477
4.2.1 Corpus Size 478
4.2.2 Relevant and Distributed Content 478
4.2.3 Term Coverage 479
4.3 Evaluation Algorithms 480
4.3.1 Target Based Scoring 480
4.3.2 Near Neighbor Scoring 481
4.3.3 Additive Analysis 481
4.4 Feedback Selection 482
5 Case Study NICHD Project 482
5.1 Background: Driver Training 482
5.1.1 Open-Ended Responses to Scenario Prompts 483
5.1.2 Provide Feedback Suggestions for Improvement 483
5.2 Construction of the Background Space 484
5.3 Establish Target and Feedback Items 484
5.3.1 Human Input: The SME 485
5.3.2 Human Selected Feedback Items 485
5.4 Feedback Selection Method 486
5.5 Results 487
6 Conclusion 487
References 488
Mining Evolving Patterns in Dynamic Relational Networks 491
1 Introduction 491
2 Definitions and Notation 493
3 Mining the Evolution of Conserved Relational States 496
3.1 Evolving Induced Relational State 496
3.2 Finding Evolving Induced Relational States 499
3.2.1 Step 1: Mining of Induced Relational States 499
3.2.2 Step 2: Mining of Maximal Evolution Paths 506
4 Mining the Coevolving Relational Motifs 508
4.1 Coevolving Relational Motifs 508
4.1.1 CRM Embedding/Occurrence 509
4.1.2 CRM Constraints 509
4.2 Finding Coevolving Relational Motifs 511
4.2.1 CRM Representation 511
4.2.2 Mining Anchors 513
4.2.3 CRM Enumeration 513
4.2.4 CRMminer Algorithm 519
4.2.5 Algorithm Completeness 519
4.2.6 Search Space Pruning 520
5 Mining the Coevolving Induced Relational Motifs 524
5.1 Coevolving Induced Relational Motifs 524
5.2 Coevolving Induced Relational Motif Mining 526
5.2.1 Mining Anchors 526
5.2.2 CIRM Enumeration 527
5.2.3 CIRMminer Algorithm 530
6 Qualitative Analysis and Applications 531
6.1 Analysis of a Trade Network 531
6.2 Analysis of a Co-authorship Network 533
6.3 Analysis of a Multivariate Time-Series Dataset 534
7 Conclusion and Future Research Directions 534
References 537
Probabilistically Grounded Unsupervised Training of Neural Networks 539
1 Introduction 539
2 Unsupervised Estimation of Probability Density Functions 540
2.1 Estimating pdfs via Constrained RBFs 541
2.2 Estimating pdfs via Multilayer Perceptrons 544
3 From pdf Estimation to Online Neural Clustering 549
4 Maximum-Likelihood Modeling of Sequences 554
4.1 Motivation: Beyond Hidden Markov Models and Recurrent ANNs 554
4.2 Unsupervised ML Learning in Generative ANN/HMM Hybrids 556
5 Conclusions 561
References 562

Publication date (per publisher) 29.4.2016
Additional information X, 558 p. 160 illus., 101 illus. in color.
Place of publication Cham
Language English
Subject areas Mathematics / Computer Science > Computer Science > Databases
Computer Science > Theory / Studies > Artificial Intelligence / Robotics
Engineering > Electrical Engineering / Energy Technology
Keywords Big Data Patterns • data analytics • Data Mining • Elements Statistical Learning • Genomic Data Sets • machine learning • pattern recognition • statistical learning theory • Unsupervised Algorithms • Unsupervised Learning
ISBN-10: 3-319-24211-3 / 3319242113
ISBN-13: 978-3-319-24211-8 / 9783319242118
PDF (digital watermark)
Size: 14.5 MB

DRM: digital watermark
This eBook contains a digital watermark and is therefore personalized to you. If the eBook is passed on to third parties without authorization, it can be traced back to its source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is only suitable to a limited extent for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: online reading
In addition to downloading this eBook, you can also read it online in your web browser.

Buying eBooks from abroad
For tax law reasons we can only sell eBooks within Germany and Switzerland. Regrettably, we cannot fulfil eBook orders from other countries.
