
Automating the Design of Data Mining Algorithms (eBook)

An Evolutionary Computation Approach
eBook Download: PDF
2009
XIII, 187 pages
Springer Berlin (publisher)
978-3-642-02541-9 (ISBN)


Automating the Design of Data Mining Algorithms - Gisele L. Pappa, Alex Freitas
96.29 incl. VAT
(CHF 93.95)
The eBook is sold by Lehmanns Media GmbH (Berlin) at the price in euros incl. VAT.
  • Download available immediately
Data mining is a very active research area with many successful real-world applications. It consists of a set of concepts and methods used to extract interesting or useful knowledge (or patterns) from real-world datasets, providing valuable support for decision making in industry, business, government, and science. Although there are already many types of data mining algorithms available in the literature, it is still difficult for users to choose the best possible data mining algorithm for their particular data mining problem. In addition, data mining algorithms have been manually designed; therefore they incorporate human biases and preferences.

This book proposes a new approach to the design of data mining algorithms. Instead of relying on the slow and ad hoc process of manual algorithm design, this book proposes systematically automating the design of data mining algorithms with an evolutionary computation approach. More precisely, we propose a genetic programming system (a type of evolutionary computation method that evolves computer programs) to automate the design of rule induction algorithms, a type of classification method that discovers a set of classification rules from data. We focus on genetic programming in this book because it is the paradigmatic type of machine learning method for automating the generation of programs and because it has the advantage of performing a global search in the space of candidate solutions (data mining algorithms in our case), but in principle other types of search methods for this task could be investigated in the future.
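For readers unfamiliar with rule induction, the following is a minimal Python sketch of a generic sequential covering procedure, the family of algorithms named in the table of contents. It is not the book's genetic programming system and not code from the book: the function names, the precision-based rule evaluation, and the toy weather data are all assumptions chosen for illustration. The design choices hard-coded here (how a rule is grown and how it is scored) are the kind of components the book proposes to select automatically.

```python
def learn_one_rule(rows, target, attributes):
    """Greedily grow one conjunctive rule (hypothetical helper).

    Adds the attribute=value test with the highest precision for `target`
    (ties broken by the number of covered positives) until the rule covers
    only `target` examples or no attribute is left.
    """
    rule = {}          # attribute -> required value
    covered = rows
    while any(r["label"] != target for r in covered) and len(rule) < len(attributes):
        candidates = []
        for a in attributes:
            if a in rule:
                continue
            for v in sorted({r[a] for r in covered}):
                subset = [r for r in covered if r[a] == v]
                pos = sum(r["label"] == target for r in subset)
                candidates.append((pos / len(subset), pos, a, v))
        _, _, a, v = max(candidates, key=lambda c: (c[0], c[1]))
        rule[a] = v
        covered = [r for r in covered if r[a] == v]
    return rule, covered


def sequential_covering(rows, target, attributes):
    """Learn rules one at a time, removing the examples each rule covers."""
    rules = []
    remaining = list(rows)
    while any(r["label"] == target for r in remaining):
        rule, covered = learn_one_rule(remaining, target, attributes)
        rules.append(rule)
        remaining = [r for r in remaining if r not in covered]
    return rules


# Toy data, invented for this sketch.
data = [
    {"outlook": "sunny", "windy": "no", "label": "play"},
    {"outlook": "sunny", "windy": "yes", "label": "play"},
    {"outlook": "rainy", "windy": "no", "label": "play"},
    {"outlook": "rainy", "windy": "yes", "label": "stay"},
    {"outlook": "overcast", "windy": "no", "label": "play"},
    {"outlook": "overcast", "windy": "yes", "label": "stay"},
]
rules = sequential_covering(data, "play", ["outlook", "windy"])
print(rules)
```

On this toy data the sketch learns two rules for the class "play": first windy = no, then outlook = sunny. Swapping the precision score or the greedy search for another heuristic yields a different rule induction algorithm, which illustrates the space of designs the blurb refers to.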

Preface
Contents
Acronyms
Introduction
Rule Induction Algorithms
Evolutionary Computation
Genetic Programming
The Motivation for Automating the Design of Classification Algorithms
The Problem of the Selective Superiority of Classification Algorithms
Human Biases in Manually Designed Algorithms
A New Level of Automation in Data Mining
Overview of the Proposed Genetic Programming System
References
Data Mining
Introduction
The Classification Task of Data Mining
On Predictive Accuracy
On Overfitting and Underfitting
On the Comprehensibility of Discovered Knowledge
Decision Tree Induction
Rule Induction via the Sequential Covering Approach
Representation of the Candidate Rules
Search Mechanism
Rule Evaluation
Rule Pruning Methods
Meta-learning
Meta-learning for Classification Algorithm Selection
Stacked Generalization: Meta-learning via a Combination of Base Learners' Predictions
Summary
References
Evolutionary Algorithms
Introduction
An Overview of Evolutionary Algorithms
Individual Representation
Fitness Function
Individual Selection
Genetic Operators
Multiobjective Optimization
The Pareto Optimality Concept
Lexicographic Multiobjective Optimization
Genetic Programming Versus Genetic Algorithms: A Critical Perspective
Genetic Programming
Terminal and Function Sets and the Closure Property
Fitness Function: An Example Involving Regression
Selection and Genetic Operators
Approaches for Satisfying the Closure Property
Bloat
Grammar-Based Genetic Programming
Grammars
GGP with Solution-Encoding Individual
GGP with Production-Rule-Sequence-Encoding Individual
Summary
References
Genetic Programming for Classification and Algorithm Design
Introduction
Classification Models Versus Classification Algorithms
Genetic Programming for Evolving Classification Models
Evolving Classification Functions or Classification Rules
Evolving Decision Trees
Genetic Programming for Evolving Components of Rule Induction Algorithms
Genetic Programming for Evolving Classification Systems
Evolving the Design of Optimization Algorithms
Optimization Versus Classification
On Meta-heuristics and Hyper-heuristics
Evolving the Core Heuristic of Optimization Algorithms
Evolving an Evolutionary Algorithm for Optimization
Summary
References
Automating the Design of Rule Induction Algorithms
Introduction
The Grammar: Specifying the Building Blocks of Rule Induction Algorithms
The New Rule Induction Algorithmic Components in the Grammar
Individual Representation
Population Initialization
Individual Evaluation
From a Derivation Tree to Java Code
Single-Objective Fitness
Multiobjective Fitness
Crossover and Mutation Operations
Summary
References
Computational Results on the Automatic Design of Full Rule Induction Algorithms
Introduction
Evolving Rule Induction Algorithms Robust Across Different Application Domains
Investigating the GGP System's Sensitivity to Parameters
Comparing GGP-Designed Rule Induction Algorithms with Human-Designed Rule Induction Algorithms
To What Extent Are GGP-RIs Different from Manually Designed Rule Induction Algorithms?
Meta-training Set Variations
GGP System's Grammar Variations
GGP Versus Grammar-Based Hill-Climbing Search
MOGGP: A Multiobjective Version of the Proposed GGP
A Note on the GGP System's Execution Time
Evolving Rule Induction Algorithms Tailored to the Target Application Domain
Experiments with Public UCI Datasets
GGP-RIs Versus GHC-RIs
Experiments with Bioinformatics Datasets
A Note on the GGP System's Execution Time
Summary
References
Directions for Future Research on the Automatic Design of Data Mining Algorithms
Potential Improvements to the Current GGP System
Improving the Grammar
Modifying the GGP System's Fitness Function
Designing Rule Induction Algorithms Tailored to a Type of Dataset
Investigating Other Types of Search Methods for Automated Algorithm Design
Automatically Designing Other Types of Classification Algorithms
Automatically Designing Other Types of Data Mining Algorithms
References
Index

Publication date (per publisher) 27.10.2009
Series Natural Computing Series
Additional info XIII, 187 p. 33 illus.
Place of publication Berlin
Language English
Subject areas Mathematics / Computer Science > Computer Science > Databases
Computer Science > Theory / Studies > Artificial Intelligence / Robotics
Keywords algorithms • classification • Data Mining • data structures • evolutionary algorithms • Evolutionary Computing • genetic programming • machine learning • Rule Induction • tar
ISBN-10 3-642-02541-2 / 3642025412
ISBN-13 978-3-642-02541-9 / 9783642025419
PDF (watermark)
Size: 2.3 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost all devices but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
