Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography.
Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, and Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today and methods at the leading edge of contemporary research.
The book is targeted at information systems practitioners, programmers, consultants, developers, information technology managers, specification writers, data analysts, data modelers, database R&D professionals, data warehouse engineers, and data mining professionals. The book will also be useful for professors and students of upper-level undergraduate and graduate-level data mining and machine learning courses who want to incorporate data mining as part of their data management knowledge base and expertise.
- Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
- Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
- Includes downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks in an updated, interactive interface
Algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization.
Front cover
Data Mining: Practical Machine Learning Tools and Techniques
Copyright page
Table of contents
List of Figures
List of Tables
Preface
Updated and revised content
Acknowledgments
About the Authors
PART I: Introduction to Data Mining
Chapter 1: What’s It All About?
Data mining and machine learning
Simple examples: the weather and other problems
Fielded applications
Machine learning and statistics
Generalization as search
Data mining and ethics
Further reading
Chapter 2: Input: Concepts, Instances, and Attributes
What’s a concept?
What’s in an example?
What’s in an attribute?
Preparing the input
Further reading
Chapter 3: Output: Knowledge Representation
Tables
Linear models
Trees
Rules
Instance-based representation
Clusters
Further reading
Chapter 4: Algorithms: The Basic Methods
Inferring rudimentary rules
Statistical modeling
Divide-and-conquer: constructing decision trees
Covering algorithms: constructing rules
Mining association rules
Linear models
Instance-based learning
Clustering
Multi-instance learning
Further reading
Weka implementations
Chapter 5: Credibility: Evaluating What’s Been Learned
Training and testing
Predicting performance
Cross-validation
Other estimates
Comparing data mining schemes
Predicting probabilities
Counting the cost
Evaluating numeric prediction
Minimum description length principle
Applying the MDL principle to clustering
Further reading
PART II: Advanced Data Mining
Chapter 6: Implementations: Real Machine Learning Schemes
Decision trees
Classification rules
Association rules
Extending linear models
Instance-based learning
Numeric prediction with local linear models
Bayesian networks
Clustering
Semisupervised learning
Multi-instance learning
Weka implementations
Chapter 7: Data Transformations
Attribute selection
Discretizing numeric attributes
Projections
Sampling
Cleansing
Transforming multiple classes to binary ones
Calibrating class probabilities
Further reading
Weka implementations
Chapter 8: Ensemble Learning
Combining multiple models
Bagging
Randomization
Boosting
Additive regression
Interpretable ensembles
Stacking
Further reading
Weka implementations
Chapter 9: Moving on: Applications and Beyond
Applying data mining
Learning from massive datasets
Data stream learning
Incorporating domain knowledge
Text mining
Web mining
Adversarial situations
Ubiquitous data mining
Further reading
PART III: The Weka Data Mining Workbench
Chapter 10: Introduction to Weka
What’s in Weka?
How do you use it?
What else can you do?
How do you get it?
Chapter 11: The Explorer
Getting started
Exploring the explorer
Filtering algorithms
Learning algorithms
Metalearning algorithms
Clustering algorithms
Association-rule learners
Attribute selection
Chapter 12: The Knowledge Flow Interface
Getting started
Components
Configuring and connecting the components
Incremental learning
Chapter 13: The Experimenter
Getting started
Simple setup
Advanced setup
The analyze panel
Distributing processing over several machines
Chapter 14: The Command-Line Interface
Getting started
The structure of Weka
Command-line options
Chapter 15: Embedded Machine Learning
A simple data mining application
Chapter 16: Writing New Learning Schemes
An example classifier
Conventions for implementing classifiers
Chapter 17: Tutorial Exercises for the Weka Explorer
Introduction to the explorer interface
Nearest-neighbor learning and decision trees
Classification boundaries
Preprocessing and parameter tuning
Document classification
Mining association rules
References
Index
Publication date (per publisher) | 3 February 2011 |
---|---|
Language | English |
Subject area | Mathematics / Computer Science ► Computer Science ► Operating Systems / Servers |
 | Computer Science ► Databases ► Data Warehouse / Data Mining |
 | Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics |
 | Social Sciences ► Communication / Media ► Book Trade / Library Science |
ISBN-10 | 0-08-089036-9 / 0080890369 |
ISBN-13 | 978-0-08-089036-4 / 9780080890364 |
Size: 27.6 MB
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read it only on devices that are also registered to your Adobe ID.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/tablet: Whether Apple or Android, you can read this eBook.
Additional feature: online reading
In addition to downloading it, you can also read this eBook online in a web browser.
Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
Size: 10.2 MB
Copy protection: Adobe DRM
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly suited to fiction and general nonfiction. The body text adapts dynamically to the display and font size, so EPUB is also well suited to mobile reading devices.