Visual Data Mining
John Wiley & Sons Inc (Publisher)
978-1-119-96754-5 (ISBN)
A visual approach to data mining. Data mining has been defined as the search for useful and previously unknown patterns in large datasets, yet when faced with the task of mining a large dataset, it is not always obvious where to start and how to proceed.
This book introduces a visual methodology for data mining and demonstrates its application through a sequence of exercises using VisMiner. Developed by the author, VisMiner is a powerful visual data mining tool that lets readers see the data they are working on and visually evaluate the models created from that data.
Key features:
Presents visual support for all phases of data mining including dataset preparation.
Provides a comprehensive set of non-trivial datasets and problems with accompanying software.
Features 3-D visualizations of multi-dimensional datasets.
Gives support for spatial data analysis with GIS-like features.
Describes data mining algorithms with guidance on when and how to use them.
Accompanied by VisMiner, a visual software tool for data mining, developed specifically to bridge the gap between theory and practice.
Visual Data Mining: The VisMiner Approach is designed as a hands-on workbook that introduces these methodologies to students in data mining, advanced statistics, and business intelligence courses. It provides a set of tutorials, exercises, and case studies that support students in learning data mining processes.
In praise of the VisMiner approach:
"What we discovered among students was that the visualization concepts and tools brought the analysis alive in a way that was broadly understood and could be used to make sound decisions with greater certainty about the outcomes"
—Dr. James V. Hansen, J. Owen Cherrington Professor, Marriott School, Brigham Young University, USA
"Students learn best when they are able to visualize relationships between data and results during the data mining process. VisMiner is easy to learn and yet offers great visualization capabilities throughout the data mining process. My students liked it very much and so did I."
—Dr. Douglas Dean, Assoc. Professor of Information Systems, Marriott School, Brigham Young University, USA
Russell K. Anderson, Information & Decision Management Department, West Texas A&M University, USA.
Preface
Acknowledgments
1. Introduction
Data Mining Objectives
Introduction to VisMiner
The Data Mining Process
Initial Data Exploration
Dataset Preparation
Algorithm Selection and Application
Model Evaluation
Summary
2. Initial Data Exploration and Dataset Preparation Using VisMiner
The Rationale for Visualizations
Tutorial – Using VisMiner
Initializing VisMiner
Initializing the Slave Computers
Opening a Dataset
Viewing Summary Statistics
Exercise 2.1
The Correlation Matrix
Exercise 2.2
The Histogram
The Scatter Plot
Exercise 2.3
The Parallel Coordinate Plot
Exercise 2.4
Extracting Sub-populations Using the Parallel Coordinate Plot
Exercise 2.5
The Table Viewer
The Boundary Data Viewer
Exercise 2.6
The Boundary Data Viewer with Temporal Data
Exercise 2.7
Summary
3. Advanced Topics in Initial Exploration and Dataset Preparation Using VisMiner
Missing Values
Missing Values – An Example
Exploration Using the Location Plot
Exercise 3.1
Dataset Preparation – Creating Computed Columns
Exercise 3.2
Aggregating Data for Observation Reduction
Exercise 3.3
Combining Datasets
Exercise 3.4
Outliers and Data Validation
Range Checks
Fixed Range Outliers
Distribution Based Outliers
Computed Checks
Exercise 3.5
Feasibility and Consistency Checks
Data Correction Outside of VisMiner
Distribution Consistency
Pattern Checks
A Pattern Check of Experimental Data
Exercise 3.6
Summary
4. Prediction Algorithms for Data Mining
Decision Trees
Stopping the Splitting Process
A Decision Tree Example
Using Decision Trees
Decision Tree Advantages
Limitations
Artificial Neural Networks
Overfitting the Model
Moving Beyond Local Optima
ANN Advantages and Limitations
Support Vector Machines
Data Transformations
Moving Beyond Two-dimensional Predictors
SVM Advantages and Limitations
Summary
5. Classification Models in VisMiner
Dataset Preparation
Tutorial – Building and Evaluating Classification Models
Model Evaluation
Exercise 5.1
Prediction Likelihoods
Classification Model Performance
Interpreting the ROC Curve
Classification Ensembles
Model Application
Summary
Exercise 5.2
Exercise 5.3
6. Regression Analysis
The Regression Model
Correlation and Causation
Algorithms for Regression Analysis
Assessing Regression Model Performance
Model Validity
Looking Beyond R²
Polynomial Regression
Artificial Neural Networks for Regression Analysis
Dataset Preparation
Tutorial
A Regression Model for Home Appraisal
Modeling with the Right Set of Observations
Exercise 6.1
ANN Modeling
The Advantage of ANN Regression
Top-Down Attribute Selection
Issues in Model Interpretation
Model Validation
Model Application
Summary
7. Cluster Analysis
Introduction
Algorithms for Cluster Analysis
Issues with K-Means Clustering Process
Hierarchical Clustering
Measures of Cluster and Clustering Quality
Silhouette Coefficient
Correlation Coefficient
Self-Organizing Maps (SOM)
Self-Organizing Maps in VisMiner
Choosing the Grid Dimensions
Advantages of a 3-D Grid
Extracting Subsets from a Clustering
Summary
Appendix A VisMiner Reference by Task
Appendix B VisMiner Task/Tool Matrix
Appendix C IP Address Look-up
Index
Publication date (per publisher) | 17 December 2012 |
---|---|
Place of publication | New York |
Language | English |
Dimensions | 158 x 236 mm |
Weight | 472 g |
Subject areas | Computer Science ► Databases ► Data Warehouse / Data Mining |
 | Computer Science ► Office Programs ► Outlook |
 | Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
ISBN-10 | 1-119-96754-6 / 1119967546 |
ISBN-13 | 978-1-119-96754-5 / 9781119967545 |
Condition | New |