Visual Data Mining

The VisMiner Approach

Russell K. Anderson (Author)

Book | Hardcover

208 pages

2012
John Wiley & Sons Inc (Publisher)
978-1-119-96754-5 (ISBN)

This book introduces a visual methodology for data mining, demonstrating the application of the methodology through a sequence of exercises using VisMiner. VisMiner, developed by the author, is a powerful visual data mining tool that enables readers to visually evaluate models created from the data.

A visual approach to data mining

Data mining has been defined as the search for useful and previously unknown patterns in large datasets, yet when faced with the task of mining a large dataset, it is not always obvious where to start and how to proceed.

This book introduces a visual methodology for data mining, demonstrating the application of the methodology through a sequence of exercises using VisMiner. VisMiner, developed by the author, is a powerful visual data mining tool that enables readers to see the data they are working on and to visually evaluate the models created from the data.

Key features:

Presents visual support for all phases of data mining including dataset preparation.
Provides a comprehensive set of non-trivial datasets and problems with accompanying software.
Features 3-D visualizations of multi-dimensional datasets.
Gives support for spatial data analysis with GIS-like features.
Describes data mining algorithms with guidance on when and how to use them.
Accompanied by VisMiner, a visual software tool for data mining, developed specifically to bridge the gap between theory and practice.

Visual Data Mining: The VisMiner Approach is designed as a hands-on workbook that introduces the methodologies to students in data mining, advanced statistics, and business intelligence courses. It provides a set of tutorials, exercises, and case studies that support students in learning data mining processes.

In praise of the VisMiner approach:

"What we discovered among students was that the visualization concepts and tools brought the analysis alive in a way that was broadly understood and could be used to make sound decisions with greater certainty about the outcomes"
—Dr. James V. Hansen, J. Owen Cherrington Professor, Marriott School, Brigham Young University, USA

"Students learn best when they are able to visualize relationships between data and results during the data mining process. VisMiner is easy to learn and yet offers great visualization capabilities throughout the data mining process. My students liked it very much and so did I."
—Dr. Douglas Dean, Assoc. Professor of Information Systems, Marriott School, Brigham Young University, USA

Russell K. Anderson, Information & Decision Management Department, West Texas A&M University, USA.

Preface ix

Acknowledgments xi

1. Introduction 1

Data Mining Objectives 1

Introduction to VisMiner 2

The Data Mining Process 3

Initial Data Exploration 4

Dataset Preparation 5

Algorithm Selection and Application 8

Model Evaluation 8

Summary 9

2. Initial Data Exploration and Dataset Preparation Using VisMiner 11

The Rationale for Visualizations 11

Tutorial – Using VisMiner 13

Initializing VisMiner 13

Initializing the Slave Computers 14

Opening a Dataset 16

Viewing Summary Statistics 16

Exercise 2.1 17

The Correlation Matrix 18

Exercise 2.2 20

The Histogram 21

The Scatter Plot 23

Exercise 2.3 28

The Parallel Coordinate Plot 28

Exercise 2.4 33

Extracting Sub-populations Using the Parallel Coordinate Plot 37

Exercise 2.5 41

The Table Viewer 42

The Boundary Data Viewer 43

Exercise 2.6 47

The Boundary Data Viewer with Temporal Data 47

Exercise 2.7 49

Summary 49

3. Advanced Topics in Initial Exploration and Dataset Preparation Using VisMiner 51

Missing Values 51

Missing Values – An Example 53

Exploration Using the Location Plot 56

Exercise 3.1 61

Dataset Preparation – Creating Computed Columns 61

Exercise 3.2 63

Aggregating Data for Observation Reduction 63

Exercise 3.3 65

Combining Datasets 66

Exercise 3.4 67

Outliers and Data Validation 68

Range Checks 69

Fixed Range Outliers 69

Distribution Based Outliers 70

Computed Checks 72

Exercise 3.5 74

Feasibility and Consistency Checks 74

Data Correction Outside of VisMiner 75

Distribution Consistency 76

Pattern Checks 77

A Pattern Check of Experimental Data 80

Exercise 3.6 81

Summary 82

4. Prediction Algorithms for Data Mining 83

Decision Trees 84

Stopping the Splitting Process 86

A Decision Tree Example 87

Using Decision Trees 89

Decision Tree Advantages 89

Limitations 90

Artificial Neural Networks 90

Overfitting the Model 93

Moving Beyond Local Optima 94

ANN Advantages and Limitations 96

Support Vector Machines 97

Data Transformations 99

Moving Beyond Two-dimensional Predictors 100

SVM Advantages and Limitations 100

Summary 101

5. Classification Models in VisMiner 103

Dataset Preparation 103

Tutorial – Building and Evaluating Classification Models 104

Model Evaluation 104

Exercise 5.1 109

Prediction Likelihoods 109

Classification Model Performance 113

Interpreting the ROC Curve 119

Classification Ensembles 124

Model Application 125

Summary 127

Exercise 5.2 128

Exercise 5.3 128

6. Regression Analysis 131

The Regression Model 131

Correlation and Causation 132

Algorithms for Regression Analysis 133

Assessing Regression Model Performance 133

Model Validity 135

Looking Beyond R² 135

Polynomial Regression 137

Artificial Neural Networks for Regression Analysis 137

Dataset Preparation 137

Tutorial 138

A Regression Model for Home Appraisal 139

Modeling with the Right Set of Observations 139

Exercise 6.1 145

ANN Modeling 145

The Advantage of ANN Regression 148

Top-Down Attribute Selection 149

Issues in Model Interpretation 150

Model Validation 152

Model Application 153

Summary 154

7. Cluster Analysis 155

Introduction 155

Algorithms for Cluster Analysis 158

Issues with K-Means Clustering Process 158

Hierarchical Clustering 159

Measures of Cluster and Clustering Quality 159

Silhouette Coefficient 161

Correlation Coefficient 161

Self-Organizing Maps (SOM) 161

Self-Organizing Maps in VisMiner 163

Choosing the Grid Dimensions 168

Advantages of a 3-D Grid 169

Extracting Subsets from a Clustering 170

Summary 173

Appendix A VisMiner Reference by Task 175

Appendix B VisMiner Task/Tool Matrix 187

Appendix C IP Address Look-up 189

Index 191

Publication date (per publisher)	17.12.2012
Place of publication	New York
Language	English
Dimensions	158 x 236 mm
Weight	472 g
Subject areas	Computer Science ► Databases ► Data Warehouse / Data Mining
	Computer Science ► Office Software ► Outlook
	Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics
ISBN-10	1-119-96754-6 / 1119967546
ISBN-13	978-1-119-96754-5 / 9781119967545
Condition	New