Data Analysis Using SQL and Excel
John Wiley & Sons Inc (Verlag)
978-1-119-02143-8 (ISBN)
A practical guide to data mining using SQL and Excel Data Analysis Using SQL and Excel, 2nd Edition shows you how to leverage the two most popular tools for data query and analysis—SQL and Excel—to perform sophisticated data analysis without the need for complex and expensive data mining tools. Written by a leading expert on business data mining, this book shows you how to extract useful business information from relational databases. You'll learn the fundamental techniques before moving into the "where" and "why" of each analysis, and then learn how to design and perform these analyses using SQL and Excel. Examples include SQL and Excel code, and the appendix shows how non-standard constructs are implemented in other major databases, including Oracle and IBM DB2/UDB. The companion website includes datasets and Excel spreadsheets, and the book provides hints, warnings, and technical asides to help you every step of the way.
Data Analysis Using SQL and Excel, 2nd Edition shows you how to perform a wide range of sophisticated analyses using these simple tools, sparing you the significant expense of proprietary data mining tools like SAS.
Understand core analytic techniques that work with SQL and Excel
Ensure your analytic approach gets you the results you need
Design and perform your analysis using SQL and Excel
Data Analysis Using SQL and Excel, 2nd Edition shows you how to best use the tools you already know to achieve expert results.
GORDON S. LINOFF has been working with databases for more decades than he cares to admit. He starting learning about SQL by memorizing the SQL 92 standard while leading a development team (at the now-defunct Thinking Machines Corporation) writing the first high-performance database focused on the complex queries needed for decision support. After that endeavor, Gordon co-founded Data Miners in 1998, a consulting practice devoted to data mining, analytics, and big data. A constant theme in his work is dataand often data in relational databases. His SQL skills have only gotten stronger over the years. In 2014, he was the top contributor to Stack Overflow, the leading question-and-answer-site for technical questions. His other books include the bestselling Data Mining Techniques, Third Edition; Mastering Data Mining; and Mining the Webwhich focus on data mining and analysis. This book follows on the popularity of the first edition, with a practical focus on how to actually get and interpret results.
Foreword xxxiii
Introduction xxxvii
Chapter 1 A Data Miner Looks at SQL 1
Databases, SQL, and Big Data 2
Picturing the Structure of the Data 6
Picturing Data Analysis Using Dataflows 16
SQL Queries 21
Subqueries and Common Table Expressions Are Our Friends 36
Lessons Learned 47
Chapter 2 What’s in a Table? Getting Started with Data Exploration 49
What Is Data Exploration? 50
Excel for Charting 51
Sparklines 65
What Values Are in the Columns? 68
More Values to Explore—Min, Max, and Mode 79
Exploring String Values 81
Exploring Values in Two Columns 86
From Summarizing One Column to Summarizing All Columns 90
Lessons Learned 96
Chapter 3 How Different Is Different? 97
Basic Statistical Concepts 98
How Different Are the Averages? 105
Sampling from a Table 110
Counting Possibilities 115
Ratios and Their Statistics 128
Chi-Square 132
What Months and Payment Types Have Unusual Affinities for Which Types of Products? 140
Lessons Learned 143
Chapter 4 Where Is It All Happening? Location, Location, Location 145
Latitude and Longitude 146
Census Demographics 160
Geographic Hierarchies 172
Mapping in Excel 188
Lessons Learned 194
Chapter 5 It’s a Matter of Time 197
Dates and Times in Databases 198
Starting to Investigate Dates 204
How Long Between Two Dates? 218
Year-over-Year Comparisons 229
Counting Active Customers by Day 239
Simple Chart Animation in Excel 247
Lessons Learned 254
Chapter 6 How Long Will Customers Last? Survival Analysis to Understand Customers and Their Value 255
Background on Survival Analysis 256
The Hazard Calculation 260
Survival and Retention 269
Comparing Different Groups of Customers 280
Comparing Survival over Time 287
Important Measures Derived from Survival 293
Using Survival for Customer Value Calculations 298
Forecasting 308
Lessons Learned 314
Chapter 7 Factors Affecting Survival: The What and Why of Customer Tenure 315
Which Factors Are Important and When 316
Left Truncation 328
Time Windowing 336
Competing Risks 342
Before and After 353
Lessons Learned 366
Chapter 8 Customer Purchases and Other Repeated Events 367
Identifying Customers 368
RFM Analysis 393
Which Households Are Increasing Purchase Amounts Over Time? 404
Time to Next Event 416
Lessons Learned 420
Chapter 9 What’s in a Shopping Cart? Market Basket Analysis 421
Exploring the Products 422
Products and Customer Worth 437
Product Geographic Distribution 448
Which Customers Have Particular Products? 451
Lessons Learned 463
Chapter 10 Association Rules and Beyond 465
Item Sets 466
The Simplest Association Rules 480
One-Way Association Rules 483
Two-Way Associations 489
Extending Association Rules 499
Lessons Learned 506
Chapter 11 Data Mining Models in SQL 507
Introduction to Directed Data Mining 508
Look-Alike Models 515
Lookup Model for Most Popular Product 522
Lookup Model for Order Size 528
Lookup Model for Probability of Response 534
Naive Bayesian Models (Evidence Models) 546
Lessons Learned 559
Chapter 12 The Best-Fit Line: Linear Regression Models 561
The Best-Fit Line 562
Measuring Goodness of Fit Using R2 581
Direct Calculation of Best-Fit Line Coefficients 584
Weighted Linear Regression 592
More Than One Input Variable 600
Lessons Learned 607
Chapter 13 Building Customer Signatures for Further Analysis 609
What Is a Customer Signature? 610
Designing Customer Signatures 617
Operations to Build Customer Signatures 622
Extracting Features 639
Summarizing Customer Behaviors 644
Lessons Learned 653
Chapter 14 Performance Is the Issue: Using SQL Effectively 655
Query Engines and Performance 656
Considerations When Thinking About Performance 660
Performance: Its Meaning and Measurement 663
Performance Improvement 101 665
Using Indexes Effectively 668
When OR Is a Bad Thing 683
Pros and Cons: Different Ways of Expressing the Same Thing 686
Window Functions 694
Lessons Learned 701
Appendix Equivalent Constructs Among Databases 703
Index 731
Erscheint lt. Verlag | 1.1.2016 |
---|---|
Verlagsort | New York |
Sprache | englisch |
Maße | 188 x 234 mm |
Gewicht | 1338 g |
Themenwelt | Informatik ► Datenbanken ► Data Warehouse / Data Mining |
Informatik ► Office Programme ► Excel | |
Mathematik / Informatik ► Informatik ► Programmiersprachen / -werkzeuge | |
Informatik ► Weitere Themen ► Hardware | |
Mathematik / Informatik ► Mathematik ► Wahrscheinlichkeit / Kombinatorik | |
ISBN-10 | 1-119-02143-X / 111902143X |
ISBN-13 | 978-1-119-02143-8 / 9781119021438 |
Zustand | Neuware |
Haben Sie eine Frage zum Produkt? |
aus dem Bereich