The Microsoft Data Warehouse Toolkit
John Wiley & Sons Inc (Verlag)
978-0-470-64038-8 (ISBN)
Best practices and invaluable advice from world-renowned data warehouse experts In this book, leading data warehouse experts from the Kimball Group share best practices for using the upcoming “Business Intelligence release” of SQL Server, referred to as SQL Server 2008 R2. In this new edition, the authors explain how SQL Server 2008 R2 provides a collection of powerful new tools that extend the power of its BI toolset to Excel and SharePoint users and they show how to use SQL Server to build a successful data warehouse that supports the business intelligence requirements that are common to most organizations. Covering the complete suite of data warehousing and BI tools that are part of SQL Server 2008 R2, as well as Microsoft Office, the authors walk you through a full project lifecycle, including design, development, deployment and maintenance.
Features more than 50 percent new and revised material that covers the rich new feature set of the SQL Server 2008 R2 release, as well as the Office 2010 release
Includes brand new content that focuses on PowerPivot for Excel and SharePoint, Master Data Services, and discusses updated capabilities of SQL Server Analysis, Integration, and Reporting Services
Shares detailed case examples that clearly illustrate how to best apply the techniques described in the book
The accompanying Web site contains all code samples as well as the sample database used throughout the case studies
The Microsoft Data Warehouse Toolkit, Second Edition provides you with the knowledge of how and when to use BI tools such as Analysis Services and Integration Services to accomplish your most essential data warehousing tasks.
Joy Mundy is a member of the Kimball Group and has been focusing on data warehousing and business intelligence since the early 1990s. Warren Thornthwaite has been building decision support and data warehousing systems since 1980 and is a member of the Kimball Group. Ralph Kimball, PhD, is known worldwide as an innovator, writer, educator, speaker, and consultant in the field of data warehousing. He is the founder of the Kimball Group (www.kimballgroup.com), which provides data warehouse consulting and education.
Foreword xxvii
Introduction xxix
Part 1 Requirements, Realities, and Architecture 1
Chapter 1 Defining Business Requirements 3
The Most Important Determinant of Long-Term Success 5
Adventure Works Cycles Introduction 6
Uncovering Business Value 6
Obtaining Sponsorship 7
Defining Enterprise-Level Business Requirements 8
Prioritizing the Business Requirements 22
Revisiting the Project Planning 25
Gathering Project-Level Requirements 26
Summary 28
Chapter 2 Designing the Business Process Dimensional Model 29
Dimensional Modeling Concepts and Terminology 30
Facts 31
Dimensions 33
Bringing Facts and Dimensions Together 34
The Bus Matrix, Conformed Dimensions, and Drill Across 36
Additional Design Concepts and Techniques 38
Surrogate Keys 38
Slowly Changing Dimensions 39
Dates 42
Degenerate Dimensions 43
Snowflaking 43
Many-to-Many or Multivalued Dimensions 44
Hierarchies 47
Aggregate Dimensions 49
Junk Dimensions 51
The Three Fact Table Types 52
Aggregates 53
The Dimensional Modeling Process 54
Preparation 55
Data Profiling and Research 60
Building Dimensional Models 63
Developing the Detailed Dimensional Model 66
Testing and Refining the Model 68
Reviewing and Validating the Model 68
Case Study: The Adventure Works Cycles Orders Dimensional Model 69
The Orders Fact Table 69
The Dimensions 69
Identifying Dimension Attributes and Facts for the Orders Business Process 72
The Final Draft of the Initial Orders Model 74
Detailed Orders Dimensional Model Development 75
Final Dimensional Model 77
Summary 77
Chapter 3 The Toolset 79
The Microsoft DW/BI Toolset 80
Why Use the Microsoft Toolset? 82
Architecture of a Microsoft DW/BI System 83
Why Analysis Services? 84
Why a Relational Store? 86
ETL Is Not Optional 86
The Role of Master Data Services 88
Delivering BI Applications 88
Overview of the Microsoft Tools 89
Which Products Do You Need? 90
SQL Server Development and Management Tools 92
Summary 97
Chapter 4 System Setup 99
System Sizing Considerations 100
Calculating Data Volumes 101
Determining Usage Complexity 102
Estimating Simultaneous Users 104
Assessing System Availability Requirements 105
How Big Will It Be? 105
System Configuration Considerations 105
Memory 106
Monolithic or Distributed? 106
Storage System Considerations 110
Processors 113
Setting Up for High Availability 114
Software Installation and Configuration 115
Development Environment Software Requirements 116
Test and Production Software Requirements 120
Operating Systems 122
SQL Server Relational Database Setup 122
Analysis Services Setup 126
Integration Services Setup 129
Reporting Services Setup 130
Summary 131
Part 2 Building and Populating the Databases 133
Chapter 5 Creating the Relational Data Warehouse 135
Getting Started 136
Complete the Physical Design 137
Surrogate Keys 138
String Columns 138
To Null, or Not to Null? 140
Housekeeping Columns 140
Table and Column Extended Properties 142
Define Storage and Create Constraints and Supporting Objects 142
Create Files and Filegroups 142
Data Compression 144
Entity and Referential Integrity Constraints 145
Initial Indexing and Database Statistics 147
Aggregate Tables 150
Create Table Views 151
Insert an Unknown Member Row 152
Example CREATE TABLE Statement 152
Partitioned Tables 153
Finishing Up 163
Staging Tables 163
Metadata Setup 163
Summary 164
Chapter 6 Master Data Management 165
Managing Master Reference Data 166
Incomplete Attributes 167
Data Integration 168
Systems Integration 170
Master Data Management Systems and the Data Warehouse 171
Introducing SQL Server Master Data Services 171
Model Definition Features 172
Data Management Features 174
User Interface: Exploring and Managing the Master Data 174
Importing and Updating Data 176
Exporting Data 177
Full Versioning of All Attributes 179
Creating a Simple Application 179
Summary 186
Chapter 7 Designing and Developing the ETL System 187
Round Up the Requirements 188
Develop the ETL Plan 191
Introducing SQL Server Integration Services 192
Control Flow and Data Flow 194
SSIS Package Architecture 197
The Major Subsystems of ETL 198
Extracting Data 199
Subsystem 1: Data Profiling 199
Subsystem 2: Change Data Capture System 200
Subsystem 3: Extract System 202
Cleaning and Conforming Data 206
Subsystem 4: Data Cleaning System 206
Subsystem 5: Error Event Schema 214
Subsystem 6: Audit Dimension Assembler 215
Subsystem 7: Deduplication System 216
Subsystem 8: Conforming System 217
Delivering Data for Presentation 218
Subsystem 9: Slowly Changing Dimension Manager 218
Subsystem 10: Surrogate Key Generator 223
Subsystem 11: Hierarchy Manager 223
Subsystem 12: Special Dimensions Manager 224
Subsystem 13: Fact Table Builders 225
Subsystem 14: Surrogate Key Pipeline 229
Subsystem 15: Multi-Valued Dimension Bridge Table Builder 235
Subsystem 16: Late Arriving Data Handler 235
Subsystem 17: Dimension Manager 238
Subsystem 18: Fact Provider System 238
Subsystem 19: Aggregate Builder 239
Subsystem 20: OLAP Cube Builder 239
Subsystem 21: Data Propagation Manager 240
Managing the ETL Environment 240
Summary 243
Chapter 8 The Core Analysis Services OLAP Database 245
Overview of Analysis Services OLAP 247
Why Use Analysis Services? 247
Why Not Analysis Services? 249
Designing the OLAP Structure 250
Planning 251
Getting Started 253
Create a Project and a Data Source View 255
Dimension Designs 257
Creating and Editing Dimensions 261
Creating and Editing the Cube 274
Physical Design Considerations 291
Understanding Storage Modes 293
Developing the Partitioning Plan 294
Designing Performance Aggregations 296
Planning for Deployment 298
Processing the Full Cube 299
Developing the Incremental Processing Plan 299
Summary 304
Chapter 9 Design Requirements for Real-Time BI 305
Real-Time Triage 306
What Does Real-Time Mean? 306
Who Needs Real Time? 307
Real-Time Tradeoffs 308
Scenarios and Solutions 311
Executing Reports in Real Time 313
Serving Reports from a Cache 313
Creating an ODS with Mirrors and Snapshots 314
Creating an ODS with Replication 314
Building a BizTalk Application 315
Building a Real-Time Relational Partition 315
Querying Real-Time Data in the Relational Database 317
Using Analysis Services to Query Real-Time Data 318
Summary 319
Part 3 Developing the BI Applications 321
Chapter 10 Building BI Applications in Reporting Services 323
A Brief Overview of BI Applications 324
Types of BI Applications 325
The Value of Business Intelligence Applications 326
A High-Level Architecture for Reporting 328
Reviewing Business Requirements for Reporting 328
Examining the Reporting Services Architecture 330
Using Reporting Services as a Standard Reporting Tool 332
Reporting Services Assessment 339
The Reporting System Design and Development Process 340
Reporting System Design 341
Reporting System Development 348
Building and Delivering Reports 351
Planning and Preparation 351
Creating Reports 354
Reporting Operations 368
Ad Hoc Reporting Options 369
The Report Model 370
Shared Datasets 371
Report Parts 371
Summary 372
Chapter 11 PowerPivot and Excel 375
Using Excel for Analysis and Reporting 376
The PowerPivot Architecture: Excel on Steroids 378
Creating and Using PowerPivot Databases 380
Getting Started 381
PowerPivot Table Design 381
Creating Analytics with PowerPivot 385
Observations and Guidelines on PowerPivot for Excel 392
PowerPivot for SharePoint 394
The PowerPivot SharePoint User Experience 394
Server-Level Resources 397
PowerPivot Monitoring and Management 397
PowerPivot’s Role in a Managed DW/BI Environment 400
Summary 401
Chapter 12 The BI Portal and SharePoint 403
The BI Portal 404
Planning the BI Portal 405
Impact on Design 406
Business Process Categories 407
Additional Functions 408
Building the BI Portal 409
Using SharePoint as the BI Portal 411
Architecture and Concepts 412
Setting Up SharePoint 417
Summary 426
Chapter 13 Incorporating Data Mining 429
Defining Data Mining 430
Basic Data Mining Terminology 432
Business Uses of Data Mining 433
Roles and Responsibilities 440
SQL Server Data Mining Architecture Overview 440
The Data Mining Design Environment 442
Build, Deploy, and Process 442
Accessing the Mining Models 443
Integration Services and Data Mining 443
Additional Features 444
Architecture Summary 445
Microsoft Data Mining Algorithms 445
Decision Trees 446
Naïve Bayes 447
Clustering 448
Sequence Clustering 448
Time Series 449
Association 449
Neural Network 449
The Data Mining Process 450
The Business Phase 451
The Data Mining Phase 453
The Operations Phase 460
Metadata 462
Data Mining Examples 463
Case Study: Categorizing Cities 463
Case Study: Product Recommendations 472
Summary 488
Part 4 Deploying and Managing the DW/BI System 491
Chapter 14 Designing and Implementing Security 493
Identifying the Security Manager 494
Securing the Hardware and Operating System 495
Securing the Operating System 495
Using Windows Integrated Security 496
Securing the Development Environment 497
Securing the Data 498
Providing Open Access for Internal Users 498
Itemizing Sensitive Data 500
Securing Various Types of Data Access 500
Securing the Components of the DW/BI System 502
Reporting Services Security 502
Analysis Services Security 505
Relational DW Security 514
Integration Services Security 520
Usage Monitoring 521
Summary 521
Chapter 15 Metadata Plan 523
Metadata Basics 524
The Purpose of Metadata 524
Metadata Categories 525
The Metadata Repository 526
Metadata Standards 526
SQL Server 2008 R2 Metadata 527
Cross-Tool Components 528
Relational Engine Metadata 532
Analysis Services 532
Integration Services 533
Reporting Services 533
Master Data Services 534
SharePoint 534
External Metadata Sources 534
Looking to the Future 535
A Practical Metadata Approach 535
Creating the Metadata Strategy 536
Business Metadata Reporting 538
Process Metadata Reporting 541
Technical Metadata Reporting 542
Ongoing Metadata Management 543
Summary 543
Chapter 16 Deployment 545
Setting Up the Environments 546
Testing 550
Development Testing 551
System Testing 555
Data Quality Assurance Testing 557
Performance Testing 559
Usability Testing 562
Testing Summary 563
Deploying to Production 564
Relational Database Deployment 565
Integration Services Package Deployment 567
Analysis Services Database Deployment 568
Reporting Services Report Deployment 571
Master Data Services Deployment 572
Data Warehouse and BI Documentation 573
Core Descriptions 573
Additional Documentation 575
User Training 576
User Support 579
Desktop Readiness and Configuration 580
Summary 581
Chapter 17 Operations and Maintenance 583
Providing User Support 584
Maintaining the BI Portal 585
Extending the BI Applications 586
System Management 587
Governing the DW/BI System 588
Performance Monitoring 593
Usage Monitoring 600
Managing Disk Space 602
Service and Availability Management 603
Performance Tuning the DW/BI System 604
Backup and Recovery 606
Executing the ETL Packages 611
Summary 611
Chapter 18 Present Imperatives and Future Outlook 613
Growing the DW/BI System 613
Lifecycle Review with Common Problems 615
Phase I — Requirements, Realities, Plans, and Designs 616
Phase II — Developing the Databases 616
Phase III — Developing the BI Applications and Portal Environment 617
Phase IV — Deploying and Managing the DW/BI System 618
Iteration and Growth 618
What We Like in the Microsoft BI Toolset 619
Future Directions: Room for Improvement 620
Conclusion 623
Index 625
Erscheint lt. Verlag | 5.4.2011 |
---|---|
Co-Autor | Ralph Kimball |
Verlagsort | New York |
Sprache | englisch |
Maße | 188 x 229 mm |
Gewicht | 1043 g |
Themenwelt | Informatik ► Datenbanken ► Data Warehouse / Data Mining |
Informatik ► Weitere Themen ► Hardware | |
ISBN-10 | 0-470-64038-3 / 0470640383 |
ISBN-13 | 978-0-470-64038-8 / 9780470640388 |
Zustand | Neuware |
Haben Sie eine Frage zum Produkt? |
aus dem Bereich