DW 2.0: The Architecture for the Next Generation of Data Warehousing (eBook)
400 pages
Elsevier Science (publisher)
978-0-08-055833-2 (ISBN)
DW 2.0: The Architecture for the Next Generation of Data Warehousing is the first book on the new generation of data warehouse architecture, DW 2.0, by the father of the data warehouse. The book describes a future for data warehousing that is technologically possible today, at both the architectural level and the technology level.
The perspective of the book is from the top down: looking at the overall architecture and then delving into the issues underlying the components. This allows people who are building or using a data warehouse to see what lies ahead and determine what new technology to buy, how to plan extensions to the data warehouse, what can be salvaged from the current system, and how to justify the expense at the most practical level. This book gives experienced data warehouse professionals everything they need in order to implement the new generation DW 2.0.
It is designed for professionals in the IT organization, including data architects, DBAs, systems design and development professionals, as well as data warehouse and knowledge management professionals.
* First book on the new generation of data warehouse architecture, DW 2.0.
* Written by the 'father of the data warehouse', Bill Inmon, a columnist and newsletter editor of The Bill Inmon Channel on the Business Intelligence Network.
* Long-overdue, comprehensive coverage of the implementation of technology and tools that enable the new generation of the DW: metadata, temporal data, ETL, unstructured data, and data quality control.
Front cover
DW 2.0: The architecture for the next generation of data warehousing
Copyright page
Contents
Preface
Acknowledgments
About the authors
CHAPTER 1 A brief history of data warehousing and first-generation data warehouses
Data base management systems
Online applications
Personal computers and 4GL technology
The spider web environment
Evolution from the business perspective
The data warehouse environment
What is a data warehouse?
Integrating data—a painful experience
Volumes of data
A different development approach
Evolution to the DW 2.0 environment
The business impact of the data warehouse
Various components of the data warehouse environment
ETL—extract/transform/load
ODS—operational data store
Data mart
Exploration warehouse
The evolution of data warehousing from the business perspective
Other notions about a data warehouse
The active data warehouse
The federated data warehouse approach
The star schema approach
The data mart data warehouse
Building a "real" data warehouse
Summary
CHAPTER 2 An introduction to DW 2.0
DW 2.0—a new paradigm
DW 2.0—from the business perspective
The life cycle of data
Reasons for the different sectors
Metadata
Access of data
Structured data/unstructured data
Textual analytics
Blather
The issue of terminology
Specific text/general text
Metadata—a major component
Local metadata
A foundation of technology
Changing business requirements
The flow of data within DW 2.0
Volumes of data
Useful applications
DW 2.0 and referential integrity
Reporting in DW 2.0
Summary
CHAPTER 3 DW 2.0 components—about the different sectors
The Interactive Sector
The Integrated Sector
The Near Line Sector
The Archival Sector
Unstructured processing
From the business perspective
Summary
CHAPTER 4 Metadata in DW 2.0
Reusability of data and analysis
Metadata in DW 2.0
Active repository/passive repository
The active repository
Enterprise metadata
Metadata and the system of record
Taxonomy
Internal taxonomies/external taxonomies
Metadata in the Archival Sector
Maintaining metadata
Using metadata—an example
From the end-user perspective
Summary
CHAPTER 5 Fluidity of the DW 2.0 technology infrastructure
The technology infrastructure
Rapid business changes
The treadmill of change
Getting off the treadmill
Reducing the length of time for IT to respond
Semantically temporal, semantically static data
Semantically temporal data
Semantically stable data
Mixing semantically stable and unstable data
Separating semantically stable and unstable data
Mitigating business change
Creating snapshots of data
A historical record
Dividing data
From the end-user perspective
Summary
CHAPTER 6 Methodology and approach for DW 2.0
Spiral methodology—a summary of key features
The seven streams approach—an overview
Enterprise reference model stream
Enterprise knowledge coordination stream
Information factory development stream
Data profiling and mapping stream
Data correction stream
Infrastructure stream
Total information quality management stream
Summary
CHAPTER 7 Statistical processing and DW 2.0
Two types of transactions
Using statistical analysis
The integrity of the comparison
Heuristic analysis
Freezing data
Exploration processing
The frequency of analysis
The exploration facility
The sources for exploration processing
Refreshing exploration data
Project-based data
Data marts and the exploration facility
A backflow of data
Using exploration data internally
From the perspective of the business analyst
Summary
CHAPTER 8 Data models and DW 2.0
An intellectual road map
The data model and business
The scope of integration
Making the distinction between granular and summarized data
Levels of the data model
Data models and the Interactive Sector
The corporate data model
A transformation of models
Data models and unstructured data
From the perspective of the business user
Summary
CHAPTER 9 Monitoring the DW 2.0 environment
Monitoring the DW 2.0 environment
The transaction monitor
Monitoring data quality
A data warehouse monitor
The transaction monitor—response time
Peak-period processing
The ETL data quality monitor
The data warehouse monitor
Dormant data
From the perspective of the business user
Summary
CHAPTER 10 DW 2.0 and security
Protecting access to data
Encryption
Drawbacks
The firewall
Moving data offline
Limiting encryption
A direct dump
The data warehouse monitor
Sensing an attack
Security for near line data
From the perspective of the business user
Summary
CHAPTER 11 Time-variant data
All data in DW 2.0—relative to time
Time relativity in the Interactive Sector
Data relativity elsewhere in DW 2.0
Transactions in the Integrated Sector
Discrete data
Continuous time span data
A sequence of records
Nonoverlapping records
Beginning and ending a sequence of records
Continuity of data
Time-collapsed data
Time variance in the Archival Sector
From the perspective of the end user
Summary
CHAPTER 12 The flow of data in DW 2.0
The flow of data throughout the architecture
Entering the Interactive Sector
The role of ETL
Data flow into the Integrated Sector
Data flow into the Near Line Sector
Data flow into the Archival Sector
The falling probability of data access
Exception-based flow of data
From the perspective of the business user
Summary
CHAPTER 13 ETL processing and DW 2.0
Changing states of data
Where ETL fits
From application data to corporate data
ETL in online mode
ETL in batch mode
Source and target
An ETL mapping
Changing states—an example
More complex transformations
ETL and throughput
ETL and metadata
ETL and an audit trail
ETL and data quality
Creating ETL
Code creation or parametrically driven ETL
ETL and rejects
Changed data capture
ELT
From the perspective of the business user
Summary
CHAPTER 14 DW 2.0 and the granularity manager
The granularity manager
Raising the level of granularity
Filtering data
The functions of the granularity manager
Home-grown versus third-party granularity managers
Parallelizing the granularity manager
Metadata as a by-product
From the perspective of the business user
Summary
CHAPTER 15 DW 2.0 and performance
Good performance—a cornerstone for DW 2.0
Online response time
Analytical response time
The flow of data
Queues
Heuristic processing
Analytical productivity and response time
Many facets to performance
Indexing
Removing dormant data
End-user education
Monitoring the environment
Capacity planning
Metadata
Batch parallelization
Parallelization for transaction processing
Workload management
Data marts
Exploration facilities
Separation of transactions into classes
Service level agreements
Protecting the Interactive Sector
Partitioning data
Choosing the proper hardware
Separating farmers and explorers
Physically group data together
Check automatically generated code
From the perspective of the business user
Summary
CHAPTER 16 Migration
Houses and cities
Migration in a perfect world
The perfect world almost never happens
Adding components incrementally
Adding the Archival Sector
Creating enterprise metadata
Building the metadata infrastructure
“Swallowing” source systems
ETL as a shock absorber
Migration to the unstructured environment
From the perspective of the business user
Summary
CHAPTER 17 Cost justification and DW 2.0
Is DW 2.0 worth it?
Macro-level justification
A micro-level cost justification
Company B has DW 2.0
Creating new analysis
Executing the steps
So how much does all of this cost?
Consider company B
Factoring the cost of DW 2.0
Reality of information
The real economics of DW 2.0
The time value of information
The value of integration
Historical information
First-generation DW and DW 2.0—the economics
From the perspective of the business user
Summary
CHAPTER 18 Data quality in DW 2.0
The DW 2.0 data quality tool set
Data profiling tools and the reverse-engineered data model
Data model types
Data profiling inconsistencies challenge top-down modeling
Summary
CHAPTER 19 DW 2.0 and unstructured data
DW 2.0 and unstructured data
Reading text
Where to do textual analytical processing
Integrating text
Simple editing
Stop words
Synonym replacement
Synonym concatenation
Homographic resolution
Creating themes
External glossaries/taxonomies
Stemming
Alternate spellings
Text across languages
Direct searches
Indirect searches
Terminology
Semistructured data/VALUE = NAME data
The technology needed to prepare the data
The relational data base
Structured/unstructured linkage
From the perspective of the business user
Summary
CHAPTER 20 DW 2.0 and the system of record
Other systems of record
From the perspective of the business user
Summary
CHAPTER 21 Miscellaneous topics
Data marts
The convenience of a data mart
Transforming data mart data
Monitoring DW 2.0
Moving data from one data mart to another
Bad data
A balancing entry
Resetting a value
Making corrections
The speed of movement of data
Data warehouse utilities
Summary
CHAPTER 22 Processing in the DW 2.0 environment
Summary
CHAPTER 23 Administering the DW 2.0 environment
The data model
Architectural administration
Defining the moment when an Archival Sector will be needed
Determining whether the Near Line Sector is needed
Metadata administration
Data base administration
Stewardship
Systems and technology administration
Management administration of the DW 2.0 environment
Prioritization and prioritization conflicts
Budget
Scheduling and determination of milestones
Allocation of resources
Managing consultants
Summary
Index
Publication date (per publisher) | 28 July 2010 |
---|---|
Language | English |
Subject area | Nonfiction / Guides |
Mathematics / Computer Science ► Computer Science ► Databases | |
Mathematics / Computer Science ► Computer Science ► Software Development | |
Mathematics / Computer Science ► Mathematics ► Financial / Business Mathematics | |
Social Sciences ► Communication / Media ► Book Trade / Library Science | |
Economics ► Business Administration / Management ► Business Informatics | |
ISBN-10 | 0-08-055833-X / 008055833X |
ISBN-13 | 978-0-08-055833-2 / 9780080558332 |
Copy protection: Adobe DRM
Adobe DRM is a copy protection scheme intended to protect the eBook from misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/tablet: Whether Apple or Android, you can read this eBook. You need a
Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.