Data Quality (eBook)
XIX, 262 pages
Springer Berlin (publisher)
978-3-540-33173-5 (ISBN)
Poor data quality can seriously hinder or damage the efficiency and effectiveness of organizations and businesses. The growing awareness of such repercussions has led to major public initiatives like the 'Data Quality Act' in the USA and Directive 2003/98/EC of the European Parliament.
Batini and Scannapieco present a comprehensive and systematic introduction to the wide set of issues related to data quality. They start with a detailed description of different data quality dimensions, like accuracy, completeness, and consistency, and their importance in different types of data, like federated data, web data, or time-dependent data, and in different data categories classified according to frequency of change, like stable, long-term, and frequently changing data. The book's extensive description of techniques and methodologies from core data quality research as well as from related fields like data mining, probability theory, statistical data analysis, and machine learning gives an excellent overview of the current state of the art. The presentation is completed by a short description and critical comparison of tools and practical methodologies, which will help readers to resolve their own quality problems.
This book is an ideal combination of the soundness of theoretical foundations and the applicability of practical approaches. It is ideally suited for everyone - researchers, students, or professionals - interested in a comprehensive overview of data quality issues. In addition, it will serve as the basis for an introductory course or for self-study on this topic.
Carlo Batini is full professor of Computer Engineering at the University of Milano Bicocca. He became associate professor in 1983 and full professor in 1986. His research interests include cooperative information systems, information systems and database modeling and design, usability of information systems, and data and information quality. From 1995 to 2003 he was a member of the board of directors of the Authority for Information Technology in public administration, where he headed several large-scale projects for the modernization of public administration.
Monica Scannapieco is a research associate at the Computer Engineering Department of the University of Roma La Sapienza. Her research interests are data quality issues, including data quality dimensions, measurement and improvement techniques, dynamics of data quality, and record matching.
Preface 6
Motivation for the Book 6
Goals 7
Organization 9
Intended Audience 10
Guidelines for Teaching 12
Acknowledgements 13
Contents 14
1 Introduction to Data Quality 19
1.1 Why Data Quality is Relevant 19
1.2 Introduction to the Concept of Data Quality 22
1.3 Data Quality and Types of Data 24
1.4 Data Quality and Types of Information Systems 27
1.5 Main Research Issues and Application Domains in Data Quality 29
1.6 Summary 35
2 Data Quality Dimensions 37
2.1 Accuracy 38
2.2 Completeness 41
2.3 Time-Related Dimensions: Currency, Timeliness, and Volatility 46
2.4 Consistency 48
2.5 Other Data Quality Dimensions 50
2.6 Approaches to the Definition of Data Quality Dimensions 54
2.7 Schema Quality Dimensions 60
2.8 Summary 66
3 Models for Data Quality 69
3.1 Introduction 69
3.2 Extensions of Structured Data Models 70
3.3 Extensions of Semistructured Data Models 77
3.4 Management Information System Models 79
3.5 Summary 86
4 Activities and Techniques for Data Quality: Generalities 87
4.1 Data Quality Activities 88
4.2 Quality Composition 89
4.3 Error Localization and Correction 100
4.4 Cost and Benefit Classifications 106
4.5 Summary 113
5 Object Identification 115
5.1 Historical Perspective 116
5.2 Object Identification for Different Data Types 117
5.3 The High-Level Process for Object Identification 119
5.4 Details on the Steps for Object Identification 121
5.5 Object Identification Techniques 124
5.6 Probabilistic Techniques 124
5.7 Empirical Techniques 131
5.8 Knowledge-Based Techniques 139
5.9 Comparison of Techniques 143
5.10 Summary 149
6 Data Quality Issues in Data Integration Systems 151
6.1 Introduction 151
6.2 Generalities on Data Integration Systems 152
6.3 Techniques for Quality-Driven Query Processing 155
6.4 Instance-level Conflict Resolution 161
6.5 Inconsistencies in Data Integration: a Theoretical Perspective 175
6.6 Summary 178
7 Methodologies for Data Quality Measurement and Improvement 179
7.1 Basics on Data Quality Methodologies 179
7.2 Assessment Methodologies 185
7.3 Comparative Analysis of General-purpose Methodologies 188
7.4 The CDQM methodology 199
7.5 A Case Study in the e-Government Area 206
7.6 Summary 217
8 Tools for Data Quality 219
8.1 Introduction 219
8.2 Tools 220
8.3 Frameworks for Cooperative Information Systems 230
8.4 Toolboxes to Compare Tools 234
8.5 Summary 236
9 Open Problems 239
9.1 Dimensions and Metrics 239
9.2 Object Identification 240
9.3 Data Integration 245
9.4 Methodologies 248
9.5 Conclusions 253
References 255
Index 267
Publication date (per publisher) | 27.9.2006 |
---|---|
Series | Data-Centric Systems and Applications |
Additional info | XIX, 262 p. |
Place of publication | Berlin |
Language | English |
Subject areas | Mathematics / Computer Science ► Computer Science ► Databases; Business ► Business Administration / Management ► Business Information Systems |
Keywords | Data Accuracy • Data Availability • Data Completeness • Data Consistency • data integration • Data Mining • Data Quality • Distributed Data Management • learning • organization |
ISBN-10 | 3-540-33173-5 / 3540331735 |
ISBN-13 | 978-3-540-33173-5 / 9783540331735 |
Digital Rights Management: no DRM
This eBook contains no DRM or copy protection. However, passing it on to third parties is not legally permitted, because the purchase grants you only the rights for personal use.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.