Data Quality Management with Semantic Technologies (eBook)
XXVII, 205 pages
Springer Fachmedien Wiesbaden GmbH (publisher)
978-3-658-12225-6 (ISBN)
Christian Fürber investigates how semantic technologies can be usefully applied to data quality management. Based on a literature analysis of typical data quality problems and typical activities of data quality management processes, he develops the Semantic Data Quality Management (SDQM) framework as the major contribution of this thesis. The SDQM framework consists of three components that are evaluated in two different use cases. Moreover, this thesis compares the framework to conventional data quality software. Besides the framework, this thesis delivers important theoretical findings, namely a comprehensive typology of data quality problems, ten generic data requirement types, a requirement-centric data quality management process, and an analysis of related work.
Dr. Christian Fürber completed his doctoral studies under the supervision of Prof. Dr. Martin Hepp at the E-Business and Web Science Research Group of the Universität der Bundeswehr München. He is founder and CEO of the Information Quality Institute GmbH, a company that advises organizations of any size on improving the quality of their data.
Foreword 6
Preface 10
Table of Content 12
List of Figures 18
List of Tables 21
List of Abbreviations 23
PART I – Introduction, Economic Relevance, and Research Design 26
1 Introduction 26
1.1 Initial Problem Statement 26
1.2 Economic Relevance 28
1.3 Organization of this Thesis 31
1.4 Published Work 31
1.4.1 Book Chapters 32
1.4.2 Papers in Conference Proceedings 32
1.4.3 Other Publications 32
2 Research Design 33
2.1 Semantic Technologies and Ontologies 33
2.2 Research Goal 34
2.3 Research Questions 36
2.4 Research Methodology 37
2.4.1 Design Science Research Methodology 38
2.4.2 Ontology Development Methodology 43
PART II – Foundations: Data Quality, Semantic Technologies, and the Semantic Web 45
3 Data Quality 45
3.1 Data Quality Dimensions 46
3.2 Quality Influencing Artifacts 49
3.3 Data Quality Problem Types 51
3.3.1 Quality Problems of Attribute Values 53
3.3.2 Multi-Attribute Quality Problems 55
3.3.3 Problems of Object Instances 57
3.3.4 Quality Problems of Data Models 59
3.3.5 Common Linguistic Problems 63
3.4 Data Quality in the Data Lifecycle 64
3.4.1 Data Acquisition Phase 65
3.4.2 Data Usage Phase 66
3.4.3 Data Retirement Phase 67
3.4.4 Data Quality Management throughout the Data Lifecycle 67
3.5 Data Quality Management Activities 68
3.5.1 Total Information Quality Management (TIQM) 68
3.5.2 Total Data Quality Management (TDQM) 72
3.5.3 Comparison of Methodologies 74
3.6 Role of Data Requirements in DQM 74
3.6.1 Generic Data Requirement Types 75
3.6.2 Challenges Related to Requirements Satisfaction 79
4 Semantic Technologies 81
4.1 Characteristics of an Ontology 81
4.2 Knowledge Representation in the Semantic Web 83
4.2.1 Resources and Uniform Resource Identifiers (URIs) 83
4.2.2 Core RDF Syntax: Triples, Literal Triples, and RDF Links 84
4.2.3 Constructing an Ontology with RDF, RDFS, and OWL 85
4.2.4 Language Profiles of OWL and OWL 2 88
4.3 SPARQL Query Language for RDF 89
4.4 Reasoning and Inferencing 90
4.5 Ontologies and Relational Databases 92
5 Data Quality in the Semantic Web 94
5.1 Data Sources of the Semantic Web 94
5.2 Semantic Web-specific Quality Problems 96
5.2.1 Document Content Problems 97
5.2.2 Data Format Problems 97
5.2.3 Problems of Data Definitions and Semantics 98
5.2.4 Problems of Data Classification 99
5.2.5 Problems of Hyperlinks 100
5.3 Distinct Characteristics of Data Quality in the Semantic Web 101
PART III – Development and Evaluation of the Semantic Data Quality Management Framework 103
6 Specification of Initial Requirements 103
6.1 Motivating Scenario 103
6.2 Initial Requirements for SDQM 104
6.2.1 Task Requirements 105
6.2.2 Functional Requirements 107
6.2.3 Conditional Requirements 108
6.2.4 Research Requirements 110
6.3 Summary of SDQM’s Requirements 111
7 Architecture of the Semantic Data Quality Management Framework (SDQM) 112
7.1 Data Acquisition Layer 113
7.1.1 Reusable Artifacts for the Data Acquisition Layer 114
7.1.2 Data Acquisition for SDQM 115
7.2 Data Storage Layer 116
7.2.1 Reusable Artifacts for Data Storage in SDQM 116
7.2.2 The Data Storage Layer of SDQM 117
7.3 Data Quality Management Vocabulary 119
7.3.1 Reuse of Existing Ontologies 120
7.3.2 Technical Design of the DQM Vocabulary 121
7.4 Data Requirements Editor 124
7.4.1 Reusable Artifacts for SDQM’s Data Requirements Editor 125
7.4.2 Data Requirements Wiki 126
7.5 Reporting Layer 129
7.5.1 Reusable Artifacts for SDQM’s Reporting Layer 130
7.5.2 Semantic Data Quality Manager 130
8 Application Procedure of SDQM 135
8.1 Prerequisites 135
8.2 The Data Quality Management Process with SDQM 136
9 Evaluation of the Semantic Data Quality Management Framework (SDQM) 147
9.1 Evaluation of Algorithms 147
9.1.1 Algorithm Evaluation Methodology 147
9.1.2 Application Procedure 148
9.1.3 Results 149
9.2 Use Case 1: Evaluation of Material Master Data 149
9.2.1 Scenario 150
9.2.2 Setup and Application Procedure of SDQM 150
9.2.3 Results and Findings 152
9.3 Use Case 2: Evaluation of Data from DBpedia 157
9.3.1 Scenario 157
9.3.2 Specialties of Semantic Web Scenarios 158
9.3.3 Setup and Application Procedure 158
9.3.4 Results and Findings 160
9.4 Use Case 3: Consistency Checks Among Data Requirements 166
9.4.1 Scenario 167
9.4.2 Application Procedure 167
9.4.3 Summary 169
9.5 Comparison with Talend OS for Data Quality 170
9.5.1 Representation and Management of Data Requirements 170
9.5.2 Data Quality Monitoring and Assessment Reporting 173
9.5.3 Summary 176
PART IV – Related Work 178
10 Related Work 178
10.1 High-Level Classification Schema 178
10.2 Categorization Schema 179
10.2.1 Supported Data Lifecycle Step 179
10.2.2 Supported Data Representation 180
10.2.3 Supported Data Quality Task 181
10.3 Conventional Rule-Based Approaches 182
10.4 Ontology-based Approaches 183
10.4.1 Information System-oriented Approaches 183
10.4.2 Web-oriented Approaches 190
10.5 Summary 193
PART V – Conclusion 196
11 Synopsis and Future Work 196
11.1 Research Summary 196
11.2 Contributions 198
11.3 Conclusion and Future Work 199
Appendix A – Comparison of TIQM and TDQM 202
Appendix B – Rules for the Evaluation of SDQM 207
Appendix C – Test Data for SDQM’s Evaluation 212
Appendix D – Evaluation Results of SDQM’s Data Quality Monitoring Queries 216
Appendix E – Evaluation Results of SDQM’s Data Quality Assessment Queries 218
References 220
Publication date (per publisher) | 11 December 2015 |
---|---|
Additional info | XXVII, 205 p. 63 illus. |
Place of publication | Wiesbaden |
Language | English |
Subject area | Mathematics / Computer Science ► Computer Science |
Business ► Business Administration / Management ► Corporate Management / Management | |
Keywords | Data Quality Problems • Data Quality Processes • information systems • semantic technology • semantic web |
ISBN-10 | 3-658-12225-0 / 3658122250 |
ISBN-13 | 978-3-658-12225-6 / 9783658122256 |
Size: 4.6 MB
DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.