Big Data For Dummies
For Dummies (publisher)
978-1-118-50422-2 (ISBN)
Find the right big data solution for your business or organization
Big data management is one of the major challenges facing business, industry, and not-for-profit organizations. Data sets such as customer transactions for a mega-retailer, weather patterns monitored by meteorologists, or social network activity can quickly outpace the capacity of traditional data management tools. If you need to develop or manage big data solutions, you'll appreciate how these four experts define, explain, and guide you through this new and often confusing concept. You'll learn what it is, why it matters, and how to choose and implement solutions that work.
Effectively managing big data is an issue of growing importance to businesses, not-for-profit organizations, government, and IT professionals
Authors are experts in information management, big data, and a variety of solutions
Explains big data in detail and discusses how to select and implement a solution, security concerns to consider, data storage and presentation issues, analytics, and much more
Provides essential information in a no-nonsense, easy-to-understand style that is empowering
Big Data For Dummies cuts through the confusion and helps you take charge of big data solutions for your organization.
Judith Hurwitz is an expert in cloud computing, information management, and business strategy. Alan Nugent has extensive experience in cloud-based big data solutions. Dr. Fern Halper specializes in big data and analytics. Marcia Kaufman specializes in cloud infrastructure, information management, and analytics.
Introduction
About This Book
Foolish Assumptions
How This Book Is Organized
Part I: Getting Started with Big Data
Part II: Technology Foundations for Big Data
Part III: Big Data Management
Part IV: Analytics and Big Data
Part V: Big Data Implementation
Part VI: Big Data Solutions in the Real World
Part VII: The Part of Tens
Glossary
Icons Used in This Book
Where to Go from Here
Part I: Getting Started with Big Data
Chapter 1: Grasping the Fundamentals of Big Data
The Evolution of Data Management
Understanding the Waves of Managing Data
Wave 1: Creating manageable data structures
Wave 2: Web and content management
Wave 3: Managing big data
Defining Big Data
Building a Successful Big Data Management Architecture
Beginning with capture, organize, integrate, analyze, and act
Setting the architectural foundation
Performance matters
Traditional and advanced analytics
The Big Data Journey
Chapter 2: Examining Big Data Types
Defining Structured Data
Exploring sources of big structured data
Understanding the role of relational databases in big data
Defining Unstructured Data
Exploring sources of unstructured data
Understanding the role of a CMS in big data management
Looking at Real-Time and Non-Real-Time Requirements
Putting Big Data Together
Managing different data types
Integrating data types into a big data environment
Chapter 3: Old Meets New: Distributed Computing
A Brief History of Distributed Computing
Giving thanks to DARPA
The value of a consistent model
Understanding the Basics of Distributed Computing
Why we need distributed computing for big data
The changing economics of computing
The problem with latency
Demand meets solutions
Getting Performance Right
Part II: Technology Foundations for Big Data
Chapter 4: Digging into Big Data Technology Components
Exploring the Big Data Stack
Layer 0: Redundant Physical Infrastructure
Physical redundant networks
Managing hardware: Storage and servers
Infrastructure operations
Layer 1: Security Infrastructure
Interfaces and Feeds to and from Applications and the Internet
Layer 2: Operational Databases
Layer 3: Organizing Data Services and Tools
Layer 4: Analytical Data Warehouses
Big Data Analytics
Big Data Applications
Chapter 5: Virtualization and How It Supports Distributed Computing
Understanding the Basics of Virtualization
The importance of virtualization to big data
Server virtualization
Application virtualization
Network virtualization
Processor and memory virtualization
Data and storage virtualization
Managing Virtualization with the Hypervisor
Abstraction and Virtualization
Implementing Virtualization to Work with Big Data
Chapter 6: Examining the Cloud and Big Data
Defining the Cloud in the Context of Big Data
Understanding Cloud Deployment and Delivery Models
Cloud deployment models
Cloud delivery models
The Cloud as an Imperative for Big Data
Making Use of the Cloud for Big Data
Providers in the Big Data Cloud Market
Amazon’s Public Elastic Compute Cloud
Google big data services
Microsoft Azure
OpenStack
Where to be careful when using cloud services
Part III: Big Data Management
Chapter 7: Operational Databases
RDBMSs Are Important in a Big Data Environment
PostgreSQL relational database
Nonrelational Databases
Key-Value Pair Databases
Riak key-value database
Document Databases
MongoDB
CouchDB
Columnar Databases
HBase columnar database
Graph Databases
Neo4J graph database
Spatial Databases
PostGIS/OpenGEO Suite
Polyglot Persistence
Chapter 8: MapReduce Fundamentals
Tracing the Origins of MapReduce
Understanding the map Function
Adding the reduce Function
Putting map and reduce Together
Optimizing MapReduce Tasks
Hardware/network topology
Synchronization
File system
Chapter 9: Exploring the World of Hadoop
Explaining Hadoop
Understanding the Hadoop Distributed File System (HDFS)
NameNodes
Data nodes
Under the covers of HDFS
Hadoop MapReduce
Getting the data ready
Let the mapping begin
Reduce and combine
Chapter 10: The Hadoop Foundation and Ecosystem
Building a Big Data Foundation with the Hadoop Ecosystem
Managing Resources and Applications with Hadoop YARN
Storing Big Data with HBase
Mining Big Data with Hive
Interacting with the Hadoop Ecosystem
Pig and Pig Latin
Sqoop
Zookeeper
Chapter 11: Appliances and Big Data Warehouses
Integrating Big Data with the Traditional Data Warehouse
Optimizing the data warehouse
Differentiating big data structures from data warehouse data
Examining a hybrid process case study
Big Data Analysis and the Data Warehouse
The integration lynchpin
Rethinking extraction, transformation, and loading
Changing the Role of the Data Warehouse
Changing Deployment Models in the Big Data Era
The appliance model
The cloud model
Examining the Future of Data Warehouses
Part IV: Analytics and Big Data
Chapter 12: Defining Big Data Analytics
Using Big Data to Get Results
Basic analytics
Advanced analytics
Operationalized analytics
Monetizing analytics
Modifying Business Intelligence Products to Handle Big Data
Data
Analytical algorithms
Infrastructure support
Studying Big Data Analytics Examples
Orbitz
Nokia
NASA
Big Data Analytics Solutions
Chapter 13: Understanding Text Analytics and Big Data
Exploring Unstructured Data
Understanding Text Analytics
The difference between text analytics and search
Analysis and Extraction Techniques
Understanding the extracted information
Taxonomies
Putting Your Results Together with Structured Data
Putting Big Data to Use
Voice of the customer
Social media analytics
Text Analytics Tools for Big Data
Attensity
Clarabridge
IBM
OpenText
SAS
Chapter 14: Customized Approaches for Analysis of Big Data
Building New Models and Approaches to Support Big Data
Characteristics of big data analysis
Understanding Different Approaches to Big Data Analysis
Custom applications for big data analysis
Semi-custom applications for big data analysis
Characteristics of a Big Data Analysis Framework
Big to Small: A Big Data Paradox
Part V: Big Data Implementation
Chapter 15: Integrating Data Sources
Identifying the Data You Need
Exploratory stage
Codifying stage
Integration and incorporation stage
Understanding the Fundamentals of Big Data Integration
Defining Traditional ETL
Data transformation
Understanding ELT: Extract, Load, and Transform
Prioritizing Big Data Quality
Using Hadoop as ETL
Best Practices for Data Integration in a Big Data World
Chapter 16: Dealing with Real-Time Data Streams and Complex Event Processing
Explaining Streaming Data and Complex Event Processing
Using Streaming Data
Data streaming
The need for metadata in streams
Using Complex Event Processing
Differentiating CEP from Streams
Understanding the Impact of Streaming Data and CEP on Business
Chapter 17: Operationalizing Big Data
Making Big Data a Part of Your Operational Process
Integrating big data
Incorporating big data into the diagnosis of diseases
Understanding Big Data Workflows
Workload in context to the business problem
Ensuring the Validity, Veracity, and Volatility of Big Data
Data validity
Data volatility
Chapter 18: Applying Big Data within Your Organization
Figuring the Economics of Big Data
Identification of data types and sources
Business process modifications or new process creation
The technology impact of big data workflows
Finding the talent to support big data projects
Calculating the return on investment (ROI) from big data investments
Enterprise Data Management and Big Data
Defining Enterprise Data Management
Creating a Big Data Implementation Road Map
Understanding business urgency
Projecting the right amount of capacity
Selecting the right software development methodology
Balancing budgets and skill sets
Determining your appetite for risk
Starting Your Big Data Road Map
Chapter 19: Security and Governance for Big Data Environments
Security in Context with Big Data
Assessing the risk for the business
Risks lurking inside big data
Understanding Data Protection Options
The Data Governance Challenge
Auditing your big data process
Identifying the key stakeholders
Putting the Right Organizational Structure in Place
Preparing for stewardship and management of risk
Setting the right governance and quality policies
Developing a Well-Governed and Secure Big Data Environment
Part VI: Big Data Solutions in the Real World
Chapter 20: The Importance of Big Data to Business
Big Data as a Business Planning Tool
Stage 1: Planning with data
Stage 2: Doing the analysis
Stage 3: Checking the results
Stage 4: Acting on the plan
Adding New Dimensions to the Planning Cycle
Stage 5: Monitoring in real time
Stage 6: Adjusting the impact
Stage 7: Enabling experimentation
Keeping Data Analytics in Perspective
Getting Started with the Right Foundation
Getting your big data strategy started
Planning for Big Data
Transforming Business Processes with Big Data
Chapter 21: Analyzing Data in Motion: A Real-World View
Understanding Companies’ Needs for Data in Motion
The value of streaming data
Streaming Data with an Environmental Impact
Using sensors to provide real-time information about rivers and oceans
The benefits of real-time data
Streaming Data with a Public Policy Impact
Streaming Data in the Healthcare Industry
Capturing the data stream
Streaming Data in the Energy Industry
Using streaming data to increase energy efficiency
Using streaming data to advance the production of alternative sources of energy
Connecting Streaming Data to Historical and Other Real-Time Data Sources
Chapter 22: Improving Business Processes with Big Data Analytics: A Real-World View
Understanding Companies’ Needs for Big Data Analytics
Improving the Customer Experience with Text Analytics
The business value to the big data analytics implementation
Using Big Data Analytics to Determine Next Best Action
Preventing Fraud with Big Data Analytics
The Business Benefit of Integrating New Sources of Data
Part VII: The Part of Tens
Chapter 23: Ten Big Data Best Practices
Understand Your Goals
Establish a Road Map
Discover Your Data
Figure Out What Data You Don’t Have
Understand the Technology Options
Plan for Security in Context with Big Data
Plan a Data Governance Strategy
Plan for Data Stewardship
Continually Test Your Assumptions
Study Best Practices and Leverage Patterns
Chapter 24: Ten Great Big Data Resources
Hurwitz & Associates
Standards Organizations
The Open Data Foundation
The Cloud Security Alliance
National Institute of Standards and Technology
Apache Software Foundation
Oasis
Vendor Sites
Online Collaborative Sites
Big Data Conferences
Chapter 25: Ten Big Data Do’s and Don’ts
Do Involve All Business Units in Your Big Data Strategy
Do Evaluate All Delivery Models for Big Data
Do Think about Your Traditional Data Sources as Part of Your Big Data Strategy
Do Plan for Consistent Metadata
Do Distribute Your Data
Don’t Rely on a Single Approach to Big Data Analytics
Don’t Go Big Before You Are Ready
Don’t Overlook the Need to Integrate Data
Don’t Forget to Manage Data Securely
Don’t Overlook the Need to Manage the Performance of Your Data
Glossary
Index
Language | English |
---|---|
Dimensions | 185 x 231 mm |
Weight | 431 g |
Subject area | Mathematics / Computer Science ► Computer Science ► Databases; Computer Science ► Office Programs ► Outlook |
ISBN-10 | 1-118-50422-4 / 1118504224 |
ISBN-13 | 978-1-118-50422-2 / 9781118504222 |
Condition | New |