Programming Elastic MapReduce
O'Reilly Media (Publisher)
978-1-4493-6362-8 (ISBN)
Authors Kevin Schmidt and Christopher Phillips demonstrate best practices for using EMR and various AWS and Apache technologies by walking you through the construction of a sample MapReduce log analysis application. Using code samples and example configurations, you’ll learn how to assemble the building blocks necessary to solve your biggest data analysis problems.
- Get an overview of the AWS and Apache software tools used in large-scale data analysis
- Go through the process of executing a Job Flow with a simple log analyzer
- Discover useful MapReduce patterns for filtering and analyzing data sets
- Use Apache Hive and Pig instead of Java to build a MapReduce Job Flow
- Learn the basics for using Amazon EMR to run machine learning algorithms
- Develop a project cost model for using Amazon EMR and other AWS tools
Kevin J. Schmidt is a senior manager at Dell SecureWorks, Inc., an industry-leading MSSP that is part of Dell. He is responsible for the design and development of a major part of the company’s SIEM platform, including data acquisition, correlation, and analysis of log data. Prior to SecureWorks, Kevin worked for Reflex Security on an IPS engine and anti-virus software. Before that, he was a lead developer and architect at GuardedNet, Inc., which built one of the industry’s first SIEM platforms. He is also a commissioned officer in the United States Navy Reserve (USNR). He has over 19 years of experience in software development and design, 11 of which have been in the network security space. He holds a Bachelor of Science in Computer Science. Kevin has spent time designing cloud services components at Dell, including virtualized components that run in Dell’s own vCloud. These components are used to protect customers who use Dell’s cloud infrastructure. Additionally, he has been working with Hadoop, machine learning, and other technologies in the cloud.
Christopher Phillips is a manager and senior software developer at Dell SecureWorks, Inc., an industry-leading MSSP that is part of Dell. He is responsible for the design and development of the company’s Threat Intelligence service platform. He also leads a team that integrates log and event information from many third-party providers, allowing customers to have all of their core security information delivered to and analyzed by the Dell SecureWorks systems and security professionals. Prior to Dell SecureWorks, Chris worked for McKesson and Allscripts, where he worked with clients on HIPAA compliance, security, and healthcare systems integration. He has over 18 years of experience in software development and design. He holds a Bachelor of Science in Computer Science and an MBA. Chris has spent time designing and developing virtualization and cloud Infrastructure as a Service strategies at Dell to help its security services scale globally. Additionally, he has been working with Hadoop, the Pig scripting language, and Amazon Elastic MapReduce to develop strategies for gaining insights and analyzing Big Data issues in the cloud.
Chapter 1 Introduction to Amazon Elastic MapReduce
Amazon Web Services Used in This Book
Amazon Elastic MapReduce
Amazon EMR and the Hadoop Ecosystem
Amazon Elastic MapReduce Versus Traditional Hadoop Installs
Application Building Blocks
Chapter 2 Data Collection and Data Analysis with AWS
Log Analysis Application
Log Messages as a Data Set for Analytics
Understanding MapReduce
Collection Stage
Simulating Syslog Data
Developing a MapReduce Application
Custom JAR MapReduce Job
Running an Amazon EMR Cluster
Viewing Our Results
Debugging a Job Flow
Our Application and Real-World Uses
Chapter 3 Data Filtering Design Patterns and Scheduling Work
Extending the Application Example
Understanding Web Server Logs
Finding Errors in the Web Logs Using Data Filtering
Building Summary Counts in Data Sets
Job Flow Scheduling
Scheduling with AWS Data Pipeline
Real-World Uses
Chapter 4 Data Analysis with Hive and Pig in Amazon EMR
Amazon Job Flow Technologies
What Is Pig?
Utilizing Pig in Amazon EMR
What Is Hive?
Utilizing Hive in Amazon EMR
Our Application with Hive and Pig
Chapter 5 Machine Learning Using EMR
A Quick Tour of Machine Learning
Python and EMR
What’s Next?
Chapter 6 Planning AWS Projects and Managing Costs
Developing a Project Cost Model
Optimizing AWS Resources to Reduce Project Costs
Amazon Tools for Estimating Your Project Costs
Appendix Amazon Web Services Resources and Tools
Amazon AWS Online Resources
Amazon AWS Cost Estimation Tools
AWS Best Practices and Architecture
Amazon EMR Distributions
Appendix Cloud Computing, Amazon Web Services, and Their Impacts
AWS Service Delivery Models
Performance
Elasticity and Growth
Security
Uptime and Availability
Appendix Installation and Setup
Prerequisites
Installing Hadoop
Building MapReduce Applications
Running MapReduce Applications Locally
Installing Pig
Installing Hive
Index
Colophon
Publication date (per publisher) | 28 January 2014 |
---|---|
Additional info | black & white illustrations |
Place of publication | Sebastopol |
Language | English |
Dimensions | 178 × 233 mm |
Weight | 304 g |
Binding | Paperback |
Subject | Computer Science ► Databases ► Data Warehouse / Data Mining |
ISBN-10 | 1-4493-6362-8 / 1449363628 |
ISBN-13 | 978-1-4493-6362-8 / 9781449363628 |
Condition | New |