Big Data for Chimps
O'Reilly Media (Publisher)
978-1-4919-2394-8 (ISBN)
Perfect for beginners, this book’s approach will also appeal to experienced practitioners who want to brush up on their skills. Part I explains how Hadoop and MapReduce work, while Part II covers many analytic patterns you can use to process any data. As you work through several exercises, you’ll also learn how to use Apache Pig to process data.
- Learn the necessary mechanics of working with Hadoop, including how data and computation move around the cluster
- Dive into map/reduce mechanics and build your first map/reduce job in Python
- Understand how to run chains of map/reduce jobs in the form of Pig scripts
- Use a real-world dataset—baseball performance statistics—throughout the book
- Work with examples of several analytic patterns, and learn when and where you might use them
Philip (flip) Kromer is the founder and CTO at Infochimps.com, a big data platform that makes acquiring, storing, and analyzing massive data streams transformatively easier. He enjoys bowling, Scrabble, working on old cars or new wood, and rooting for the Red Sox.
Russell Jurney cut his data teeth in casino gaming, building web apps to analyze the performance of slot machines in the US and Mexico. After dabbling in entrepreneurship, interactive media, and journalism, he moved to Silicon Valley to build analytics applications at scale at Ning and LinkedIn. He lives on the ocean in Pacifica, California, with his wife Kate and two fuzzy dogs.
Part I. Introduction: Theory and Tools
Chapter 1. Hadoop Basics
Chimpanzee and Elephant Start a Business
Map-Only Jobs: Process Records Individually
Pig Latin Map-Only Job
Setting Up a Docker Hadoop Cluster
Wrapping Up
Chapter 2. MapReduce
Chimpanzee and Elephant Save Christmas
Pygmy Elephants Carry Each Toy Form to the Appropriate Workbench
Example: Reindeer Games
Hadoop Versus Traditional Databases
The MapReduce Haiku
Wrapping Up
Chapter 3. A Quick Look into Baseball
The Data
Acronyms and Terminology
The Rules and Goals
Performance Metrics
Wrapping Up
Chapter 4. Introduction to Pig
Pig Helps Hadoop Work with Tables, Not Records
Fundamental Data Operations
LOAD Locates and Describes Your Data
STORE Writes Data to Disk
Development Aid Commands
Pig Functions
Piggybank
Apache DataFu
Wrapping Up
Part II. Tactics: Analytic Patterns
Chapter 5. Map-Only Operations
Pattern in Use
Eliminating Data
Selecting Records That Satisfy a Condition: FILTER and Friends
Project Only Chosen Columns by Name
Transforming Records
Operations That Break One Table into Many
Operations That Treat the Union of Several Tables as One
Wrapping Up
Chapter 6. Grouping Operations
Grouping Records into a Bag by Key
Group and Aggregate
Calculating the Distribution of Numeric Values with a Histogram
The Summing Trick
Wrapping Up
References
Chapter 7. Joining Tables
Matching Records Between Tables (Inner Join)
How a Join Works
Enumerating a Many-to-Many Relationship
Joining a Table with Itself (Self-Join)
Joining Records Without Discarding Nonmatches (Outer Join)
Selecting Only Records That Lack a Match in Another Table (Anti-Join)
Selecting Only Records That Possess a Match in Another Table (Semi-Join)
Wrapping Up
Chapter 8. Ordering Operations
Preparing Career Epochs
Sorting All Records in Total Order
Sorting Records Within a Group
Numbering Records in Rank Order
Wrapping Up
Chapter 9. Duplicate and Unique Records
Handling Duplicates
Set Operations
Wrapping Up
Product detail | Value |
---|---|
Publication date (per publisher) | November 17, 2015 |
Place of publication | Sebastopol |
Language | English |
Dimensions | 178 x 232 mm |
Weight | 380 g |
Binding | Paperback |
Subject area | Computer Science ► Databases ► Data Warehouse / Data Mining |
Keywords | Apache Hadoop • Big Data • Big Data Analysis • Big Data Analytics • Big Data and Data Mining • Big Data Patterns • MapReduce |
ISBN-10 | 1-4919-2394-6 / 1491923946 |
ISBN-13 | 978-1-4919-2394-8 / 9781491923948 |
Condition | New |