Practical Hadoop Ecosystem
Apress (publisher)
978-1-4842-2198-3 (ISBN)
This book takes a practical, example-driven approach.
Hadoop, together with its ecosystem, is among the most popular big data frameworks available.
This book is a practical guide to using the Apache Hadoop projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Sqoop, Apache Flume, Apache Avro, Apache Parquet, Apache Kafka, Apache Mahout, and Apache Solr.
From setting up the environment to running sample applications, each chapter is a practical tutorial on using an Apache Hadoop ecosystem project.
While several books on Apache Hadoop are available, most focus on the core projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects or how they all work together as a cohesive big data development platform.
What you'll learn
How to set up the environment in Linux for Hadoop projects using the Cloudera Hadoop Distribution CDH 5
How to run a MapReduce job
How to store data with Apache Hive and Apache HBase
How to index data in HDFS with Apache Solr
How to develop a Kafka messaging system
How to develop a Mahout User Recommender System
How to stream Logs to HDFS with Apache Flume
How to transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
How to create a Hive table over Apache Solr
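As a taste of the MapReduce and Hadoop Streaming topics above, the classic word-count job can be sketched in a few lines of Python. This is an illustrative sketch, not code from the book; the streaming jar name and HDFS paths in the comment are assumptions.

```python
# Word count in the Hadoop Streaming style: the mapper emits (word, 1)
# pairs, Hadoop sorts them by key, and the reducer sums the counts per
# word. The functions below mirror those two phases so the logic can be
# tried without a cluster; on a real cluster the mapper and reducer
# would read stdin and write tab-separated stdout, launched roughly as:
#   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py \
#       -input /wordcount/input -output /wordcount/output
# (jar name and HDFS paths are assumptions, not from the book)
from itertools import groupby

def map_lines(lines):
    """Mapper phase: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for line in lines for word in line.split()]

def reduce_pairs(pairs):
    """Reducer phase: sum the counts per word, after sorting by key --
    the ordering Hadoop guarantees between the map and reduce phases."""
    return [(word, sum(n for _, n in group))
            for word, group in groupby(sorted(pairs), key=lambda kv: kv[0])]

counts = reduce_pairs(map_lines(["the cat", "the hat"]))
print(counts)  # [('cat', 1), ('hat', 1), ('the', 2)]
```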
The primary audience is Apache Hadoop developers. Prerequisite knowledge of Linux and some knowledge of Hadoop is required.
Deepak Vohra is a developer, book author, and technical reviewer.
Introduction

1. HDFS and MapReduce: Hadoop Distributed FileSystem; MapReduce Frameworks; Setting the Environment; Hadoop Cluster Modes; Running a MapReduce Job with the MR1 Framework; Running MR1 in Standalone Mode; Running MR1 in Pseudo-Distributed Mode; Running MapReduce with the YARN Framework; Running YARN in Pseudo-Distributed Mode; Running Hadoop Streaming

Section II: Storing & Querying

2. Apache Hive: Setting the Environment; Configuring Hadoop; Configuring Hive; Starting HDFS; Starting the Hive Server; Starting the Hive CLI; Creating a Database; Using a Database; Creating a Managed Table; Loading Data into a Table; Creating a Table Using LIKE; Adding Data with INSERT INTO TABLE; Adding Data with INSERT OVERWRITE; Creating a Table Using AS SELECT; Altering a Table; Truncating a Table; Dropping a Table; Creating an External Table

3. Apache HBase: Setting the Environment; Configuring Hadoop; Configuring HBase; Configuring Hive; Starting HBase; Starting the HBase Shell; Creating an HBase Table; Adding Data to an HBase Table; Listing All Tables; Getting a Row of Data; Scanning a Table; Counting the Number of Rows in a Table; Altering a Table; Deleting a Row; Deleting a Column; Disabling and Enabling a Table; Truncating a Table; Dropping a Table; Finding Whether a Table Exists; Creating a Hive External Table

Section III: Bulk Transferring & Streaming

4. Apache Sqoop: Installing the MySQL Database; Creating MySQL Database Tables; Setting the Environment; Configuring Hadoop; Starting HDFS; Configuring Hive; Configuring HBase; Importing into HDFS; Exporting from HDFS; Importing into Hive; Importing into HBase

5. Apache Flume: Setting the Environment; Configuring Hadoop; Configuring HBase; Starting HDFS; Configuring Flume; Running a Flume Agent; Configuring Flume for an HBase Sink; Streaming a MySQL Log to an HBase Sink

Section IV: Serializing

6. Apache Avro: Setting the Environment; Creating an Avro Schema; Creating a Hive Managed Table; Creating a Hive (version prior to 0.14) External Table Stored as Avro; Creating a Hive (version 0.14 and later) External Table Stored as Avro; Transferring MySQL Table Data as an Avro Data File with Sqoop

7. Apache Parquet: Setting the Environment; Creating an Oracle Database Table; Exporting an Oracle Database to a CSV File; Importing the CSV File into MongoDB; Exporting a MongoDB Document as a CSV File; Importing a CSV File into an Oracle Database

Section V: Messaging & Indexing

8. Apache Kafka: Setting the Environment; Starting the Kafka Server; Creating a Topic; Starting a Kafka Producer; Starting a Kafka Consumer; Producing and Consuming Messages; Streaming Log Data to Apache Kafka with Apache Flume; Setting the Environment; Creating Kafka Topics; Configuring Flume; Running the Flume Agent; Consuming Log Data as Kafka Messages

9. Apache Solr: Setting the Environment; Configuring the Solr Schema; Starting the Solr Server; Indexing a Document in Solr; Deleting a Document from Solr; Indexing a Document in Solr with the Java Client; Searching a Document in Solr; Creating a Hive Managed Table; Creating a Hive External Table; Loading Hive External Table Data; Searching Hive Table Data Indexed in Solr

Section VI: Machine Learning

10. Apache Mahout: Setting the Environment; Starting HDFS; Setting the Mahout Environment; Running a Mahout Classification Sample; Running a Mahout Clustering Sample; Developing a User-Based Recommender System; The Sample Data; Setting the Environment; Creating a Maven Project in Eclipse; Creating a User-Based Recommender; Creating a Recommender Evaluator; Running the Recommender; Choosing a Recommender Type; Choosing a User Similarity Measure; Choosing a Neighborhood Type; Choosing a Neighborhood Size for NearestNUserNeighborhood; Choosing a Threshold for ThresholdUserNeighborhood; Running the Evaluator; Choosing the Split Between Training Percentage and Test Percentage
Publication date | 12 October 2016 |
---|---|
Additional info | 18 black & white illustrations, 293 colour illustrations, biography |
Place of publication | Berkeley |
Language | English |
Dimensions | 178 x 254 mm |
Binding | Paperback |
Subject area | Mathematics / Computer Science ► Computer Science ► Databases |
| Mathematics / Computer Science ► Computer Science ► Networks |
| Mathematics / Computer Science ► Computer Science ► Software Development |
Keywords | Big Data • Cloud • Database • Database; Administration • Framework • Hadoop • HBase |
ISBN-10 | 1-4842-2198-2 / 1484221982 |
ISBN-13 | 978-1-4842-2198-3 / 9781484221983 |
Condition | New |