Practical Hadoop Ecosystem
Apress (publisher)
978-1-4842-2198-3 (ISBN)
This book takes a practical, example-driven approach.
Hadoop, together with its ecosystem, is among the most popular big data frameworks available.
This book is a practical guide to using the Apache Hadoop projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Sqoop, Apache Flume, Apache Avro, Apache Parquet, Apache Kafka, Apache Mahout, and Apache Solr.
From setting up the environment to running sample applications, each chapter is a practical tutorial on using an Apache Hadoop ecosystem project.
While several books on Apache Hadoop are available, most focus on the core projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects or how they all work together as a cohesive big data development platform.
What you'll learn
How to set up the environment in Linux for Hadoop projects using the Cloudera Hadoop Distribution CDH 5
How to run a MapReduce job
How to store data with Apache Hive and Apache HBase
How to index data in HDFS with Apache Solr
How to develop a Kafka messaging system
How to develop a Mahout User Recommender System
How to stream Logs to HDFS with Apache Flume
How to transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
How to create a Hive table over Apache Solr
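As a taste of the MapReduce and Hadoop Streaming topics above, the classic word-count job can be sketched in a few lines of Python. This is an illustrative sketch, not code from the book; the streaming jar name and HDFS paths in the comment are assumptions.

```python
# Word count in the Hadoop Streaming style: the mapper emits (word, 1)
# pairs, Hadoop sorts them by key, and the reducer sums the counts per
# word. The functions below mirror those two phases so the logic can be
# tried without a cluster; on a real cluster the mapper and reducer
# would read stdin and write tab-separated stdout, launched roughly as:
#   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py \
#       -input /wordcount/input -output /wordcount/output
# (jar name and HDFS paths are assumptions, not from the book)
from itertools import groupby

def map_lines(lines):
    """Mapper phase: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for line in lines for word in line.split()]

def reduce_pairs(pairs):
    """Reducer phase: sum the counts per word, after sorting by key --
    the ordering Hadoop guarantees between the map and reduce phases."""
    return [(word, sum(n for _, n in group))
            for word, group in groupby(sorted(pairs), key=lambda kv: kv[0])]

counts = reduce_pairs(map_lines(["the cat", "the hat"]))
print(counts)  # [('cat', 1), ('hat', 1), ('the', 2)]
```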
The primary audience is Apache Hadoop developers. Prerequisite knowledge of Linux and some knowledge of Hadoop is required.
Deepak Vohra is a developer, book author, and technical reviewer.
Introduction

1. HDFS and MapReduce: Hadoop Distributed FileSystem; MapReduce Frameworks; Setting the Environment; Hadoop Cluster Modes; Running a MapReduce Job with the MR1 Framework; Running MR1 in Standalone Mode; Running MR1 in Pseudo-Distributed Mode; Running MapReduce with the YARN Framework; Running YARN in Pseudo-Distributed Mode; Running Hadoop Streaming

Section II: Storing & Querying

2. Apache Hive: Setting the Environment; Configuring Hadoop; Configuring Hive; Starting HDFS; Starting the Hive Server; Starting the Hive CLI; Creating a Database; Using a Database; Creating a Managed Table; Loading Data into a Table; Creating a Table Using LIKE; Adding Data with INSERT INTO TABLE; Adding Data with INSERT OVERWRITE; Creating a Table Using AS SELECT; Altering a Table; Truncating a Table; Dropping a Table; Creating an External Table

3. Apache HBase: Setting the Environment; Configuring Hadoop; Configuring HBase; Configuring Hive; Starting HBase; Starting the HBase Shell; Creating an HBase Table; Adding Data to an HBase Table; Listing All Tables; Getting a Row of Data; Scanning a Table; Counting the Number of Rows in a Table; Altering a Table; Deleting a Row; Deleting a Column; Disabling and Enabling a Table; Truncating a Table; Dropping a Table; Finding Whether a Table Exists; Creating a Hive External Table

Section III: Bulk Transferring & Streaming

4. Apache Sqoop: Installing the MySQL Database; Creating MySQL Database Tables; Setting the Environment; Configuring Hadoop; Starting HDFS; Configuring Hive; Configuring HBase; Importing into HDFS; Exporting from HDFS; Importing into Hive; Importing into HBase

5. Apache Flume: Setting the Environment; Configuring Hadoop; Configuring HBase; Starting HDFS; Configuring Flume; Running a Flume Agent; Configuring Flume for an HBase Sink; Streaming a MySQL Log to an HBase Sink

Section IV: Serializing

6. Apache Avro: Setting the Environment; Creating an Avro Schema; Creating a Hive Managed Table; Creating a Hive (version prior to 0.14) External Table Stored as Avro; Creating a Hive (version 0.14 and later) External Table Stored as Avro; Transferring MySQL Table Data as an Avro Data File with Sqoop

7. Apache Parquet: Setting the Environment; Creating an Oracle Database Table; Exporting an Oracle Database to a CSV File; Importing the CSV File into MongoDB; Exporting a MongoDB Document as a CSV File; Importing a CSV File into an Oracle Database

Section V: Messaging & Indexing

8. Apache Kafka: Setting the Environment; Starting the Kafka Server; Creating a Topic; Starting a Kafka Producer; Starting a Kafka Consumer; Producing and Consuming Messages; Streaming Log Data to Apache Kafka with Apache Flume; Setting the Environment; Creating Kafka Topics; Configuring Flume; Running the Flume Agent; Consuming Log Data as Kafka Messages

9. Apache Solr: Setting the Environment; Configuring the Solr Schema; Starting the Solr Server; Indexing a Document in Solr; Deleting a Document from Solr; Indexing a Document in Solr with the Java Client; Searching a Document in Solr; Creating a Hive Managed Table; Creating a Hive External Table; Loading Hive External Table Data; Searching Hive Table Data Indexed in Solr

Section VI: Machine Learning

10. Apache Mahout: Setting the Environment; Starting HDFS; Setting the Mahout Environment; Running a Mahout Classification Sample; Running a Mahout Clustering Sample; Developing a User-Based Recommender System; The Sample Data; Setting the Environment; Creating a Maven Project in Eclipse; Creating a User-Based Recommender; Creating a Recommender Evaluator; Running the Recommender; Choosing a Recommender Type; Choosing a User Similarity Measure; Choosing a Neighborhood Type; Choosing a Neighborhood Size for NearestNUserNeighborhood; Choosing a Threshold for ThresholdUserNeighborhood; Running the Evaluator; Choosing the Split Between Training Percentage and Test Percentage
Publication date | 12 October 2016 |
---|---|
Additional info | 18 black & white illustrations, 293 colour illustrations, biography |
Place of publication | Berkeley |
Language | English |
Dimensions | 178 x 254 mm |
Binding | Paperback |
Subject area | Mathematics / Computer Science ► Computer Science ► Databases |
| Mathematics / Computer Science ► Computer Science ► Networks |
| Mathematics / Computer Science ► Computer Science ► Software Development |
Keywords | Big Data • Cloud • Database • Database; Administration • Framework • Hadoop • HBase |
ISBN-10 | 1-4842-2198-2 / 1484221982 |
ISBN-13 | 978-1-4842-2198-3 / 9781484221983 |
Condition | New |