Learning Apache Spark 2 (eBook)
356 pages
Packt Publishing (publisher)
978-1-78588-958-5 (ISBN)
Apache Spark has seen unprecedented growth in adoption over the last few years, mainly because of its speed, versatility, and real-time data processing capabilities. It has quickly become the tool of choice for many Big Data professionals looking to extract quick insights from large volumes of data. This book introduces you to the Apache Spark framework and familiarizes you with the latest features and capabilities introduced in Spark 2.
Starting with a detailed introduction to Spark's architecture and the installation procedure, this book covers what you need to know about the Spark framework in a practical manner. You will learn how to perform basic ETL activities using Spark, and work with different components of Spark such as Spark SQL, as well as the Dataset and DataFrame APIs for manipulating your data. Then, you will perform machine learning using Spark MLlib, as well as streaming analytics and graph processing using the Spark Streaming and GraphX modules respectively. The book also places special emphasis on deploying your Spark models and on running them in cluster mode.
Over the course of the book, you will come across implementations of different real-world use cases and examples, giving you the hands-on knowledge you need to use Apache Spark effectively.
Learn about the fastest-growing open source project in the world, and find out how it revolutionizes big data analytics

About This Book
- Exclusive guide that covers how to get up and running with fast data processing using Apache Spark
- Explore and exploit various possibilities with Apache Spark using real-world use cases in this book
- Want to perform efficient data processing in real time? This book will be your one-stop solution.

Who This Book Is For
This guide appeals to big data engineers, analysts, architects, software engineers, and even technical managers who need to perform efficient data processing on Hadoop in real time. Basic familiarity with Java or Scala will be helpful.
Readers are assumed to come from mixed backgrounds, typically engineering or data science, with no prior Spark experience, and to want to understand how Spark can help them on their analytics journey.

What You Will Learn
- Get an overview of big data analytics and its importance for organizations and data professionals
- Delve into Spark to see how it is different from existing processing platforms
- Understand the intricacies of various file formats, and how to process them with Apache Spark
- Learn how to deploy Spark with YARN, Mesos, or a standalone cluster manager
- Learn the concepts of Spark SQL, SchemaRDD, and caching, and how to work with Hive and Parquet file formats
- Understand the architecture of Spark MLlib while discussing some of the off-the-shelf algorithms that come with Spark
- Introduce yourself to the deployment and usage of SparkR
- Walk through the importance of graph computation and the graph processing systems available in the market
- Work through a real-world example of Spark by building a recommendation engine using ALS
- Use a telco data set to predict customer churn using Random Forests

In Detail
The Spark juggernaut keeps on rolling and gains more momentum each day. Spark provides key capabilities in the form of Spark SQL, Spark Streaming, Spark ML, and GraphX, all accessible via Java, Scala, Python, and R. Deploying these capabilities is crucial, whether on a standalone framework or as part of an existing Hadoop installation configured with YARN or Mesos.
The next part of the journey after installation is using the key components, APIs, clustering, machine learning APIs, data pipelines, and parallel programming. It is important to understand why each framework component is key, how widely it is being used, its stability, and its pertinent use cases.
Once we understand the individual components, we will take a couple of real-life advanced analytics examples such as 'Building a Recommendation system', 'Predicting customer churn', and so on. The objective of these real-life examples is to give the reader confidence in using Spark for real-world problems.

Style and approach
With the help of practical examples and real-world use cases, this guide will take you from scratch to building efficient data applications using Apache Spark.
You will learn all about this data processing engine in a step-by-step manner, taking one aspect of it at a time.
This highly practical guide covers how to work with data pipelines, DataFrames, clustering, Spark SQL, parallel programming, and similar topics with the help of real-world use cases.
Publication date (per publisher) | 28 March 2017 |
---|---|
Language | English |
Subject area | Nonfiction / Guides ► Leisure / Hobbies ► Collecting / Collectors' Catalogs; Computer Science ► Databases ► Data Warehouse / Data Mining |
ISBN-10 | 1-78588-958-3 / 1785889583 |
ISBN-13 | 978-1-78588-958-5 / 9781785889585 |
Digital Rights Management: no DRM
This eBook contains no DRM or copy protection. Passing it on to third parties is nevertheless not legally permitted, because your purchase grants you rights for personal use only.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and nonfiction. The body text adapts dynamically to the display and font size, which also makes EPUB a good fit for mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need the free Adobe Digital Editions software.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a free app.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.