Scalable Big Data Architecture - Bahaaldine Azarmi

Scalable Big Data Architecture (eBook)

A practitioner's guide to choosing relevant Big Data architecture

Bahaaldine Azarmi (Author)

eBook Download: PDF

2015 | 1st ed.
XIII, 141 pages
Apress (Publisher)
978-1-4842-1326-1 (ISBN)

This book highlights the different types of data architecture and illustrates the many possibilities hidden behind the term 'Big Data', from the use of NoSQL databases to the deployment of stream analytics architecture, machine learning, and governance.

Scalable Big Data Architecture covers real-world, concrete industry use cases that leverage complex distributed applications, which involve web applications, RESTful APIs, and high throughput of large amounts of data stored in highly scalable NoSQL data stores such as Couchbase and Elasticsearch. This book demonstrates how data processing can be done at scale, from the use of NoSQL datastores to their combination with a Big Data distribution.

When the data processing is too complex and involves different processing topologies, such as long-running jobs, stream processing, correlation of multiple data sources, and machine learning, it is often necessary to delegate the load to Hadoop or Spark and use the NoSQL store to serve processed data in real time.

This book shows you how to choose a relevant combination of Big Data technologies available within the Hadoop ecosystem. It focuses on processing long jobs, architecture, stream data patterns, log analysis, and real-time analytics. Every pattern is illustrated with practical examples that use different open source projects such as Logstash, Spark, and Kafka.

Traditional data infrastructures are built to digest large amounts of data and to render syntheses and analytics from them. This book helps you understand why you should consider using machine learning algorithms early in the project, before being overwhelmed by the constraints imposed by the high throughput of Big Data.

Scalable Big Data Architecture is for developers, data architects, and data scientists looking for a better understanding of how to choose the most relevant pattern for a Big Data project and which tools to integrate into that pattern.

Bahaaldine Azarmi is the co-founder and CTO of reach five, a Social Data Marketing Platform. Bahaaldine has a strong background and expertise in REST APIs and Big Data architecture. Prior to founding reach five, Bahaaldine worked as a technical architect and evangelist for large software vendors such as Oracle and Talend. He has a master’s degree in computer science from Polytech’Paris engineering school, Paris.

Chapter 1: I think I have a Big (data) Problem (20 pages)
Chapter Goal: This chapter introduces the common limitations encountered when dealing with large amounts of data, and the common solutions to those problems. The goal is to lay the foundation of a heterogeneous architecture that is described in the following chapters.
1- Identifying Big Data symptoms
2- Understanding the Big Data projects ecosystem
3- Creating the foundation of a long term Big Data architecture

Chapter 2: Early Big Data with No-SQL (30 pages)
Chapter Goal: This chapter describes how a No-SQL database can be a starting point for your Big Data project, how it can deal with large amounts of data, what the limits of this model are, and how it can be scaled to a full-fledged Big Data project.
1- Choosing the right No-SQL database
2- Introduction to Couchbase
3- Introduction to Elasticsearch
4- Using No-SQL cache in a SQL based architecture

Chapter 3: Big Data processing jobs topology (30 pages)
Chapter Goal: The more data you get, the more important it is to split the processing into different jobs depending on the topology of the processing.
1- Big Data Job processing strategy
2- Smart data extraction from No-SQL database
3- Short term processing jobs
4- Long term processing jobs

Chapter 4: Big Data Streaming Pattern (30 pages)
Chapter Goal: This chapter helps readers understand their options when dealing with high-throughput streaming data.
1- Identifying streaming data sources
2- Streaming with Big Data projects (Flume) versus Enterprise Service Bus
3- Processing architecture for stream data

Chapter 5: Querying and Analysing Patterns (30 pages)
Chapter Goal: In this chapter, readers learn how to leverage the processing work through long-term and real-time data querying.
1- "Process then Query" strategy versus real-time querying
2- Process, store and query data in Elasticsearch
3- Real-Time querying using Spark

Chapter 6: How About Learning from your Data? (30 pages)
Chapter Goal: This chapter introduces the concept of machine learning at different levels of the previously described patterns and through the different related methodologies.
1- Introduction to machine learning
2- Supervised and Unsupervised learning
3- A simple example of Machine learning
4- Using MLlib for machine learning

Chapter 7: Governance Considerations (20 pages)
Chapter Goal: Monitoring, and more generally governance, is extremely important when dealing with an architecture that involves all the previous patterns. This chapter aims to safeguard the reader from major issues and to help them gain visibility and control over the architecture.
1- Data Quality
2- Architecture Scalability
3- Security
4- Monitoring

Publication date (per publisher)	31.12.2015
Additional info	XIII, 141 p. 70 illus.
Place of publication	Berkeley
Language	English
Subject areas	Mathematics / Computer Science ► Computer Science ► Databases
	Mathematics / Computer Science ► Computer Science ► Networks
	Mathematics / Computer Science ► Computer Science ► Theory / Studies
	Mathematics / Computer Science ► Computer Science ► Web / Internet
	Social Sciences ► Politics / Administration ► State / Administration
Keywords	Big Data • Big Data Streaming • Jobs Topology • machine learning • NoSQL • Querying and Analysing
ISBN-10	1-4842-1326-2 / 1484213262
ISBN-13	978-1-4842-1326-1 / 9781484213261

PDF (Watermark)
Size: 3.9 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.

Print edition

Book | Softcover

CHF 67,35