Pro Apache Phoenix - Shakil Akhtar, Ravi Magham

Pro Apache Phoenix (eBook)

An SQL Driver for HBase

Shakil Akhtar, Ravi Magham (Authors)

eBook Download: PDF

2016 | 1st ed.
XV, 140 pages
Apress (Publisher)
978-1-4842-2370-3 (ISBN)

This book includes real-world cases such as Internet of Things devices that send continuous streams to Phoenix, and the book explains how key features such as joins, indexes, transactions, and functions help you understand the simple, flexible, and powerful API that Phoenix provides. Examples are provided using real-time data and data-driven businesses that show you how to collect, analyze, and act in seconds.

Pro Apache Phoenix covers the nuances of setting up a distributed HBase cluster with Phoenix libraries, running performance benchmarks, configuring parameters for production scenarios, and viewing the results. The book also shows how Phoenix plays well with other key frameworks in the Hadoop ecosystem such as Apache Spark, Pig, Flume, and Sqoop.

You will learn how to:

Handle a petabyte data store by applying familiar SQL techniques
Store, analyze, and manipulate data in a NoSQL Hadoop ecosystem with HBase
Apply best practices while working with a scalable data store on Hadoop and HBase
Integrate popular frameworks (Apache Spark, Pig, Flume) to simplify big data analysis
Demonstrate real-time use cases and big data modeling techniques

Who This Book Is For

Data engineers, Big Data administrators, and architects.


Leverage Phoenix as an ANSI SQL engine built on top of the highly distributed and scalable NoSQL framework HBase. Learn the basics and best practices that are being adopted in Phoenix to enable a high write and read throughput in a big data space.

Shakil Akhtar is a TOGAF 9 Certified Enterprise Architect passionate about digital transformation, cloud computing, big data, and Internet of Things technologies. He holds many certifications, including Oracle Certified Master Java Enterprise Architect (OCMJEA). He has worked with Cisco, Oracle, CA Technologies, and various other organizations, where he developed and architected large-scale, complex enterprise software, creating frameworks and scaling systems to petabyte datasets. He is an enthusiastic open source user and longtime fan. When not working, he can be found playing guitar and jamming with his friends.

Ravi Magham is an engineer passionate about data and data-driven engineering, with experience building and scaling solutions to petabyte datasets. He has worked with CA Technologies, Bazaarvoice, and various other startups. He is actively involved in open source projects and is a PMC member of Apache Phoenix. His current interests are in distributed data stream processing.

Contents at a Glance
Contents
About the Authors
About the Technical Reviewers
Chapter 1: Introduction
1.1 Big Data Lake and Its Representation
1.2 Modern Applications and Big Data
1.2.1 Fraud Detection in Banking
1.2.2 Log Data Analysis
1.2.3 Recommendation Engines
1.2.3.1 Social Media Analysis
1.3 Analyzing Big Data
1.4 An Overview of Hadoop and MapReduce
1.5 Hadoop Ecosystem
1.5.1 HDFS
1.5.2 MapReduce
1.5.3 HBase
1.5.4 Hive
1.5.5 YARN
1.5.6 Spark
1.5.7 PIG
1.5.8 ZooKeeper
1.6 Phoenix in the Hadoop Ecosystem
1.7 Phoenix’s Place in Big Data Systems
1.8 Importance of Traditional SQL-Based Tools and the Role of Phoenix
1.8.1 Traditional DBA Problems for Big Data Systems
1.8.2 Which Tool Should I Use for Big Data?
1.8.3 Massive Data Storage and Challenges
1.8.4 A Traditional Data Warehouse and Querying
1.9 Apache Phoenix in Big Data Analytics
1.10 Summary
Chapter 2: Using Phoenix
2.1 What is Apache Phoenix?
2.2 Architecture
2.2.1 Installing Apache Phoenix
2.2.2 Installing Java
2.2.2.1 Installing Java on Linux
2.2.2.2 Installing Java on Mac OS X
2.3 Installing HBase
2.4 Installing Apache Phoenix
2.5 Installing Phoenix on Hortonworks HDP
2.5.1 Downloading Hortonworks Sandbox
2.5.2 Start HBase
2.5.3 Testing Your Phoenix Installation
2.6 Installing Phoenix on Cloudera Hadoop
2.7 Capabilities
2.8 Hadoop Ecosystem and the Role of Phoenix
2.9 Brief Description of Phoenix’s Key Features
2.9.1 Transactions
2.9.2 User-Defined Functions
2.9.3 Secondary Indexes
2.9.4 Skip Scan
2.9.5 Views
2.9.6 Multi-Tenancy
2.9.7 Query Server
2.10 Summary
Chapter 3: CRUD with Phoenix
3.1 Data Types in Phoenix
3.1.1 Primitive Data Types
3.1.2 Complex Data Types
3.2 Data Model
3.2.1 Steps in data modeling
3.3 Phoenix Write Path
3.4 Phoenix Read Path
3.5 Basic Commands
3.5.1 HELP
3.5.2 CREATE
3.5.3 UPSERT
3.5.4 SELECT
3.5.5 ALTER
3.5.6 DELETE
3.5.7 DESCRIBE
3.5.8 LIST
3.6 Working with Phoenix API
3.6.1 Environment setup
3.7 Summary
Chapter 4: Querying Data
4.1 Constraints
4.1.1 NOT NULL
4.2 Creating Tables
4.3 Salted Tables
4.4 Dropping Tables
4.5 ALTER Tables
4.5.1 Adding Columns
4.5.2 Deleting or Replacing Columns
4.5.3 Renaming a Column
4.6 Clauses
4.6.1 LIMIT
4.6.2 WHERE
4.6.3 GROUP BY
4.6.4 HAVING
4.6.5 ORDER BY
4.7 Logical Operators
4.7.1 AND
4.7.2 OR
4.7.3 IN
4.7.4 LIKE
4.7.5 BETWEEN
4.8 Summary
Chapter 5: Advanced Querying
5.1 Joins
5.2 Inner Join
5.3 Outer Join
5.3.1 Left Outer Join
5.3.2 Right Outer Join
5.3.3 Full Outer Join
5.4 Grouped Joins
5.5 Hash Join
5.6 Sort Merge Join
5.7 Join Query Optimizations
5.7.1 Optimizing Through Configuration Properties
5.7.2 Optimizing Query
5.8 Subqueries
5.8.1 IN and NOT IN in Subqueries
5.8.2 EXISTS and NOT EXISTS Clauses
5.8.3 ANY, SOME, and ALL Operators with Subqueries
5.8.4 UPSERT Using Subqueries
5.9 Views
5.9.1 Creating Views
5.9.2 Dropping Views
5.10 Paged Queries
5.10.1 LIMIT and OFFSET
5.10.2 Row Value Constructor
5.11 Summary
Chapter 6: Transactions
6.1 SQL Transactions
6.2 Transaction Properties
6.2.1 Atomicity
6.2.2 Consistency
6.2.3 Isolation
6.2.4 Durability
6.3 Transaction Control
6.3.1 COMMIT
6.3.2 ROLLBACK
6.3.3 SAVEPOINT
6.3.4 SET TRANSACTION
6.4 Transactions in HBase
6.4.1 Integrating HBase with Transaction Manager
6.4.2 Components of Transaction Manager
6.4.2.1 TransactionAware Client
6.4.2.2 Transaction Manager
6.4.2.3 Transaction Processor Coprocessor
6.4.3 Transaction Lifecycle
6.4.4 Concurrency Control
6.4.5 Multiversion Concurrency Control
6.4.6 Optimistic Concurrency Control
6.5 Apache Tephra As a Transaction Manager
6.6 Phoenix Transactions
6.6.1 Enabling Transactions for Tables
6.6.2 Committing Transactions
6.7 Transaction Limitations in Phoenix
6.8 Summary
Chapter 7: Advanced Phoenix Concepts
7.1 Secondary Indexes
7.1.1 Global Index
7.1.1.1 Immutable Tables
7.1.1.1.1 Consistency
7.1.1.2 Mutable Tables
7.1.1.2.1 Configuration
7.1.1.2.2 Consistency
7.1.2 Local Index
7.1.3 Covered Index
7.1.4 Functional Indexes
7.1.5 Index Consistency
7.2 User Defined Functions
7.2.1 Writing Custom User Defined Functions
7.2.1.1 Configuration
7.2.1.2 Runtime Environment
7.3 Phoenix Query Server
7.3.1 Download
7.3.2 Installation
7.3.3 Setup
7.3.4 Starting PQS
7.3.5 Client
7.3.6 Usage
7.3.7 Additional PQS Features
7.3.7.1 Gotchas
7.4 Summary
Chapter 8: Integrating Phoenix with Other Frameworks
8.1 Hadoop Ecosystem
8.2 MapReduce Integration
8.2.1 Setup
8.3 Apache Spark Integration
8.3.1 Setup
8.3.2 Reading and Writing Using Dataframe
8.4 Apache Hive Integration
8.4.1 Setup
8.4.2 Table Creation
8.5 Apache Pig Integration
8.5.1 Setup
8.5.2 Accessing Data from Phoenix
8.5.3 Storing Data to Phoenix
8.6 Apache Flume Integration
8.6.1 Setup
8.6.2 Configuration
8.6.3 Running the Above Setup
8.7 Summary
Chapter 9: Tools & Tuning
9.1 Phoenix Tracing Server
9.1.1 Trace
9.1.2 Span
9.1.3 Span Receivers
9.1.4 Setup
9.1.4.1 Client Configuration
9.1.4.2 Server Configuration
9.2 Phoenix Bulk Loading
9.2.1 Setup
9.2.2 Gotchas
9.3 Index Load Async
9.4 Pherf
9.4.1 Setup to Run the Test
9.4.2 Gotchas
9.5 Summary
Index

Publication date (per publisher)	29.12.2016
Additional info	XV, 140 p. 108 illus., 95 illus. in color.
Place of publication	Berkeley
Language	English
Subject area	Computer Science ► Databases ► Data Warehouse / Data Mining
Keywords	Advance Phoenix • Advance Querying • Apache Phoenix • Big Data Analytics • CRUD • HBase • Integrating Phoenix
ISBN-10	1-4842-2370-5 / 1484223705
ISBN-13	978-1-4842-2370-3 / 9781484223703


PDF (watermarked)
Size: 6.6 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost all devices but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading, you can also read this eBook online in your web browser.

