Elasticsearch

The Definitive Guide: A Distributed Real-Time Search and Analytics Engine

Clinton Gormley, Zachary Tong (Authors)

Book | Softcover

724 pages

2015
O'Reilly Media (Publisher)
978-1-4493-5854-9 (ISBN)

Whether you need full-text search or real-time analytics of structured data—or both—the Elasticsearch distributed search engine is an ideal way to put your data to work. This practical guide not only shows you how to search, analyze, and explore data with Elasticsearch, but also helps you deal with the complexities of human language, geolocation, and relationships.

If you’re a newcomer to both search and distributed systems, you’ll quickly learn how to integrate Elasticsearch into your application. More experienced users will pick up lots of advanced techniques. Throughout the book, you’ll follow a problem-based approach to learn why, when, and how to use Elasticsearch features.

Understand how Elasticsearch interprets data in your documents
Index and query your data to take advantage of search concepts such as relevance and word proximity
Handle human language through the effective use of analyzers and queries
Summarize and group data to show overall trends, with aggregations and analytics
Use geo-points and geo-shapes—Elasticsearch’s approaches to geolocation
Model your data to take advantage of Elasticsearch’s horizontal scalability
Learn how to configure and monitor your cluster in production

Clinton Gormley was the first user of Elasticsearch and wrote the Perl API back in 2010. When Elasticsearch became a company in 2012, he joined as a developer and maintainer of the Perl modules. Clinton now spends much of his time designing user interfaces and speaking and writing about Elasticsearch. He studied medicine at the University of Cape Town (UCT) and lives in Barcelona.

Zachary Tong has been working with Elasticsearch since 2011. During that time, he has written a number of tutorials to help beginners get started with Elasticsearch. Zach is now a developer at Elasticsearch, where he maintains the PHP client, runs training sessions, and helps customers manage clusters in production. He studied biology at Rensselaer Polytechnic Institute and now lives in South Carolina.

Contents

Getting Started
Chapter 1. You Know, for Search…
Installing Elasticsearch
Running Elasticsearch
Talking to Elasticsearch
Document Oriented
Finding Your Feet
Indexing Employee Documents
Retrieving a Document
Search Lite
Search with Query DSL
More-Complicated Searches
Full-Text Search
Phrase Search
Highlighting Our Searches
Analytics
Tutorial Conclusion
Distributed Nature
Next Steps
Chapter 2. Life Inside a Cluster
An Empty Cluster
Cluster Health
Add an Index
Add Failover
Scale Horizontally
Coping with Failure
Chapter 3. Data In, Data Out
What Is a Document?
Document Metadata
Indexing a Document
Retrieving a Document
Checking Whether a Document Exists
Updating a Whole Document
Creating a New Document
Deleting a Document
Dealing with Conflicts
Optimistic Concurrency Control
Partial Updates to Documents
Retrieving Multiple Documents
Cheaper in Bulk
Chapter 4. Distributed Document Store
Routing a Document to a Shard
How Primary and Replica Shards Interact
Creating, Indexing, and Deleting a Document
Retrieving a Document
Partial Updates to a Document
Multidocument Patterns
Chapter 5. Searching—The Basic Tools
The Empty Search
Multi-index, Multitype
Pagination
Search Lite
Chapter 6. Mapping and Analysis
Exact Values Versus Full Text
Inverted Index
Analysis and Analyzers
Mapping
Complex Core Field Types
Chapter 7. Full-Body Search
Empty Search
Query DSL
Queries and Filters
Most Important Queries and Filters
Combining Queries with Filters
Validating Queries
Chapter 8. Sorting and Relevance
Sorting
String Sorting and Multifields
What Is Relevance?
Fielddata
Chapter 9. Distributed Search Execution
Query Phase
Fetch Phase
Search Options
scan and scroll
Chapter 10. Index Management
Creating an Index
Deleting an Index
Index Settings
Configuring Analyzers
Custom Analyzers
Types and Mappings
The Root Object
Dynamic Mapping
Customizing Dynamic Mapping
Default Mapping
Reindexing Your Data
Index Aliases and Zero Downtime
Chapter 11. Inside a Shard
Making Text Searchable
Dynamically Updatable Indices
Near Real-Time Search
Making Changes Persistent
Segment Merging
Search in Depth
Chapter 12. Structured Search
Finding Exact Values
Combining Filters
Finding Multiple Exact Values
Ranges
Dealing with Null Values
All About Caching
Filter Order
Chapter 13. Full-Text Search
Term-Based Versus Full-Text
The match Query
Multiword Queries
Combining Queries
How match Uses bool
Boosting Query Clauses
Controlling Analysis
Relevance Is Broken!
Chapter 14. Multifield Search
Multiple Query Strings
Single Query String
Best Fields
Tuning Best Fields Queries
multi_match Query
Most Fields
Cross-fields Entity Search
Field-Centric Queries
Custom _all Fields
cross-fields Queries
Exact-Value Fields
Chapter 15. Proximity Matching
Phrase Matching
Mixing It Up
Multivalue Fields
Closer Is Better
Proximity for Relevance
Improving Performance
Finding Associated Words
Chapter 16. Partial Matching
Postcodes and Structured Data
prefix Query
wildcard and regexp Queries
Query-Time Search-as-You-Type
Index-Time Optimizations
Ngrams for Partial Matching
Index-Time Search-as-You-Type
Ngrams for Compound Words
Chapter 17. Controlling Relevance
Theory Behind Relevance Scoring
Lucene’s Practical Scoring Function
Query-Time Boosting
Manipulating Relevance with Query Structure
Not Quite Not
Ignoring TF/IDF
function_score Query
Boosting by Popularity
Boosting Filtered Subsets
Random Scoring
The Closer, The Better
Understanding the price Clause
Scoring with Scripts
Pluggable Similarity Algorithms
Changing Similarities
Relevance Tuning Is the Last 10%
Dealing with Human Language
Chapter 18. Getting Started with Languages
Using Language Analyzers
Configuring Language Analyzers
Pitfalls of Mixing Languages
One Language per Document
One Language per Field
Mixed-Language Fields
Chapter 19. Identifying Words
standard Analyzer
standard Tokenizer
Installing the ICU Plug-in
icu_tokenizer
Tidying Up Input Text
Chapter 20. Normalizing Tokens
In That Case
You Have an Accent
Living in a Unicode World
Unicode Case Folding
Unicode Character Folding
Sorting and Collations
Chapter 21. Reducing Words to Their Root Form
Algorithmic Stemmers
Dictionary Stemmers
Hunspell Stemmer
Choosing a Stemmer
Controlling Stemming
Stemming in situ
Chapter 22. Stopwords: Performance Versus Precision
Pros and Cons of Stopwords
Using Stopwords
Stopwords and Performance
Divide and Conquer
Stopwords and Phrase Queries
common_grams Token Filter
Stopwords and Relevance
Chapter 23. Synonyms
Using Synonyms
Formatting Synonyms
Expand or contract
Synonyms and The Analysis Chain
Multiword Synonyms and Phrase Queries
Symbol Synonyms
Chapter 24. Typoes and Mispelings
Fuzziness
Fuzzy Query
Fuzzy match Query
Scoring Fuzziness
Phonetic Matching
Aggregations
Chapter 25. High-Level Concepts
Buckets
Metrics
Combining the Two
Chapter 26. Aggregation Test-Drive
Adding a Metric to the Mix
Buckets Inside Buckets
One Final Modification
Chapter 27. Building Bar Charts
Chapter 28. Looking at Time
Returning Empty Buckets
Extended Example
The Sky’s the Limit
Chapter 29. Scoping Aggregations
Chapter 30. Filtering Queries and Aggregations
Filtered Query
Filter Bucket
Post Filter
Recap
Chapter 31. Sorting Multivalue Buckets
Intrinsic Sorts
Sorting by a Metric
Sorting Based on “Deep” Metrics
Chapter 32. Approximate Aggregations
Finding Distinct Counts
Calculating Percentiles
Chapter 33. Significant Terms
significant_terms Demo
Chapter 34. Controlling Memory Use and Latency
Fielddata
Aggregations and Analysis
Limiting Memory Usage
Fielddata Filtering
Doc Values
Preloading Fielddata
Preventing Combinatorial Explosions
Chapter 35. Closing Thoughts
Geolocation
Chapter 36. Geo-Points
Lat/Lon Formats
Filtering by Geo-Point
geo_bounding_box Filter
geo_distance Filter
Caching geo-filters
Reducing Memory Usage
Sorting by Distance
Chapter 37. Geohashes
Mapping Geohashes
geohash_cell Filter
Chapter 38. Geo-aggregations
geo_distance Aggregation
geohash_grid Aggregation
geo_bounds Aggregation
Chapter 39. Geo-shapes
Mapping geo-shapes
Indexing geo-shapes
Querying geo-shapes
Querying with Indexed Shapes
Geo-shape Filters and Caching
Modeling Your Data
Chapter 40. Handling Relationships
Application-side Joins
Denormalizing Your Data
Field Collapsing
Denormalization and Concurrency
Solving Concurrency Issues
Chapter 41. Nested Objects
Nested Object Mapping
Querying a Nested Object
Sorting by Nested Fields
Nested Aggregations
Chapter 42. Parent-Child Relationship
Parent-Child Mapping
Indexing Parents and Children
Finding Parents by Their Children
Finding Children by Their Parents
Children Aggregation
Grandparents and Grandchildren
Practical Considerations
Chapter 43. Designing for Scale
The Unit of Scale
Shard Overallocation
Kagillion Shards
Capacity Planning
Replica Shards
Multiple Indices
Time-Based Data
Index Templates
Retiring Data
User-Based Data
Shared Index
Faking Index per User with Aliases
One Big User
Scale Is Not Infinite
Administration, Monitoring, and Deployment
Chapter 44. Monitoring
Marvel for Monitoring
Cluster Health
Monitoring Individual Nodes
Cluster Stats
Index Stats
Pending Tasks
cat API
Chapter 45. Production Deployment
Hardware
Java Virtual Machine
Transport Client Versus Node Client
Configuration Management
Important Configuration Changes
Don’t Touch These Settings!
Heap: Sizing and Swapping
File Descriptors and MMap
Revisit This List Before Production
Chapter 46. Post-Deployment
Changing Settings Dynamically
Logging
Indexing Performance Tips
Rolling Restarts
Backing Up Your Cluster
Restoring from a Snapshot
Clusters Are Living, Breathing Creatures

Publication date (per publisher)	3 March 2015
Place of publication	Sebastopol
Language	English
Dimensions	178 x 233 mm
Weight	1213 g
Binding	Paperback
Subject	Computer Science ► Databases ► Data Warehouse / Data Mining
Subject	Mathematics / Computer Science ► Computer Science ► Web / Internet
Keywords	Elasticsearch
ISBN-10	1-4493-5854-3 / 1449358543
ISBN-13	978-1-4493-5854-9 / 9781449358549
Condition	New