Advanced Elasticsearch 7.0: A practical guide to designing, indexing, and querying advanced distributed search engines

Building enterprise-grade distributed applications and running systematic search operations calls for a strong understanding of Elasticsearch and expertise in its core APIs and latest features. This book will help you master the advanced functionality of Elasticsearch and develop a sophisticated, real-time search engine with confidence. You'll also learn to run machine learning jobs in Elasticsearch to speed up routine tasks. You'll get started by learning to use Elasticsearch features on Hadoop and Spark to return query results faster and improve the customer experience. You'll then get up to speed with analytics by building a metrics pipeline, defining queries, and using Kibana for intuitive visualizations that give decision-makers better insights. Later, the book guides you through using Logstash, with examples, to collect, parse, and enrich logs before indexing them in Elasticsearch. By the end of this book, you will have comprehensive knowledge of advanced topics such as Apache Spark support, machine learning using Elasticsearch and scikit-learn, and real-time analytics, along with the expertise you need to increase business productivity, perform analytics, and get the very best out of Elasticsearch.

Author(s): Wai Tak Wong
Publisher: Packt Publishing
Year: 2019

Language: English
Pages: 560

Cover
Title Page
Copyright and Credits
Dedication
About Packt
Contributors
Table of Contents
Preface
Section 1: Fundamentals and Core APIs
Chapter 1: Overview of Elasticsearch 7
Preparing your environment
Running Elasticsearch
Basic Elasticsearch configuration
Important system configuration
Talking to Elasticsearch
Using Postman to work with the Elasticsearch REST API
Elasticsearch architectural overview
Elastic Stack architecture
Elasticsearch architecture
Between the Elasticsearch index and the Lucene index
Key concepts
Mapping concepts across SQL and Elasticsearch
Mapping
Analyzer
Standard analyzer
API conventions
New features
New features to be discussed
New features with description and issue number
Breaking changes
Aggregations changes
Analysis changes
API changes
Cluster changes
Discovery changes
High-level REST client changes
Low-level REST client changes 
Indices changes
Java API changes
Mapping changes
ML changes
Packaging changes
Search changes
Query DSL changes
Settings changes
Scripting changes
Migration between versions
Summary
Chapter 2: Index APIs
Index management APIs
Basic CRUD APIs
Index settings
Index templates
Index aliases
Reindexing with zero downtime
Grouping multiple indices
Views on a subset of documents
Miscellaneous
Monitoring indices
Indices stats
Indices segments, recovery, and shard stores
Index persistence
Advanced index management APIs
Split index 
Shrink index 
Rollover index 
Summary
Chapter 3: Document APIs
The Elasticsearch document life cycle
What is a document?
The document life cycle
Single document management APIs
Sample documents
Indexing a document
Retrieving a document by identifier
Updating a document
Removing a document by identifier
Multi-document management APIs
Retrieving multiple documents
Bulk API
Update by query API
Delete by query API
Reindex API
Copying documents
Migration from a multiple mapping types index
Summary
Chapter 4: Mapping APIs
Dynamic mapping
Mapping rules
Dynamic templates
Meta fields in mapping
Field datatypes
Static mapping for the sample document 
Mapping parameters
Refreshing mapping changes for static mapping
Typeless APIs working with old custom index types
Summary
Chapter 5: Anatomy of an Analyzer
An analyzer's components
Character filters
The html_strip filter
The mapping filter
The pattern_replace filter
Tokenizers
Token filters
Built-in analyzers
Custom analyzers
Normalizers
Summary
Chapter 6: Search APIs
Indexing sample documents
Search APIs
URI search
Request body search
The sort parameter
The scroll parameter
The search_after parameter
The rescore parameter
The _name parameter
The collapse parameter
The highlighting parameter
Other search parameters
Query DSL
Full text queries
The match keyword
The query string keyword
The intervals keyword
Term-level queries
Compound queries
The script query
The multi-search API
Other search-related APIs
The _explain API
The _validate API
The _count API
The field capabilities API
Profiler
Suggesters
Summary
Section 2: Data Modeling, Aggregations Framework, Pipeline, and Data Analytics
Chapter 7: Modeling Your Data in the Real World
The Investor Exchange Cloud
Modeling data and the approaches
Data denormalization
Using an array of objects datatype
Nested object mapping datatypes
Join datatypes
Parent ID query
has_child query
has_parent query
Practical considerations
Summary
Chapter 8: Aggregation Frameworks
ETF historical data preparation
Aggregation query syntax
Matrix aggregations
Matrix stats
Metrics aggregations
avg
weighted_avg
cardinality
value_count
sum
min
max
stats
extended_stats
top_hit
percentiles
percentile_ranks
median_absolute_deviation
geo_bound
geo_centroid
scripted_metric
Bucket aggregations
histogram
date_histogram
auto_date_histogram
ranges
date_range
ip_range
filter
filters
term
significant_terms
significant_text
sampler
diversified_sampler
nested
reverse_nested
global
missing
composite
adjacency_matrix
parent
children
geo_distance
geohash_grid
geotile_grid
Pipeline aggregations
Sibling family
avg_bucket 
max_bucket
min_bucket
sum_bucket
stats_bucket
extended_stats_bucket
percentiles_bucket
Parent family
cumulative_sum
derivative
bucket_script
bucket_selector
bucket_sort
serial_diff
Moving average aggregation
simple
linear
ewma
holt
holt_winters
Moving function aggregation
max
min
sum
stdDev
unweightedAvg
linearWeightedAvg
ewma
holt
holtWinters
Post filter on aggregations
Summary
Chapter 9: Preprocessing Documents in Ingest Pipelines
Ingest APIs
Accessing data in pipelines
Processors
Conditional execution in pipelines
Handling failures in pipelines
Summary
Chapter 10: Using Elasticsearch for Exploratory Data Analysis
Business analytics
Operational data analytics
Sentiment analysis
Summary
Section 3: Programming with the Elasticsearch Client
Chapter 11: Elasticsearch from Java Programming
Overview of Elasticsearch Java REST client
The Java low-level REST client
The Java low-level REST client workflow
REST client initialization
Performing requests using a REST client 
Handling responses
Testing with Swagger UI
New features
The Java high-level REST client
The Java high-level REST client workflow
REST client initialization
Performing requests using the REST client
Handling responses
Testing with Swagger UI
New features
Spring Data Elasticsearch
Summary
Chapter 12: Elasticsearch from Python Programming
Overview of the Elasticsearch Python client
The Python low-level Elasticsearch client
Workflow for the Python low-level Elasticsearch client
Client initialization
Performing requests
Handling responses
The Python high-level Elasticsearch library
Illustrating the programming concept
Initializing a connection
Performing requests 
Handling responses
The query class 
The aggregations class
Summary
Section 4: Elastic Stack
Chapter 13: Using Kibana, Logstash, and Beats
Overview of the Elastic Stack
Running the Elastic Stack with Docker
Running Elasticsearch in a Docker container
Running Kibana in a Docker container
Running Logstash in a Docker container
Running Beats in a Docker container
Summary
Chapter 14: Working with Elasticsearch SQL
Overview
Getting started
Elasticsearch SQL language
Reserved keywords
Data type
Operators
Functions
Aggregate
Grouping
Date-time
Full-text search 
Mathematics
String
Type conversion
Conditional
System
Elasticsearch SQL query syntax
New features
Elasticsearch SQL REST API
Elasticsearch SQL JDBC
Upgrading Elasticsearch from a basic to a trial license
Workflow of Elasticsearch SQL JDBC 
Testing with Swagger UI
Summary
Chapter 15: Working with Elasticsearch Analysis Plugins
What are Elasticsearch plugins?
Plugin management
Working with the ICU Analysis plugin
Examples
Working with the Smart Chinese Analysis plugin
Examples
Working with the IK Analysis plugin
Examples
Configuring a custom dictionary in the IK Analysis plugin
Summary
Section 5: Advanced Features
Chapter 16: Machine Learning with Elasticsearch
Machine learning with Elastic Stack
Machine learning APIs
Machine learning jobs
Sample data
Running a single-metric job
Creating index patterns
Creating a new machine learning job
Examining the result
Machine learning using Elasticsearch and scikit-learn
Summary
Chapter 17: Spark and Elasticsearch for Real-Time Analytics
Overview of ES-Hadoop
Apache Spark support
Real-time analytics using Elasticsearch and Apache Spark
Building a virtual environment to run the sample ES-Hadoop project
Running the sample ES-Hadoop project
Running the sample ES-Hadoop project using a prepared Docker image
Source code
Summary
Chapter 18: Building Analytics RESTful Services
Building a RESTful web service with Spring Boot
Project program structure
Running the program and examining the APIs
Main workflow anatomy
Building the analytic model
Performing daily update data
Getting the registered symbols
Building the scheduler
Integration with the Bollinger Band
Building a Java Spark ML module for k-means anomaly detection
Source code
Testing Analytics RESTful services
Testing the build-analytics-model API
Testing the get-register-symbols API
Working with Kibana to visualize the analytics results
Summary
Other Books You May Enjoy
Index