Apache Solr High Performance

When setting up Apache Solr, you'll want it to return the best possible search results as efficiently as possible. This book shows you how, with a comprehensive tutorial that includes troubleshooting.

About This Book

  • Improve relevance scores by boosting at query time and index time, and implement boost queries and boost functions using the DisMax query parser and formulae.
  • Set up and use SolrCloud for distributed indexing and searching, and implement distributed search using shards.
  • Use geospatial search, handle homophones, and prevent listed words from being indexed and searched.

Who This Book Is For

This book is ideal for Apache Solr developers who want to learn different techniques for optimizing Solr's performance and for troubleshooting the problems that usually occur while trying to boost performance. Familiarity with search servers and database querying is expected.

What You Will Learn

  • Boost your search results using scoring, the DisMax query parser, and function queries.
  • Explore performance metrics and implement different types of Solr caching, such as document, query result, filter, and result page caching.
  • Index and search across shards, and use near real-time search.
  • Get to grips with additional performance optimization activities, such as fetching documents similar to the ones queried, searching for homophones, and filtering searches on the basis of specific keywords.
  • Troubleshoot common problems efficiently, such as corrupt and locked indexes, out-of-memory issues, expensive garbage collection, and infinite loop exceptions in multi-server environments.
  • Set up, configure, and deploy various applications of ZooKeeper to optimize Solr's performance

In Detail

Apache Solr is one of the most popular open source search servers available on the web. However, simply setting up Apache Solr is not enough to ensure the success of your web product. To maximize efficiency, you need to use techniques to boost Solr performance in order to return relevant results faster. You need to implement robust techniques that focus on optimizing the performance of your Solr instances and also troubleshoot issues that are prone to arise while maintaining Solr.

Apache Solr High Performance is a practical guide that will help you explore and take full advantage of the robust nature of Apache Solr so as to achieve optimized Solr instances, especially in terms of performance.

You will learn everything you need to know in order to achieve a high performing Solr instance or set of instances, as well as how to troubleshoot the common problems you are prone to face while working with single or multiple Solr servers.

This book begins by explaining the prerequisites of Apache Solr and walking through its installation and integration with the required additional components. It then gradually progresses into features that make Solr flexible enough to achieve high performance in various circumstances. Moving forward, the book covers several clear and highly practical concepts that will help you further optimize the performance of your Solr instances on both single and multiple servers, and shows how to troubleshoot common problems that tend to arise while using Solr. By the end of the book, you will also learn how to set up, configure, and deploy ZooKeeper, and learn more about other applications of ZooKeeper.

You will also learn how to handle data in multiple server environments, run searches based on specific geographical coordinates, apply different caching techniques, and use various algorithms and formulae that enable better performance.

Author(s): Mohan, Surendra
Series: Community experience distilled
Publisher: Packt Publishing
Year: 2014

Language: English
Pages: 109
City: Birmingham
Tags: Search engines; Programming; Lucene (Electronic resource); Open source software; Client/server computing; Data mining; Web search engines; LANGUAGE ARTS & DISCIPLINES; Library & Information Science; General

Content: Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Table of Contents
Preface
Chapter 1: Installing Solr
Prerequisites
Installing components
Summary
Chapter 2: Boost Your Search
Scoring
Boosting query-time and index-time
Index-time boosting
Query-time boosting
Troubleshoot queries and scores
The dismax query parser
Lucene DisjunctionMaxQuery
Autophrase boosting
Configuring autophrase boosting
Configuring the phrase slop
Boosting a partial phrase
Boost queries
Boost functions
Boost addition and multiplication
Function queries
Field references
Function references
Mathematical operations
The ord() and rord() functions
Other functions
Boosting the function query
Logarithm
Reciprocal
Linear
Inverse reciprocal
Summary
Chapter 3: Performance Optimization
Solr performance factors
Solr caching
Document caching
Query result caching
Filter caching
Result pages caching
Using SolrCloud
Creating a SolrCloud cluster
Multiple collections within a cluster
Managing a SolrCloud cluster
Distributed indexing and searching
Stopping automatic document distribution
Near real-time search
Summary
Chapter 4: Additional Performance Optimization Techniques
Documents similar to those returned in the search result
Sorting results by function values
Searching for homophones
Ignore the defined words from being searched
Summary
Chapter 5: Troubleshooting
Dealing with the corrupt index
Reducing the file count in the index
Dealing with the locked index
Truncating the index size
Dealing with a huge count of open files
Dealing with out-of-memory issues
Dealing with an infinite loop exception in shards
Dealing with expensive garbage collection
Bulk updating a single field without full indexation
Summary
Chapter 6: Performance Optimization with ZooKeeper
Getting familiar with ZooKeeper
Prerequisites for a distributed server
Aid your distributed system using ZooKeeper
Setting an ideal node count for ZooKeeper
Setting up, configuring, and deploying ZooKeeper
Setting up ZooKeeper
Configuring ZooKeeper
Deploying ZooKeeper
Applications of ZooKeeper
Summary
Appendix
Index