What’s new in Apache Solr 5.2
Apache Lucene and Solr 5.2.0 were just released with tons of new features, optimizations, and bug fixes. Here are the major highlights from the release:
Rule based replica assignment
This feature gives users fine-grained control over the placement of new replicas during collection, replica, and shard creation. A rule is a set of conditions, comprising a shard, a replica, and a tag, that must be satisfied before a replica can be created. Rules can be used to restrict replica creation, for example:
- Keep fewer than 2 replicas of a collection on any node
- For a shard, keep fewer than 2 replicas on any node
- (Do not) create shards on a particular rack or host
More details about this feature are available in this blog post: https://de.lucidworks.com/post/rule-based-replica-assignment-solrcloud/
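As a rough illustration of the first two restrictions above, the sketch below builds a Collections API CREATE URL that carries two rule parameters. The rule strings follow the shard/replica/tag syntax described in the linked blog post; the collection name, shard count, replication factor, and the helper function itself are assumptions made up for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def create_collection_url(base_url, name, num_shards, replication_factor, rules):
    """Build a Collections API CREATE URL with one 'rule' parameter per rule."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
    ]
    # Each rule is sent as its own repeated 'rule' parameter.
    params.extend(("rule", rule) for rule in rules)
    return f"{base_url}/admin/collections?{urlencode(params)}"

rules = [
    "replica:<2,node:*",          # fewer than 2 replicas of the collection on any node
    "shard:*,replica:<2,node:*",  # for each shard, fewer than 2 replicas on any node
]

# Hypothetical collection name and sizing, chosen only for the example.
url = create_collection_url("http://localhost:8983/solr", "rules_demo", 2, 2, rules)

# Round trip: decoding the query string gives back the original rules.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["rule"])
```

The URL is only constructed here, not sent, so the sketch runs without a Solr cluster.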
Restore API
Until now, Solr has provided a way to back up an existing index using a call like:
http://localhost:8983/solr/techproducts/replication?command=backup&name=backup_name
The new restore API allows you to restore an existing backup via a command like:
http://localhost:8983/solr/techproducts/replication?command=restore&name=backup_name
The location of the index backup defaults to the data directory but can be overridden with the location parameter.
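A small sketch of how the two calls might be assembled in a script, including the optional location parameter. The helper name and the backup directory are assumptions for illustration; only the command, name, and location parameters come from the text above.

```python
from urllib.parse import urlencode

def replication_url(base_url, core, command, name, location=None):
    """Build a replication handler URL for a backup or restore command."""
    params = {"command": command, "name": name}
    if location is not None:
        # Override the default backup location (the data directory).
        params["location"] = location
    return f"{base_url}/{core}/replication?{urlencode(params)}"

base = "http://localhost:8983/solr"
backup_url = replication_url(base, "techproducts", "backup", "backup_name")
# "/tmp/solr-backups" is a hypothetical directory used only for this example.
restore_url = replication_url(base, "techproducts", "restore", "backup_name",
                              location="/tmp/solr-backups")

print(backup_url)
print(restore_url)
```

If a custom location was used for the backup, the same location would presumably need to be passed to the restore call.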
JSON Facet API
unique() facet function
The unique facet function is now supported for numeric and date fields. Example:
json.facet={ unique_products : "unique(product_type)" }
The “type” parameter: flatter request
There’s now a way to construct a flatter JSON Facet request using the “type” parameter. The following request from 5.1:
top_authors : { terms : { field:author, limit:5 } }
is equivalent to this request in 5.2:
top_authors : { type:terms, field:author, limit:5 }
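To make the equivalence concrete, here is a minimal sketch that rewrites a 5.1-style nested facet definition into the flatter 5.2 form. The function is a client-side convenience invented for this example, not part of Solr, and it assumes the nested form has exactly one key naming the facet type.

```python
import json

def flatten_facet(facet):
    """Convert {"terms": {...options}} into {"type": "terms", ...options}."""
    if len(facet) != 1:
        raise ValueError("expected exactly one facet type key")
    (facet_type, options), = facet.items()
    return {"type": facet_type, **options}

nested = {"top_authors": {"terms": {"field": "author", "limit": 5}}}
flat = {name: flatten_facet(body) for name, body in nested.items()}

print(json.dumps(flat))
```

The flat form keeps the facet type next to its options, which tends to be easier to build programmatically.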
mincount parameter and range facets
The mincount parameter is now supported by range facets to filter out ranges that don’t meet a minimum document count. Example:
prices:{ type:range, field:price, mincount:1, start:0, end:100, gap:10 }
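The effect of mincount can be pictured with a small in-memory model of a range facet. This is not how Solr computes facets; it is a sketch that assumes lower-inclusive, upper-exclusive buckets and uses made-up price values.

```python
def range_facet(values, start, end, gap, mincount=0):
    """Count values per [lower, lower + gap) bucket, dropping buckets below mincount."""
    buckets = []
    lower = start
    while lower < end:
        upper = lower + gap
        count = sum(1 for v in values if lower <= v < upper)
        if count >= mincount:
            buckets.append({"val": lower, "count": count})
        lower = upper
    return buckets

prices = [5, 7, 12, 55, 58, 99]  # hypothetical document prices

all_buckets = range_facet(prices, start=0, end=100, gap=10)
non_empty = range_facet(prices, start=0, end=100, gap=10, mincount=1)

print(len(all_buckets), "buckets without mincount")
print(non_empty)
```

With mincount:1 the empty ranges disappear from the response, which keeps it small when most ranges contain no documents.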
multi-select faceting
A new parameter, excludeTags, disregards any matching tagged filters for that facet. Example:
q=cars &fq={!tag=COLOR}color:black &fq={!tag=MODEL}model:xc90 &json.facet={ colors:{type:terms, field:color, excludeTags:COLOR}, model:{type:terms, field:model, excludeTags:MODEL} }
The above example shows a request where a user selected “color:black” and “model:xc90”. This query would do the following:
- Get a document list with the filters applied.
- colors facet:
  - Exclude the color filter so you get back facets for all colors instead of just the color ‘black’.
  - Apply the model filter.
- Similarly, compute facets for the model, i.e. exclude the model filter but apply the color filter.
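The steps above can be mimicked on a handful of in-memory documents. The sketch below is a toy model of the excludeTags semantics, not Solr code; the documents and helper functions are invented for the example.

```python
from collections import Counter

# Hypothetical documents matching q=cars.
docs = [
    {"color": "black", "model": "xc90"},
    {"color": "red", "model": "xc90"},
    {"color": "black", "model": "s60"},
    {"color": "blue", "model": "s60"},
]

# Tagged filters, mirroring fq={!tag=COLOR}color:black and fq={!tag=MODEL}model:xc90.
filters = {"COLOR": ("color", "black"), "MODEL": ("model", "xc90")}

def apply_filters(docs, filters, exclude_tags=()):
    """Keep docs that match every filter whose tag is not excluded."""
    active = [fv for tag, fv in filters.items() if tag not in exclude_tags]
    return [d for d in docs if all(d[field] == value for field, value in active)]

def terms_facet(docs, filters, field, exclude_tags=()):
    """Count field values over docs filtered with the given tags excluded."""
    return Counter(d[field] for d in apply_filters(docs, filters, exclude_tags))

result_docs = apply_filters(docs, filters)                            # both filters applied
colors = terms_facet(docs, filters, "color", exclude_tags=("COLOR",))  # model filter only
models = terms_facet(docs, filters, "model", exclude_tags=("MODEL",))  # color filter only

print(result_docs)
print(dict(colors))
print(dict(models))
```

The document list reflects both selections, while each facet still offers the alternatives for its own field, which is what a multi-select UI needs.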
hll facet function
The JSON Facet API has an option to use a HyperLogLog implementation for computing unique values. Example:
json.facet={ unique_products : "hll(product_type)" }
Choosing facet implementation
Before Solr 5.2, interval faceting used a different implementation from range faceting, one based on DocValues, which is at times faster and doesn’t rely on filters or the filter cache. Solr 5.2 lets you choose between the filter-based and DocValues-based implementations. Functionally, the results of the two are the same, but performance may differ. The facet.range.method parameter specifies which implementation to use. Some numbers on the performance of the two methods can be found here: https://issues.apache.org/jira/browse/SOLR-7406?focusedCommentId=14497338&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14497338
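As a sketch of how a client might switch implementations, the snippet below builds range facet query parameters with facet.range.method set. It assumes the two accepted values are filter and dv, as in the Solr reference guide; the price field and the range bounds are placeholders.

```python
from urllib.parse import urlencode

def range_facet_params(field, start, end, gap, method="filter"):
    """Build classic range-facet query parameters with an explicit method."""
    if method not in ("filter", "dv"):  # assumed set of accepted values
        raise ValueError("method must be 'filter' or 'dv'")
    return urlencode([
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
        ("facet.range.method", method),
    ])

query = range_facet_params("price", 0, 100, 10, method="dv")
print(query)
```

Because the results are meant to be identical either way, timing the same request with both values is a simple way to see which one suits a given index.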
Stats component
The Solr stats component now supports HyperLogLog-based cardinality estimation. The same implementation is also used by the new JSON Facet API. The cardinality option uses the probabilistic “HyperLogLog” (HLL) algorithm to estimate the cardinality of sets in a fixed amount of memory. The cardinality parameter can also be tuned to trade off accuracy against the amount of RAM used at query time, with relatively minor impact on response time.
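To give a feel for the accuracy versus memory trade-off, here is a toy HyperLogLog estimator. It is a simplified teaching sketch, not the implementation Solr uses: the class, the SHA-1 based hash, and the default of 2^12 registers are all choices made for this example. More registers (a larger p) means more memory and a tighter estimate.

```python
import hashlib
import math

class ToyHyperLogLog:
    def __init__(self, p=12):
        self.p = p                 # number of index bits
        self.m = 1 << p            # number of registers (fixed memory)
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")   # 64-bit hash
        index = h >> (64 - self.p)              # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank: position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw

hll = ToyHyperLogLog(p=12)
for i in range(50000):
    hll.add(f"product-{i}")
    hll.add(f"product-{i}")  # duplicates do not change the estimate

estimate = hll.estimate()
print(round(estimate))
```

The register array stays the same size no matter how many values are added, which is the property that makes this approach attractive for high-cardinality fields.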
Solr security
SolrCloud allows for hosting multiple collections within the same cluster but, up to and including 5.1, didn’t provide a mechanism to restrict access.
The authentication framework in 5.2 allows for plugging in a custom authentication plugin or using the Kerberos plugin that is shipped out of the box. This allows for authenticating requests to Solr.
The authorization framework allows for implementing a custom plugin to authorize access to resources in a SolrCloud cluster. The Solr reference guide covers both frameworks: https://cwiki.apache.org/confluence/display/solr/Security
Solr streaming expressions
Streaming Expressions provide a simple query language for SolrCloud that merges search with parallel computing. This builds on the Solr streaming API introduced in 5.1. The Solr reference guide has more information: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
Other features
A few configurations in Solr need to be in place as part of the bootstrapping process, before the first Solr node comes up, e.g. to enable SSL. The CLUSTERPROP call provides an API to do so, but requires a running Solr instance. Starting with Solr 5.2, a cluster-wide property can be added, edited, or deleted using the zkcli script, which doesn’t require a running Solr instance.
On the spatial front, this release introduces the new RptWithGeometrySpatialField, based on CompositeSpatialStrategy, which blends RPT indexes for speed with serialized geometry for accuracy. It includes a Lucene segment-based in-memory shape cache.
There is now a refactored Admin UI built using AngularJS. This new UI isn’t the default yet; it is offered as an option so that users can report issues and provide feedback before it becomes the default UI. The new UI can be accessed at: http://hostname:port/solr/index.html
Though it’s an internal detail, it’s certainly an important one: Solr has been upgraded internally to use Jetty 9. This allows us to move towards using async calls and more.
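For scripted bootstrapping, the zkcli invocation could be assembled as shown in this sketch. It assumes the clusterprop command takes -name and -val arguments and that the script lives under server/scripts/cloud-scripts, as in the reference guide’s SSL instructions; the ZooKeeper address is a placeholder, and the command is only constructed here, not executed.

```python
def clusterprop_command(zk_host, name, value,
                        script="server/scripts/cloud-scripts/zkcli.sh"):
    """Return the argv list for setting a cluster-wide property via zkcli."""
    return [script, "-zkhost", zk_host, "-cmd", "clusterprop",
            "-name", name, "-val", value]

# Example: switch the cluster URL scheme to https before any node starts.
cmd = clusterprop_command("localhost:2181", "urlScheme", "https")
print(" ".join(cmd))

# To actually run it against a live ZooKeeper, one could pass the list to
# subprocess.run(cmd, check=True) from the Solr installation directory.
```

Building the command as a list avoids shell quoting problems if it is later handed to subprocess.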
Indexing performance improvement
This release also comes with a substantial indexing performance improvement, almost 100% compared to Solr 4.x. Watch for a blog post on that soon.
Beyond the features and improvements listed above, Solr 5.2.0 also includes many other new features as well as numerous optimizations and bug fixes from the corresponding Apache Lucene release. For more information, the detailed change logs for both Lucene and Solr can be found here:
Lucene: http://lucene.apache.org/core/5_2_0/changes/Changes.html
Solr: http://lucene.apache.org/solr/5_2_0/changes/Changes.html
Featured image by David Precious