
What’s new in Apache Solr 5.2

by Anshum Gupta
June 9, 2015

Apache Lucene and Solr 5.2.0 were just released with tons of new features, optimizations, and bug fixes. Here are the major highlights from the release:

Rule based replica assignment

This feature gives users fine-grained control over the placement of new replicas during collection, replica, and shard creation. A rule is a set of conditions, comprising a shard, a replica, and a tag, that must be satisfied before a replica can be created. Rules can be used to restrict replica creation, for example:

Keep less than 2 replicas of a collection on any node
For a shard, keep less than 2 replicas on any node
(Do not) Create shards on a particular rack, or host.

More details about this feature are available in this blog post: https://de.lucidworks.com/post/rule-based-replica-assignment-solrcloud/
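As a rough sketch, the three restrictions listed above might be passed to the Collections API CREATE call as repeated rule parameters. The rule strings follow the syntax described in the linked post; the helper name, the collection name, and the host address are made up for illustration, and in practice you would pick only the rules that apply to your cluster.

```python
from urllib.parse import urlencode


def build_create_url(base_url, name, num_shards, replication_factor, rules):
    """Build a Collections API CREATE URL with one `rule` parameter per rule."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
    ]
    # Each rule is sent as its own `rule` parameter.
    params.extend(("rule", rule) for rule in rules)
    return f"{base_url}/admin/collections?{urlencode(params)}"


rules = [
    "replica:<2,node:*",          # fewer than 2 replicas of the collection per node
    "shard:*,replica:<2,node:*",  # per shard, fewer than 2 replicas per node
    "host:!192.0.2.10",           # hypothetical host that must not receive replicas
]
url = build_create_url("http://localhost:8983/solr", "techproducts", 2, 2, rules)
print(url)
```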

Restore API

Until now, Solr provided a feature to back up an existing index using a call like:
http://localhost:8983/solr/techproducts/replication?command=backup&name=backup_name

The new restore API allows you to restore an existing backup via a command like:
http://localhost:8983/solr/techproducts/replication?command=restore&name=backup_name

The location of the index backup defaults to the data directory but can be overridden with the location parameter.
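A minimal sketch of how the two calls could be assembled in a script, including the optional location override. The helper name and the backup directory are assumptions for illustration; only the command, name, and location parameters come from the text above.

```python
from urllib.parse import urlencode


def replication_url(core_url, command, name, location=None):
    """Build a replication handler URL for a backup or restore command."""
    params = {"command": command, "name": name}
    if location is not None:
        # Only sent when the backup does not live in the default data directory.
        params["location"] = location
    return f"{core_url}/replication?{urlencode(params)}"


core = "http://localhost:8983/solr/techproducts"
backup_url = replication_url(core, "backup", "backup_name")
# "/var/backups/solr" is a hypothetical directory.
restore_url = replication_url(core, "restore", "backup_name", location="/var/backups/solr")
print(backup_url)
print(restore_url)
```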

JSON Facet API

unique() facet function

The unique facet function is now supported for numeric and date fields. Example:

json.facet={
  unique_products : "unique(product_type)"
}

The “type” parameter: flatter request

There’s now a way to construct a flatter JSON Facet request using the “type” parameter. The following request from 5.1:

top_authors : {
  terms : { 
    field:author, 
    limit:5 
  }
}

is equivalent to this request in 5.2:

top_authors : { 
  type:terms,
  field:author,
  limit:5 
}
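If you generate facet requests programmatically, the rewrite from the nested form to the flat form is mechanical. The helper below is a hypothetical illustration of that equivalence, not part of Solr; it only handles a single top-level facet and does not descend into nested sub-facets.

```python
def flatten_facet(facet):
    """Convert a 5.1-style {facet_type: {...}} facet into the flat 5.2 form."""
    # Already flat, or not in the single-key nested shape: return a copy unchanged.
    if "type" in facet or len(facet) != 1:
        return dict(facet)
    (facet_type, body), = facet.items()
    if not isinstance(body, dict):
        return dict(facet)
    flat = {"type": facet_type}
    flat.update(body)
    return flat


nested = {"terms": {"field": "author", "limit": 5}}
print(flatten_facet(nested))
```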

mincount parameter and range facets

The mincount parameter is now supported by range facets to filter out ranges that don’t meet a minimum document count. Example:

prices:{ 
  type:range,
  field:price,
  mincount:1,
  start:0,
  end:100,
  gap:10
}
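To make the effect of mincount concrete, here is a small in-memory sketch that buckets a list of prices the way the request above describes and then drops buckets below the threshold. It is an illustration only, not Solr's implementation; the function name and sample prices are invented, and it assumes each range includes its lower bound and excludes its upper bound.

```python
import math


def range_facet(values, start, end, gap, mincount=0):
    """Count values per [low, low + gap) bucket and drop buckets below mincount."""
    bucket_count = math.ceil((end - start) / gap)
    counts = [0] * bucket_count
    for value in values:
        if start <= value < end:
            counts[int((value - start) // gap)] += 1
    return [
        {"val": start + i * gap, "count": count}
        for i, count in enumerate(counts)
        if count >= mincount
    ]


prices = [5, 7, 12, 45, 48, 99]
print(range_facet(prices, start=0, end=100, gap=10, mincount=1))
```

With mincount left at 0, the same call returns all ten buckets, including the empty ones.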

multi-select faceting

A new parameter, excludeTags, disregards any matching tagged filters for that facet. Example:

q=cars
&fq={!tag=COLOR}color:black
&fq={!tag=MODEL}model:xc90
&json.facet={
    colors:{type:terms, field:color, excludeTags:COLOR},
    model:{type:terms, field:model, excludeTags:MODEL}
  }

The above example shows a request where a user has selected “color:black” and “model:xc90”. This query would do the following:

Get a document list with both filters applied.
Compute the colors facet:
- Exclude the color filter, so you get back facets for all colors instead of just ‘black’.
- Apply the model filter.
Similarly, compute the model facet: exclude the model filter but apply the color filter.
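The steps above can be simulated over a handful of in-memory documents. This is a sketch of the idea only, not how Solr evaluates the request; the sample documents and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical sample documents.
docs = [
    {"color": "black", "model": "xc90"},
    {"color": "red", "model": "xc90"},
    {"color": "black", "model": "xc60"},
    {"color": "blue", "model": "xc60"},
]

# Tagged filters, mirroring fq={!tag=COLOR}color:black and fq={!tag=MODEL}model:xc90.
filters = {
    "COLOR": lambda doc: doc["color"] == "black",
    "MODEL": lambda doc: doc["model"] == "xc90",
}


def apply_filters(docs, filters, exclude_tags=()):
    """Apply every filter whose tag is not listed in exclude_tags."""
    active = [f for tag, f in filters.items() if tag not in exclude_tags]
    return [doc for doc in docs if all(f(doc) for f in active)]


def terms_facet(docs, filters, field, exclude_tags=()):
    """Count field values over the documents left after the non-excluded filters."""
    return Counter(doc[field] for doc in apply_filters(docs, filters, exclude_tags))


results = apply_filters(docs, filters)                                # both filters
colors = terms_facet(docs, filters, "color", exclude_tags=("COLOR",))  # model filter only
models = terms_facet(docs, filters, "model", exclude_tags=("MODEL",))  # color filter only
print(results, dict(colors), dict(models))
```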

hll facet function

The JSON Facet API now has an option to use a HyperLogLog implementation for computing the number of unique values. Example:

json.facet={
  unique_products : "hll(product_type)"
}

Choosing facet implementation

Before Solr 5.2, interval faceting used a different implementation than range faceting: one based on DocValues, which at times is faster and doesn’t rely on filters or the filter cache. Solr 5.2 adds support for choosing between the filter-based and DocValues-based implementations. Functionally, the results of the two are the same, but there could be a difference in performance. The facet.range.method parameter specifies the implementation to be used. Some numbers on the performance of the two methods can be found here: https://issues.apache.org/jira/browse/SOLR-7406?focusedCommentId=14497338&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14497338
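As an illustration, a range facet request that selects the implementation could be assembled like this. The values filter and dv are the ones documented in the Solr reference guide rather than in this post, and the helper name and the price field are assumptions.

```python
from urllib.parse import urlencode


def range_facet_params(field, start, end, gap, method="filter"):
    """Build classic range facet parameters with an explicit implementation choice."""
    if method not in ("filter", "dv"):
        raise ValueError("method must be 'filter' or 'dv'")
    return {
        "q": "*:*",
        "facet": "true",
        "facet.range": field,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
        "facet.range.method": method,  # "filter" = filters, "dv" = DocValues
    }


query = urlencode(range_facet_params("price", 0, 100, 10, method="dv"))
print(f"http://localhost:8983/solr/techproducts/select?{query}")
```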

Stats component

The Solr stats component now supports HyperLogLog-based cardinality estimation, which is also used by the new JSON Facet API. The cardinality option uses the probabilistic “HyperLogLog” (HLL) algorithm to estimate the cardinality of sets in a fixed amount of memory. The cardinality parameter is tunable, letting you trade accuracy against the amount of RAM used at query time, with relatively minor impact on response time.
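To give a feel for the accuracy versus memory trade-off, here is a toy HyperLogLog counter. It is a simplified sketch of the general algorithm, not the implementation Solr uses; the class name, the choice of SHA-1 as the hash, and the precision parameter p are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog: 2**p registers, so memory stays fixed as input grows."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash of the value (SHA-1 is an arbitrary choice for this sketch).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def cardinality(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)              # bias constant, valid for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = m * math.log(m / zeros)
        return estimate


hll = HyperLogLog(p=12)
for i in range(20000):
    hll.add(f"product-{i}")
    hll.add(f"product-{i}")  # duplicates do not change the estimate
print(round(hll.cardinality()))
```

A smaller p means fewer registers and less memory but a larger expected error; a larger p does the opposite.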

Solr security

SolrCloud allows for hosting multiple collections within the same cluster but, up to 5.1, didn’t provide a mechanism to restrict access.
The authentication framework in 5.2 allows for plugging in a custom authentication plugin or using the Kerberos plugin that ships out of the box, so that requests to Solr can be authenticated.
The authorization framework allows for implementing a custom plugin to authorize access to resources in a SolrCloud cluster. The Solr reference guide has more details: https://cwiki.apache.org/confluence/display/solr/Security

Solr streaming expressions

Streaming Expressions provide a simple query language for SolrCloud that merges search with parallel computing. This builds on the Solr streaming API introduced in 5.1. The Solr reference guide has more information: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

Other features

A few configurations in Solr need to be in place as part of the bootstrapping process, before the first Solr node comes up, e.g. to enable SSL. The CLUSTERPROP call provides an API to do so, but requires a running Solr instance. Starting with Solr 5.2, a cluster-wide property can be added, edited, or deleted using the zkcli script, which doesn’t require a running Solr instance.

On the spatial front, this release introduces the new spatial RptWithGeometrySpatialField, based on CompositeSpatialStrategy, which blends RPT indexes for speed with serialized geometry for accuracy. It includes a Lucene segment-based in-memory shape cache.

There is now a refactored Admin UI built using AngularJS. This new UI isn’t the default yet; it is offered as an option so that users can report issues and provide feedback before it becomes the default UI. The new UI can be accessed at: http://hostname:port/solr/index.html

Though it’s an internal detail, it’s certainly an important one: Solr has internally been upgraded to use Jetty 9. This allows us to move towards using async calls and more.

Indexing performance improvement

This release also comes with a substantial indexing performance improvement of almost 100% compared to Solr 4.x. Watch out for a blog post on that real soon.

Beyond the features and improvements listed above, Solr 5.2.0 also includes many other new features as well as numerous optimizations and bugfixes of the corresponding Apache Lucene release. For more information, the detailed change log for both Lucene and Solr can be found here:

Lucene: http://lucene.apache.org/core/5_2_0/changes/Changes.html

Solr: http://lucene.apache.org/solr/5_2_0/changes/Changes.html

Featured image by David Precious
