What’s New in Solr 8
A roll-up of the latest features in Apache Solr 8
The Lucene PMC recently announced the release of Apache Solr version 8.5. We’re several releases into the 8.x release line now, and with a cadence of a new release every 8-10 weeks, you’d be forgiven if you hadn’t kept up with the latest features.
Because a number of exciting changes have landed in Solr over the past few releases, I thought it would be helpful to review a few of those released in 8.3 through 8.5.
Package Manager
Solr users have long wished for a proper plugin ecosystem. For those who build their own query parsers and other types of customizations, deploying their code has been cumbersome at best. It’s also never been possible to easily share your code with others who may benefit from similar changes.
Starting in 8.4, Solr has a package management system that allows for hot deployment of plugins across all nodes of a cluster, secure signing of plugins from trusted remote repositories, and consistent packaging guidance for plugins. One key feature of hot deployment is the ability to add or upgrade plugins without having to manually move .jar files and then restart every node of the cluster.
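To make that workflow concrete, here is a minimal sketch of the command sequence, written as a Python dry run. The `bin/solr package` subcommands are given as I understand them from the reference guide, so verify them against your Solr version. The repository name, repository URL, package name, and collection names are all hypothetical.

```python
import shlex

SOLR_BIN = "bin/solr"  # assumes the commands run from the Solr install directory


def package_commands(repo_name, repo_url, package, version, collections):
    """Return the bin/solr invocations for a typical install, as argument lists."""
    return [
        # Register a remote repository whose signed packages you trust.
        [SOLR_BIN, "package", "add-repo", repo_name, repo_url],
        # Show what the registered repositories offer.
        [SOLR_BIN, "package", "list-available"],
        # Install the package into the cluster (no manual jar copying).
        [SOLR_BIN, "package", "install", f"{package}:{version}"],
        # Make the package's plugins available to specific collections.
        [SOLR_BIN, "package", "deploy", f"{package}:{version}",
         "-collections", ",".join(collections)],
    ]


commands = package_commands(
    repo_name="example-repo",                  # hypothetical
    repo_url="https://repo.example.com/solr",  # hypothetical
    package="example-query-parser",            # hypothetical
    version="1.0.0",
    collections=["products", "articles"],      # hypothetical
)

for cmd in commands:
    # Dry run: print each command instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run a step, pass the argument list to `subprocess.run(cmd, check=True)`. As far as I know, package support also has to be switched on when the nodes start (the reference guide describes a `-Denable.packages=true` system property), so check that before trying this on a cluster.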
The Lucene/Solr community hopes to remove from Solr some features that have long lived as “contribs” (like Solr Cell, DataImportHandler, Carrot, etc.) so they can become plugins maintained by people who are passionate about them and can give them the attention they deserve. If you would be interested in taking one of these on, please send an email to the development mailing list, dev@lucene.apache.org (be sure to subscribe to the list to see replies!).
Caches
Solr 8.3 added a brand new cache implementation, CaffeineCache, which we expect to provide most users with a lower memory footprint, a higher hit ratio, and better multi-threaded performance.
CaffeineCache is based on the Caffeine caching library, which by default uses a “Window Tiny Least Frequently Used” (W-TinyLFU) eviction policy. This allows eviction based on both frequency and recency of use.
This implementation will become the default in Solr 9, and all other existing cache implementations will be removed at that time. Caching has a direct impact on the performance of your Solr installation, so it’s best to plan ahead by trying CaffeineCache in a dev or QA environment now.
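For a dev or QA trial, the switch mostly comes down to changing the class attribute on the cache definitions in solrconfig.xml. The sketch below does that with Python's standard XML library. The set of legacy class names is my assumption about what shipped before CaffeineCache, and the sample config fragment is hypothetical.

```python
import xml.etree.ElementTree as ET

# Cache classes that predate CaffeineCache (an assumption; check your own config).
LEGACY_CACHE_CLASSES = {"solr.LRUCache", "solr.FastLRUCache", "solr.LFUCache"}


def switch_to_caffeine(solrconfig_xml):
    """Return the config XML with legacy cache classes replaced by solr.CaffeineCache."""
    root = ET.fromstring(solrconfig_xml)
    for element in root.iter():
        if element.get("class") in LEGACY_CACHE_CLASSES:
            element.set("class", "solr.CaffeineCache")
    return ET.tostring(root, encoding="unicode")


# A hypothetical fragment of a <query> section, for illustration only.
sample = """<config>
  <query>
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  </query>
</config>"""

updated = switch_to_caffeine(sample)
print(updated)
```

Note that this parser drops XML comments and the function leaves every other attribute untouched, so treat the output as a starting point for a test config and review the remaining attributes against the reference guide rather than overwriting a production file.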
Security
Any discussion of recent changes must mention the many changes made in Solr 8.4 and 8.5 to improve Solr’s default security posture. Our goal has been to make Solr more secure out of the box.
My colleague Erik Hatcher covered the changes in 8.4 in his excellent post, Default Security in Solr 8.4.
In 8.5, a few more features have been added, namely the ability to run Solr with a Java Security Manager enabled, and the ability to whitelist or blacklist IP addresses or ranges to control which clients can access any Solr interface (UI or API).
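To illustrate what allow and deny lists of addresses and ranges mean in practice, here is a small sketch using Python's standard `ipaddress` module. It is not Solr's implementation. The precedence rules (a blacklist match always denies, and an empty whitelist allows everyone else) are my assumption, and the addresses are made up.

```python
import ipaddress


def is_allowed(client_ip, whitelist, blacklist):
    """Decide whether client_ip may connect, given lists of addresses or CIDR ranges.

    Assumed semantics (not taken from Solr's source): a blacklist match always
    denies; an empty whitelist allows every other address; a non-empty
    whitelist allows only the addresses it covers.
    """
    address = ipaddress.ip_address(client_ip)

    def matches(ranges):
        return any(address in ipaddress.ip_network(r) for r in ranges)

    if matches(blacklist):
        return False
    return not whitelist or matches(whitelist)


whitelist = ["10.0.0.0/8", "192.168.1.0/24"]  # hypothetical internal ranges
blacklist = ["10.0.13.0/24"]                  # hypothetical quarantined subnet

for ip in ["10.0.5.20", "10.0.13.7", "203.0.113.9"]:
    print(ip, is_allowed(ip, whitelist, blacklist))
```

With these lists, an address inside an allowed range passes unless it also falls in the blacklisted subnet, and anything outside the whitelist is refused.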
Indexing Log Files
Solr’s logs contain a wealth of information, but in a production system they can be difficult to read. There’s so much going on that it’s hard to separate the signal from the noise.
Starting with 8.5, Solr has a simple way to index its own log files into a Solr collection with a new wrapper script in the bin/ directory called postlogs. This script parses the log files and indexes them to a collection of your choice.
Once they are indexed, you can query them for errors or patterns. Visualizing the system activity with something like Apache Zeppelin when trying to diagnose a problem can be incredibly powerful – how many commits are you really doing? How slow are the queries users are complaining about? Are those outliers or evidence of a persistent problem?
There’s obviously the potential to create an infinite loop if you index Solr’s logs to itself continually. This tool is intended for troubleshooting and not for monitoring.
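Once the logs are in a collection, a question such as “how slow are the queries users are complaining about?” becomes an ordinary Solr query. The sketch below builds a request URL for the slowest logged queries. The field names (`type_s`, `qtime_i`, `date_dt`, `q_s`) are my assumption about what postlogs produces, and the collection name `logs` is hypothetical, so check the schema of your own log collection first.

```python
from urllib.parse import urlencode


def slow_query_url(solr_base, collection, min_qtime_ms, rows=20):
    """Build a /select URL listing logged queries at or above min_qtime_ms, slowest first."""
    params = {
        "q": "type_s:query",                      # assumed: record type for query log lines
        "fq": f"qtime_i:[{min_qtime_ms} TO *]",   # assumed: query time in milliseconds
        "sort": "qtime_i desc",
        "fl": "date_dt,qtime_i,q_s",              # assumed: timestamp, QTime, query string
        "rows": rows,
    }
    return f"{solr_base}/{collection}/select?{urlencode(params)}"


url = slow_query_url("http://localhost:8983/solr", "logs", min_qtime_ms=1000)
print(url)
```

Fetching that URL returns the slowest queries with their timestamps. Checking whether they cluster in time is one way to tell outliers from a persistent problem.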
New Delete-by-Query Approach
Delete-by-query operations can be very expensive, particularly with distributed collections. Best practice advice is usually to avoid them in a busy production system. The reason for this is that they block all other document updates while the query is executed and the results processed. This is done to ensure out-of-order updates and optimistic concurrency constraints are properly processed. A side effect, however, is that in some cases, other updates can queue up and eventually cause replicas to go into recovery.
There are times when you may not care about preserving document order and version consistency; you just want the documents to be deleted. For example, if you want to purge the index of all documents older than 60 days, you may know that no incoming documents will have a timestamp that old, so there is no need to block the entire indexing stream and potentially cause severe downstream consequences.
To provide an alternative, a new stream decorator, delete(), has been added. It operates similarly to the long-existing update() decorator by wrapping a streaming expression. This decorator allows a faster, non-blocking delete-by-query: every tuple output by the inner stream includes a document ID, and the document with that ID can be quickly deleted from the index. As an added benefit, the full extent of streaming expression syntax is available for identifying the documents to be deleted.
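As a sketch of the 60-day purge described above, the following builds a delete() expression around a search() stream and prepares the request for the collection's /stream handler. The collection name `events` and the field `timestamp_dt` are hypothetical, and the choice of `qt="/export"` with a sort on `id` assumes the `id` field has docValues enabled.

```python
from urllib.parse import urlencode


def delete_expression(collection, query, batch_size=500):
    """Build a delete() streaming expression wrapping a search() over the collection."""
    return (
        f"delete({collection}, batchSize={batch_size}, "
        f'search({collection}, q="{query}", qt="/export", fl="id", sort="id asc"))'
    )


def stream_request(solr_base, collection, expression):
    """Return the URL and form-encoded body for a POST to the /stream handler."""
    url = f"{solr_base}/{collection}/stream"
    body = urlencode({"expr": expression}).encode("utf-8")
    return url, body


# Hypothetical collection and timestamp field, for illustration only.
expr = delete_expression("events", "timestamp_dt:[* TO NOW-60DAYS]")
url, body = stream_request("http://localhost:8983/solr", "events", expr)
print(expr)
```

Sending it is then a matter of `urllib.request.urlopen(urllib.request.Request(url, data=body))`. Because the inner stream is an ordinary streaming expression, the search() call could be swapped for any other source that emits tuples with an `id` field.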
Important Notes for Upgrades
There are two important changes to note if you’re upgrading to 8.4 or 8.5 from a version before 8.4.
Non-Default Codec Change (8.4)
First, if any field or field type definition sets the postingsFormat or docValuesFormat parameter to a non-default codec (as you would have done to use the Tagger Handler, for example), you will have to perform a bit of surgery to be able to use Solr after the upgrade.
SolrCloud Overseer Queue Format Change (8.5)
Second, if you are using SolrCloud, you’ll want to take care during the upgrade to 8.5 due to a change in the format used for elements in the Overseer queues and maps. There are no configuration changes and you should otherwise notice no difference, but you’ll want to follow the suggestions in the Solr Upgrade Notes carefully for a successful upgrade.
Remove the friction of upgrades with Lucidworks Managed Search
Lucidworks offers Apache Solr as a managed service for those who want to avoid the hassle of upgrading and managing infrastructure, but still want to leverage all of Solr’s latest and greatest features. Lucidworks Managed Search is available now for preview. Learn more in my colleague Marcus Eagan’s recent blog post.
Resources
- Solr Upgrade Notes: https://lucene.apache.org/solr/guide/solr-upgrade-notes.html
- Release Notes: https://cwiki.apache.org/confluence/display/SOLR/ReleaseNote85
- Full list of changes: https://lucene.apache.org/solr/8_5_0/changes/Changes.html (Note this is a detailed issue-level list of every change. Click the link for “Older Releases” to see changes from 8.3 and 8.4.)