[UPDATE] Spatial Search in Apache Lucene and Solr
One of the questions I get asked most often is "What is the state of spatial search in Lucene and Solr?" So here is my answer as of today:
- The other day I committed SOLR-1568, which adds automatic filter generation to the various point-based field types in Solr. It also includes some small refactoring of the underlying Lucene code. Furthermore, it adds a new LatLonType, which can be used to represent latitude/longitude pairs seamlessly. See http://wiki.apache.org/solr/SpatialSearch for full details on Solr spatial. Note that this is only available on trunk. Volunteers to backport it to 3.x would be most welcome.
- While working on SOLR-1568, it became increasingly clear to me that the Cartesian Tier code in Lucene spatial simply does not work as intended in many cases. When I reviewed the code and tried to fix it, it became apparent that it only really works in the Western Hemisphere above the equator (i.e. the United States), and it may also work in the Eastern Hemisphere above the equator. It only really works above the equator because of a miscalculation in the SinusoidalProjector; see LUCENE-2519. It also handles edge cases poorly, such as points at the poles or near the prime meridian and antimeridian, so if your data includes those cases, don't bother. I didn't fix the SinusoidalProjector because doing so turned into a tangled web of broken unit tests. In discussions with other developers, we decided that the whole tier system (and much of Lucene's spatial code) should be deprecated and replaced.
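For reference, the textbook sinusoidal projection maps a point at latitude φ and longitude λ to x = (λ − λ0)·cos φ and y = φ, where λ0 is the central meridian. Because cos φ is an even function, a correct projection treats the northern and southern hemispheres symmetrically. Below is a minimal Java sketch of that standard formula; the class and method names are hypothetical, and this is not Lucene's SinusoidalProjector, nor does it reproduce the specific issue tracked in LUCENE-2519.

```java
// Minimal sketch of the standard sinusoidal projection.
// Hypothetical names; this is NOT Lucene's SinusoidalProjector.
final class SinusoidalSketch {
    /** Returns {x, y} in radians for a point given in degrees and a central meridian in degrees. */
    static double[] project(double latDeg, double lonDeg, double centralLonDeg) {
        // Wrap the offset from the central meridian into [-180, 180), so the
        // map's left/right edges fall on the central meridian's antimeridian.
        double dLonDeg = ((lonDeg - centralLonDeg + 180.0) % 360.0 + 360.0) % 360.0 - 180.0;
        double lat = Math.toRadians(latDeg);
        // cos is even, so a point and its mirror across the equator get the same x.
        double x = Math.toRadians(dLonDeg) * Math.cos(lat);
        // y is simply the latitude; southern latitudes map to negative y.
        double y = lat;
        return new double[] {x, y};
    }
}
```

For example, project(45.0, 10.0, 0.0) and project(-45.0, 10.0, 0.0) return the same x and opposite y values, which is the symmetry the prose above implies a working projector should preserve.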
I believe trunk is now in pretty decent shape for spatial search for applications that need:
- Sorting by distance
- Boosting by distance
- Bounding box filtering built on range queries (using numeric fields), which should be sufficient for most people
- Geohash-based calculations
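The first three items boil down to two small computations: a great-circle distance per document (for sorting and boosting) and a latitude/longitude box that can be expressed as numeric range queries. Here is a minimal Java sketch of both, assuming the haversine formula and a mean Earth radius of 6371 km. GeoSketch, haversineKm and boundingBox are hypothetical names, not Solr or Lucene APIs, and Solr's own code may compute these differently. The box also shows one way to handle the pole and antimeridian edge cases that the Cartesian Tier code handles poorly.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch, NOT Solr's implementation. Names are hypothetical and
// the mean Earth radius is an assumed constant.
final class GeoSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in km (haversine), the kind of value you sort or boost by. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double dp = p2 - p1, dl = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dp / 2), 2)
                 + Math.cos(p1) * Math.cos(p2) * Math.pow(Math.sin(dl / 2), 2);
        // Clamp to 1.0 to guard against rounding pushing asin out of its domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Box around (lat, lon), lon in [-180, 180], for a circle of radiusKm.
     * Returns {latMin, latMax} followed by one or two {lonMin, lonMax} ranges;
     * each pair maps onto one numeric range query.
     */
    static List<double[]> boundingBox(double lat, double lon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radians
        double latMin = lat - Math.toDegrees(angular);
        double latMax = lat + Math.toDegrees(angular);
        List<double[]> ranges = new ArrayList<>();
        if (latMin <= -90.0 || latMax >= 90.0) {
            // The circle reaches a pole: clamp latitude and accept every longitude.
            ranges.add(new double[] {Math.max(latMin, -90.0), Math.min(latMax, 90.0)});
            ranges.add(new double[] {-180.0, 180.0});
            return ranges;
        }
        ranges.add(new double[] {latMin, latMax});
        // Widest longitude extent of the circle; the naive angular / cos(lat)
        // is slightly too narrow because the widest point lies poleward of lat.
        double dLon = Math.toDegrees(
                Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        double lonMin = lon - dLon, lonMax = lon + dLon;
        if (lonMin < -180.0) {          // crosses the antimeridian to the west
            ranges.add(new double[] {lonMin + 360.0, 180.0});
            ranges.add(new double[] {-180.0, lonMax});
        } else if (lonMax > 180.0) {    // crosses the antimeridian to the east
            ranges.add(new double[] {lonMin, 180.0});
            ranges.add(new double[] {-180.0, lonMax - 360.0});
        } else {
            ranges.add(new double[] {lonMin, lonMax});
        }
        return ranges;
    }
}
```

For example, a 111 km radius around (0, 179.5) crosses the antimeridian, so boundingBox returns the latitude range plus two longitude ranges, roughly [178.5, 180] and [-180, -179.5]. A filter would OR the two longitude range queries together and AND the result with the latitude range query.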
Trunk does not yet have the ability to:
- Add "pseudo" fields to the result set, so it is not yet possible to return the distance alongside the other stored fields
- Filter using a tier/tile/grid-based approach. These approaches are especially helpful in highly dense areas, as they can significantly reduce the number of terms that need to be enumerated
- Facet by function, which can be useful for putting distances into buckets such as walking, biking, and driving
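Conceptually, faceting by a function means evaluating the function (here, distance) for each matching document and counting documents per bucket. Here is a minimal, client-side Java sketch of that counting logic, assuming each document's distance in kilometers has already been computed (for example from its stored coordinates). The bucket names come from the walking/biking/driving example above; the DistanceFacets name and the 2 km and 15 km cutoffs are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of distance bucketing; names and cutoffs are hypothetical.
final class DistanceFacets {
    /** Maps a distance in km to a bucket label. */
    static String bucket(double km) {
        if (km <= 2.0) return "walking";
        if (km <= 15.0) return "biking";
        return "driving";
    }

    /** Counts documents per bucket, keeping buckets in a fixed display order. */
    static Map<String, Integer> facet(double[] distancesKm) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String b : new String[] {"walking", "biking", "driving"}) {
            counts.put(b, 0); // report empty buckets too
        }
        for (double d : distancesKm) {
            counts.merge(bucket(d), 1, Integer::sum);
        }
        return counts;
    }
}
```

Doing this on the server, as function faceting would, avoids shipping every distance to the client just to count it.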
For a list of all the related Solr/Lucene spatial issues, see SOLR-773. Again, see http://wiki.apache.org/solr/SpatialSearch for a full accounting of what is in Solr and how to use it.
In summary, I think trunk is in pretty decent shape for spatial, as far as Solr is concerned. Pure Lucene users will see some upheaval in the coming months, but it is for the better. Testers are needed and patches are welcome. And while deprecating the tier code feels like a step backward, it is clear to me that we have several committers, along with many contributors, who are very interested in seeing spatial support live and prosper.