
A Short Introduction to Indexing / Search using Lucene

by Tim Casey
February 25, 2013

At times I need an indexing tool that does something akin to an embedded database: an embedded index. The need comes up when running filters over data in a large visual table, or over some other visualization.

From a coding point of view, the initial attempt at filtering might look something like this:

public List<String> filter(String userFilterText) {
        List<String> ret = new LinkedList<String>();
        // Linear scan: every entity is checked on every filter request.
        for( Entity e : entities ) {
                if( e.containsText(userFilterText) ) {
                         ret.add(e.getEntityId());
                }
        }
        return ret;
}

At some point the number of rows, or data elements, grows beyond what a linear scan can handle while still responding to the user in a timely manner. Even if the data is collected into some in-memory structure, this approach eventually breaks down.

The solution is to build an index, embedded into the application, which manages the filtering. This means filtering becomes:

public List<String> filter(String userFilterText) {
        List<String> ret = index.query(userFilterText);
        return ret;
}
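To show what sits behind a call like index.query(), here is a minimal sketch of an in-memory inverted index in plain Java. It is not Lucene and not the code from the example project: the class name TinyIndex, its add() method, and the rule that every term in the filter must match are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index (hypothetical, not Lucene): term -> sorted entity ids. */
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Records every token of the text under the given entity id. */
    void add(String entityId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(entityId);
        }
    }

    /** Returns the ids of entities that contain every term of the filter text. */
    List<String> query(String userFilterText) {
        Set<String> result = null;
        for (String token : tokenize(userFilterText)) {
            Set<String> ids = postings.getOrDefault(token, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);   // first term: start from its postings
            } else {
                result.retainAll(ids);         // later terms: intersect
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("e1", "Lorem ipsum dolor sit amet");
        index.add("e2", "ut enim ad minim veniam");
        index.add("e3", "Ut labore et dolore magna");
        System.out.println(index.query("ut"));        // prints [e2, e3]
        System.out.println(index.query("dolore ut")); // prints [e3]
    }
}
```

The lookup cost depends on the number of terms in the filter and the size of their posting sets, not on the total number of entities, which is the property the linear scan lacks. A real index such as Lucene adds analysis, fields, scoring and on-disk storage on top of this idea.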

Although this is a bit slower for a small number of items, it is never slow enough for the user to notice. And when there is a lot of data, users expect things to be slightly slower, which is acceptable. In addition, the filter can now match by field instead of relying on something like String.contains() or regular expressions.

Building one of these indexes is quite simple. You add data to a document with Document.add(Field) and add the document to the index with IndexWriter.addDocument(). You query with searcher.search(Query, Collector). A fairly useful module can be had for less than 1000 lines of code.

The class IndexProvider.java is at the heart of the example. You call IndexProvider.index(data) for every object you have to index, and then call IndexProvider.search(String) to query the built-up index.

The entry point is Example.main(), which has one artificial requirement. The first time the example is run, it creates a directory named index and indexes example.csv. The second time it is run, it runs a query for 'the' over the content.

Other, more complicated queries are possible. To get all of the Lorem text:

ut eu

To match against a specific field:

+ut +f1:two

This allows the visualization filtering to be as rich as any query. More importantly, the filtering can be tied to whatever the data happens to be, without any code changes.
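To make the two queries above concrete, the sketch below mimics how Lucene's classic query syntax reads them: with the default OR operator, ut eu matches a document containing either term, while a leading + marks a clause as required and a field: prefix restricts it to that field. This is a toy matcher in plain Java, not Lucene's parser. The class TinyQueryMatcher, the helper doc(), and the default field name text are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy matcher for "term", "+term" and "field:term" clauses (hypothetical, not Lucene). */
class TinyQueryMatcher {
    static final String DEFAULT_FIELD = "text"; // assumed default field name

    /** Builds a two-field document: the default text field plus a field named f1. */
    static Map<String, String> doc(String text, String f1) {
        Map<String, String> d = new HashMap<>();
        d.put(DEFAULT_FIELD, text);
        d.put("f1", f1);
        return d;
    }

    /** True if the field's lower-cased, whitespace-separated tokens include the term. */
    static boolean fieldContains(Map<String, String> doc, String field, String term) {
        String value = doc.getOrDefault(field, "").toLowerCase(Locale.ROOT).trim();
        Set<String> tokens = new HashSet<>(Arrays.asList(value.split("\\s+")));
        return tokens.contains(term.toLowerCase(Locale.ROOT));
    }

    /** All required (+) clauses must hit; with none, at least one optional clause must hit. */
    static boolean matches(Map<String, String> doc, String query) {
        boolean sawRequired = false;
        boolean anyOptionalHit = false;
        for (String clause : query.trim().split("\\s+")) {
            if (clause.isEmpty()) {
                continue;
            }
            boolean required = clause.startsWith("+");
            String body = required ? clause.substring(1) : clause;
            int colon = body.indexOf(':');
            String field = colon < 0 ? DEFAULT_FIELD : body.substring(0, colon);
            String term = colon < 0 ? body : body.substring(colon + 1);
            boolean hit = fieldContains(doc, field, term);
            if (required) {
                sawRequired = true;
                if (!hit) {
                    return false;
                }
            } else if (hit) {
                anyOptionalHit = true;
            }
        }
        return sawRequired || anyOptionalHit;
    }

    public static void main(String[] args) {
        Map<String, String> d1 = doc("ut enim ad minim", "two");
        Map<String, String> d2 = doc("ut labore et dolore", "one");
        System.out.println(matches(d1, "ut eu"));       // prints true
        System.out.println(matches(d1, "+ut +f1:two")); // prints true
        System.out.println(matches(d2, "+ut +f1:two")); // prints false
    }
}
```

In the real library the query string is parsed into a Query object and evaluated against the index rather than against each document in turn, so the user-facing syntax stays the same while the lookup stays fast.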

The lucene-starter download is a .tgz file with a pom and sources.
