Making Fusion Work for You: A Custom JSON ObjectMapper in a JavaScript Stage
"It is not down in any map; true places never are."
— Herman Melville
Out of the box, Fusion ships with a wealth of mapping capabilities. In some cases, however, we may need to create a custom mapping stage. For the purposes of this blog, we'll create a custom JSON ObjectMapper in a JavaScript stage. We'll be using the Local File System datasource, but the same pipeline would work in a Web Crawler datasource with no changes.
Step One: Create the Datasource
In the Fusion UI, open or create a collection, and add a Local Filesystem datasource. Scroll down to "Start Links" and add the full path to your target start directory, e.g., /path/to/my/start/dir. This should be a valid path containing one or more JSON files. Save your datasource.
Note: This pipeline assumes that all files crawled will be JSON files.
Step Two: Create the pipeline
Because we're doing something non-standard with the files being crawled, we'll create a dedicated pipeline to handle the processing.
Open the "Index Pipelines" view for the new datasource, and click the button to add a new pipeline.
Give your new pipeline a name, e.g., custom_object_mapping_pipeline.
You'll use three stages for this pipeline, in this order: 1) Apache Tika Parser; 2) JavaScript; 3) Solr Indexer.
First, add the Apache Tika Parser stage. In the settings for this stage, make sure "Return original XML and HTML instead of Tika XML output" is checked. Save this stage.
Next, we'll add the JavaScript stage. This is where all of the processing (mapping) will be handled. Add a JavaScript stage from your list of stage selections, give it a name, and add the following code to the "Script Body" field:
function (doc) {
    if (doc !== null && doc.getId() !== null) {
        // Java classes used by this stage
        var ObjectMapper = com.fasterxml.jackson.databind.ObjectMapper;
        var ArrayList = java.util.ArrayList;
        var Map = java.util.Map;
        var StringUtils = org.apache.commons.lang3.StringUtils;
        // Shadow the JS String type with java.lang.String so the
        // instanceof checks below match the Java strings returned by Jackson.
        var String = java.lang.String;
        try {
            var mapper = new ObjectMapper();
            // The Tika stage put the raw JSON into the 'body' field.
            var content = doc.getFirstFieldValue("body");
            if (content !== null) {
                // Deserialize the JSON into a java.util.Map of generic objects.
                var mapData = mapper.readValue(content, Map.class);
                if (mapData !== null) {
                    logger.info("Read data OK");
                    // Pull the top-level 'result' object and walk its entries.
                    var result = mapData.get("result");
                    if (result !== null) {
                        for each (var key in result.keySet()) {
                            var obj = result.get(key);
                            if (obj instanceof String) {
                                logger.info("Key: " + key + " object: " + obj.getClass().getSimpleName() + " value: " + obj);
                                doc.addField(key, obj);
                            } else if (obj instanceof ArrayList) {
                                // Flatten arrays into a single comma-separated string.
                                var joined = StringUtils.join(obj, ",");
                                logger.info("Key: " + key + " object: " + obj.getClass().getSimpleName() + " value: " + joined);
                                doc.addField(key, joined);
                            }
                        }
                    }
                }
            } else {
                logger.info("Content was NULL!");
            }
        } catch (e) {
            logger.error(e);
        }
    } else {
        logger.warn("PipelineDocument was NULL");
    }
    return doc;
}
Breaking it down:
So what's going on in the above code? In the scope of this blog, I'll discuss the significant processing pieces. If you're interested in how the Nashorn JavaScript engine works, see Oracle's documentation. At the heart of it, we're taking the content from the 'body' field (created during the Tika parsing stage) and spinning that up into a Map object using the Jackson ObjectMapper. This instantiation is accomplished with these lines of code:
var mapper = new ObjectMapper();
var content = doc.getFirstFieldValue("body");
if (content !== null) {
    var mapData = mapper.readValue(content, Map.class);
    if (mapData !== null) {
        ...
From here, you'll need to think about the structure of the JSON you're parsing. Note: In the context of this blog, I'm pulling the 'result' top-level object from the Map and iterating over the values therein. However, depending on the structure of the JSON you're parsing, your algorithm may vary.
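For illustration, here's a minimal, hypothetical example of the kind of JSON this pipeline expects: a top-level 'result' object whose values are either strings or arrays. The field names below are invented for this example:

{
  "result": {
    "title": "My first document",
    "tags": ["fusion", "search", "json"]
  }
}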
While iterating over the values, you'll want to check what type of object is returned. When deserializing into generic object types, Jackson maps JSON types to Java types as follows:
| JSON Type | Java Type |
|---|---|
| object | LinkedHashMap<String,Object> |
| array | ArrayList<Object> |
| string | String |
| number (no fraction) | Integer, Long, or BigInteger (smallest applicable) |
| number (fraction) | Double (configurable to use BigDecimal) |
| true, false | Boolean |
| null | null |
In the above example, I'm checking only for String and ArrayList. Again, depending on what type of JSON you're parsing, your algorithm may vary.
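If your JSON also contains numbers, booleans, or nested objects, you could extend the loop with additional branches. Here's a minimal sketch of one way to do that; it is not part of the original pipeline, and it assumes the same variables (doc, key, obj, and the shadowed String) as the script above:

// Hypothetical extension of the loop body above.
var Number = java.lang.Number;
var Boolean = java.lang.Boolean;
var LinkedHashMap = java.util.LinkedHashMap;

if (obj instanceof Number || obj instanceof Boolean) {
    // Store numbers and booleans as strings, matching the rest of the script.
    doc.addField(key, String.valueOf(obj));
} else if (obj instanceof LinkedHashMap) {
    // Nested JSON object: flatten one level by prefixing child keys.
    for each (var childKey in obj.keySet()) {
        doc.addField(key + "_" + childKey, String.valueOf(obj.get(childKey)));
    }
}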
Once we’ve determined the object’s class type, we transform it into a string (well, unless it’s already a string) and then add it to our PipelineDocument like so:
if (obj instanceof String) {
    doc.addField(key, obj);
    ...
Finally, add a Solr Indexer stage so that your PipelineDocument will be saved. Note: Make sure that the fields you're indexing from the JSON exist in your collection's schema and are defined with the expected field types (e.g., string).
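To tie it together: with the hypothetical JSON sample shown earlier (field names invented for this blog), the JavaScript stage would add roughly the following fields to the document before the Solr Indexer stage writes it:

title: "My first document"
tags: "fusion,search,json"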
Happy Indexing!