A while back, I joined the #solr IRC channel in the middle of a conversation about Solr’s queryResultCache & filterCache. The first message I saw was…

< victori:#solr> anyway, are filter queries applied independently on the full dataset or one after another on a shrinking resultset?

As with many things in life, the answer is “It Depends.”

In particular, the answer to this question largely depends on:

… but further nuances come into play depending on:

  • The effective cost param specified on each fq (defaults to ‘0’ for most queries)
  • The type of the underlying Query object created for each fq: Do any implement the PostFilter API?

As I explained some of these nuances on IRC (and what they change about the behavior), I realized 2 things:

  • This would make a really great blog post!
  • I wish there was a way to demonstrate how all this happens, rather than just describe it.

That led me to wonder if it would be possible to create a “Tracing Wrapper Query Parser” people could use to get log messages showing exactly when a given Query (or more specifically the “Scorer” for that Query) was asked to evaluate each document. With something like this, people could experiment (on small datasets) with different q & fq params and different cache and cost local params and see how the execution changes. I made a brief attempt at building this kind of QParser wrapper, and quickly got bogged down in lots of headaches with how complex the general purpose Query, Weight, and Scorer APIs can be at the lower level.

On the other hand: the ValueSource (aka Function) API is much simpler, and easily facilitates composing functions around other functions. Solr also already makes it easy to use any ValueSource as a Query via the {!frange} QParser — which just so happens to also support the PostFilter API!

A few hours later, the “TraceValueSource” and trace() function syntax were born, and now I can use them to walk you through the various nuances of how Solr executes different Queries & Filters.

IMPORTANT NOTE:

In this article, we’re going to assume that the underlying logic Lucene uses to execute a simple Query is essentially: loop over all docIds in the index (starting at 0), testing each one against the Query; if a document matches, record its score and continue with the next docId in the index.
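A minimal Java sketch of that assumed loop follows (Java simply because Solr and Lucene are written in it). It is not Lucene’s Query/Weight/Scorer API: the NaiveQueryScan class is hypothetical, and a java.util.function.IntPredicate stands in for “does this docId match the Query?”. The usage in main() borrows the foo_i values of the four sample documents indexed later in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration of the simplified "test every docId in order" model.
// This is NOT Lucene's Scorer API; a real Scorer also computes a score per match.
class NaiveQueryScan {

    // Returns every docId in [0, maxDoc) accepted by the query, in increasing order.
    static List<Integer> execute(int maxDoc, IntPredicate query) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < maxDoc; docId++) { // start at docId 0
            if (query.test(docId)) {
                hits.add(docId); // a real engine would record the doc's score here too
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] fooI = {42, -42, -7, 7}; // foo_i of the sample docs A, B, C, D
        System.out.println(execute(fooI.length, doc -> fooI[doc] >= 0)); // prints [0, 3]
    }
}
```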

Likewise we’re going to assume that when Lucene is computing the Conjunction (X ^ Y) of two Queries, the logic is essentially:

  • Loop over all docIds in the index (starting at 0) testing each one against X until we find a matching document
  • If that document also matches Y, then record its score and continue with the next docId
  • If the document does not match Y, swap X & Y, and start the process over with the next docId

These are both extreme oversimplifications of how most queries are actually executed (many Term & Points based queries are much more optimized to “skip ahead” in the list of documents based on the term/points metadata), but they are a “close enough” approximation of what happens when all Queries are ValueSource based, which is all we need for our purposes today.
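The conjunction model from the list above can be sketched the same way. NaiveConjunction is a hypothetical class illustrating only the “scan with X, verify with Y, swap on a miss” loop; it deliberately ignores the skip-ahead optimizations real Lucene conjunctions use. With the sample data indexed later in this post, using foo_i >= 0 as X and bar_i <= 90 as Y, only docId 3 (“D”) survives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch of the simplified conjunction (X ^ Y) model described above.
// Real Lucene conjunctions use iterators that can skip ahead; this only scans.
class NaiveConjunction {

    static List<Integer> execute(int maxDoc, IntPredicate x, IntPredicate y) {
        List<Integer> hits = new ArrayList<>();
        int docId = 0;
        while (docId < maxDoc) {
            // Test each doc against X until X matches one.
            while (docId < maxDoc && !x.test(docId)) {
                docId++;
            }
            if (docId == maxDoc) {
                break; // X has no more matches
            }
            if (y.test(docId)) {
                hits.add(docId); // both match: record it (a real engine also scores it)
            } else {
                // Y ruled this doc out: swap X & Y so Y leads the next scan.
                IntPredicate tmp = x;
                x = y;
                y = tmp;
            }
            docId++; // continue with the next docId either way
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] foo = {42, -42, -7, 7};
        int[] bar = {99, 75, 1000, 50};
        System.out.println(execute(4, d -> foo[d] >= 0, d -> bar[d] <= 90)); // prints [3]
    }
}
```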

{!frange} Queries and the trace() Function

Let’s start with a really simple warm-up to introduce you to the {!frange} QParser and the trace() function I added, beginning with some trivial sample data…

$ bin/solr -e schemaless -noprompt
...
curl -H 'Content-Type: application/json' 'http://localhost:8983/solr/gettingstarted/update?commit=true' --data-binary '
[{"id": "A", "foo_i":  42, "bar_i":   99},
 {"id": "B", "foo_i": -42, "bar_i":   75},
 {"id": "C", "foo_i":  -7, "bar_i": 1000},
 {"id": "D", "foo_i":   7, "bar_i":   50}]'
...
tail -f example/schemaless/logs/solr.log
...

For most of this blog I’ll be executing queries against these 4 documents, while showing you:

  • The full request URL
  • Key url-decoded request params in the request for easier reading
  • All log messages written to solr.log as a result of the request

The {!frange} parser allows users to specify an arbitrary function (aka: ValueSource) that will be wrapped up into a query that matches documents if and only if the result of that function falls in a specified range. For example: with the 4 sample documents we’ve indexed above, the query below does not match document ‘A’ or ‘C’ because the sum of the foo_i + bar_i fields (42 + 99 = 141 and -7 + 1000 = 993 respectively) does not fall between the lower & upper range limits of the query (0 <= sum(foo_i,bar_i) <= 100)…
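The arithmetic is easy to double-check with a few lines of plain Java. The hypothetical FrangeSumExample class below uses no Solr APIs; it just evaluates sum(foo_i,bar_i) for each sample document and applies the inclusive bounds that {!frange} uses by default (its incl and incu params can make either bound exclusive).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-computation of {!frange l=0 u=100}sum(foo_i,bar_i) over the sample docs.
class FrangeSumExample {
    static final String[] IDS = {"A", "B", "C", "D"};
    static final int[] FOO_I = {42, -42, -7, 7};
    static final int[] BAR_I = {99, 75, 1000, 50};

    // Returns the ids whose foo_i + bar_i lies in [lower, upper] (both bounds inclusive).
    static List<String> matching(double lower, double upper) {
        List<String> ids = new ArrayList<>();
        for (int doc = 0; doc < IDS.length; doc++) {
            double sum = FOO_I[doc] + BAR_I[doc]; // A=141, B=33, C=993, D=57
            if (lower <= sum && sum <= upper) {
                ids.add(IDS[doc]);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(matching(0, 100)); // prints [B, D]
    }
}
```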

http://localhost:8983/solr/gettingstarted/select?omitHeader=true&fl=id&q={!frange%20l=0%20u=100}sum%28foo_i,bar_i%29
// q = {!frange l=0 u=100}sum(foo_i,bar_i)

{
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"B"},
      {
        "id":"D"}]
  }}
  
INFO  - 2017-11-14 20:27:06.897; [   x:gettingstarted] org.apache.solr.core.SolrCore; [gettingstarted]  webapp=/solr path=/select params={q={!frange+l%3D0+u%3D100}sum(foo_i,bar_i)&omitHeader=true&fl=id} hits=2 status=0 QTime=29

Under the covers, the Scorer for the FunctionRangeQuery produced by this parser loops over each document in the index and asks the ValueSource if it “exists” for that document (i.e. do the underlying fields exist), and if so, it then asks for the computed value for that document.

Generally speaking, the trace() function we’re going to use implements the ValueSource API in such a way that any time it’s asked for the “value” of a document, it delegates to another ValueSource and logs a message about the input (document id) and the result, along with a configurable label.
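Here is a stripped-down model of those two pieces working together, assuming a hypothetical SimpleValues interface that exposes just the exists()/floatVal() calls seen in the trace logs. It is not Lucene’s FunctionValues class, and TracingValues/FrangeScan only illustrate the decorator idea behind trace(); they are not the actual TraceValueSource code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified per-document value source (NOT Lucene's FunctionValues).
interface SimpleValues {
    boolean exists(int doc);
    float floatVal(int doc);
}

// Decorator in the spirit of trace(): delegate every call, log input + result under a label.
class TracingValues implements SimpleValues {
    final List<String> log = new ArrayList<>();
    private final String label;
    private final SimpleValues inner;
    private final String[] uniqueKeys; // internal docId -> uniqueKey, for readable messages

    TracingValues(String label, SimpleValues inner, String[] uniqueKeys) {
        this.label = label;
        this.inner = inner;
        this.uniqueKeys = uniqueKeys;
    }

    private String describe(String method, int doc, Object result) {
        return label + ": " + method + "(#" + doc + ": \"" + uniqueKeys[doc] + "\") -> " + result;
    }

    @Override
    public boolean exists(int doc) {
        boolean result = inner.exists(doc);
        log.add(describe("exists", doc, result));
        return result;
    }

    @Override
    public float floatVal(int doc) {
        float result = inner.floatVal(doc);
        log.add(describe("floatVal", doc, result));
        return result;
    }
}

class FrangeScan {
    // sum(a,b) over two int "fields"; every doc has a value in this toy example.
    static SimpleValues sum(int[] a, int[] b) {
        return new SimpleValues() {
            @Override public boolean exists(int doc) { return true; }
            @Override public float floatVal(int doc) { return a[doc] + b[doc]; }
        };
    }

    // Simplified frange loop: visit every doc, skip docs without a value,
    // keep those whose value lies in [lower, upper].
    static List<Integer> matches(SimpleValues values, int maxDoc, float lower, float upper) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!values.exists(doc)) continue;
            float v = values.floatVal(doc);
            if (lower <= v && v <= upper) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] ids = {"A", "B", "C", "D"};
        TracingValues traced = new TracingValues("simple_sum",
                sum(new int[] {42, -42, -7, 7}, new int[] {99, 75, 1000, 50}), ids);
        System.out.println(matches(traced, ids.length, 0f, 100f)); // prints [1, 3]
        traced.log.forEach(System.out::println); // exists/floatVal lines, one pair per doc
    }
}
```

Running main() prints the matching docIds followed by eight exists/floatVal lines in the same shape as the Solr log output in the next example.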

If we change the function used in our previous query to be trace(simple_sum,sum(foo_i,bar_i)) and re-execute it, we can see the individual methods called on the “sum” ValueSource in this process (along with the internal id + uniqueKey of the document, and the “simple_sum” label we’ve chosen) and the result of the wrapped function…

http://localhost:8983/solr/gettingstarted/select?omitHeader=true&fl=id&q={!frange%20l=0%20u=100}trace%28simple_sum,sum%28foo_i,bar_i%29%29
// q = {!frange l=0 u=100}trace(simple_sum,sum(foo_i,bar_i))

TraceValueSource$TracerValues; simple_sum: exists(#0: "A") -> true
TraceValueSource$TracerValues; simple_sum: floatVal(#0: "A") -> 141.0
TraceValueSource$TracerValues; simple_sum: exists(#1: "B") -> true
TraceValueSource$TracerValues; simple_sum: floatVal(#1: "B") -> 33.0
TraceValueSource$TracerValues; simple_sum: exists(#2: "C") -> true
TraceValueSource$TracerValues; simple_sum: floatVal(#2: "C") -> 993.0
TraceValueSource$TracerValues; simple_sum: exists(#3: "D") -> true
TraceValueSource$TracerValues; simple_sum: floatVal(#3: "D") -> 57.0
SolrCore; [gettingstarted]  webapp=/solr path=/select params={q={!frange+l%3D0+u%3D100}trace(simple_sum,sum(foo_i,bar_i))&omitHeader=true&fl=id} hits=2 status=0 QTime=6

Because we’re using the _default Solr configs, this query has now been cached in the queryResultCache. If we re-execute it, no new “tracing” information will be logged, because Solr doesn’t need to evaluate the ValueSource against any of the documents in the index in order to respond to the request…

http://localhost:8983/solr/gettingstarted/select?omitHeader=true&fl=id&q={!frange%20l=0%20u=100}trace%28simple_sum,sum%28foo_i,bar_i%29%29
// q = {!frange l=0 u=100}trace(simple_sum,sum(foo_i,bar_i))

SolrCore; [gettingstarted]  webapp=/solr path=/select params={q={!frange+l%3D0+u%3D100}trace(simple_sum,sum(foo_i,bar_i))&omitHeader=true&fl=id} hits=2 status=0 QTime=0
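Conceptually, the queryResultCache acts like a map from the whole request to its results. The hypothetical QueryResultCacheSketch below illustrates just that idea (it is not Solr’s QueryResultKey or SolrCache code, and it keys only on q, the fq list and the sort). The executions counter shows that an identical second request never reaches the code that would evaluate the ValueSource.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the idea behind the queryResultCache (NOT Solr's implementation).
class QueryResultCacheSketch {
    private final Map<List<Object>, List<Integer>> cache = new HashMap<>();
    int executions = 0; // how often we actually had to evaluate queries against the index

    List<Integer> search(String q, List<String> fqs, String sort, Supplier<List<Integer>> execute) {
        List<Object> key = List.of(q, List.copyOf(fqs), sort); // the whole request is the key
        return cache.computeIfAbsent(key, k -> {
            executions++;          // cache miss: run the (expensive) search
            return execute.get();
        });
    }

    public static void main(String[] args) {
        QueryResultCacheSketch c = new QueryResultCacheSketch();
        String q = "{!frange l=0 u=100}trace(simple_sum,sum(foo_i,bar_i))";
        c.search(q, List.of(), "score desc", () -> List.of(1, 3));
        c.search(q, List.of(), "score desc", () -> List.of(1, 3)); // served from the cache
        System.out.println(c.executions); // prints 1
    }
}
```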

Normal fq Processing

Now let’s use multiple {!frange} & trace() combinations to look at what happens when we have some filter queries in our request…

http://localhost:8983/solr/gettingstarted/select?omitHeader=true&fl=id&q={!frange%20l=0%20u=100}trace%28simple_sum,sum%28foo_i,bar_i%29%29&fq={!frange%20l=0}trace%28pos_foo,foo_i%29&fq={!frange%20u=90}trace%28low_bar,bar_i%29
//  q = {!frange l=0 u=100}trace(simple_sum,sum(foo_i,bar_i))
// fq = {!frange l=0}trace(pos_foo,foo_i)
// fq = {!frange u=90}trace(low_bar,bar_i)

TraceValueSource$TracerValues; pos_foo: exists(#0: "A") -> true
TraceValueSource$TracerValues; pos_foo: floatVal(#0: "A") -> 42.0
TraceValueSource$TracerValues; pos_foo: exists(#1: "B") -> true
TraceValueSource$TracerValues; pos_foo: floatVal(#1: "B") -> -42.0
TraceValueSource$TracerValues; pos_foo: exists(#2: "C") -> true
TraceValueSource$TracerValues; pos_foo: floatVal(#2: "C") -> -7.0
TraceValueSource$TracerValues; pos_foo: exists(#3: "D") -> true
TraceValueSource$TracerValues; pos_foo: floatVal(#3: "D") -> 7.0
TraceValueSource$TracerValues; low_bar: exists(#0: "A") -> true
TraceValueSource$TracerValues; low_bar: floatVal(#0: "A") -> 99.0
TraceValueSource$TracerValues; low_bar: exists(#1: "B") -> true
TraceValueSource$TracerValues; low_bar: floatVal(#1: "B") -> 75.0
TraceValueSource$TracerValues; low_bar: exists(#2: "C") -> true
TraceValueSource$TracerValues; low_bar: floatVal(#2: "C") -> 1000.0
TraceValueSource$TracerValues; low_bar: exists(#3: "D") -> true
TraceValueSource$TracerValues; low_bar: floatVal(#3: "D") -> 50.0
TraceValueSource$TracerValues; simple_sum: exists(#3: "D") -> true
TraceValueSource$TracerValues; simple_sum: floatVal(#3: "D") -> 57.0
SolrCore; [gettingstarted]  webapp=/solr path=/select params={q={!frange+l%3D0+u%3D100}trace(simple_sum,sum(foo_i,bar_i))&omitHeader=true&fl=id&fq={!frange+l%3D0}trace(pos_foo,foo_i)&fq={!frange+u%3D90}trace(low_bar,bar_i)} hits=1 status=0 QTime=23

There’s a lot of information here to consider, so let’s break it down and discuss in the order of the log messages…

  • In order to cache the individual fq queries for maximum possible re-use, Solr executes each fq query independently against the entire index:
    • First the “pos_foo” function is run against all 4 documents to identify if 0 <= foo_i
      • this resulting DocSet is put into the filterCache for this fq
    • then the “low_bar” function is run against all 4 documents to see if bar_i <= 90
      • this resulting DocSet is put into the filterCache for this fq
  • Now the main query (simple_sum) is ready to be run:
    • Instead of executing the main query against all documents in the index, it only needs to be run against the intersection of the DocSets from each of the individual (cached) filters
    • Since document ‘A’ did not match the “low_bar” fq, the “simple_sum” function is never asked to evaluate it as a possible match for the overall request
    • Likewise: since ‘B’ did not match the “pos_foo” fq, it is also never considered.
    • Likewise: since ‘C’ matched neither fq, it is also never considered.
    • Only document “D” matched both fq filters, so it is checked against the main query; it is a match, so we have hits=1
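The same steps can be modeled in a few lines of Java. In this hypothetical FilterCacheSketch, a java.util.BitSet stands in for Solr’s DocSet, and the cache is keyed by a label string (Solr’s filterCache is keyed on the parsed Query). Each filter scans the whole index only on a cache miss, the main query only sees documents in the intersection, and a trace list records every check. main() runs the two requests from this section back to back.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical model of "normal" (cached) fq processing; NOT Solr's SolrIndexSearcher.
class FilterCacheSketch {
    final Map<String, BitSet> filterCache = new HashMap<>();
    final List<String> trace = new ArrayList<>(); // "label#docId" for every check performed
    private final int maxDoc;

    FilterCacheSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Cache miss: evaluate the filter against the entire index and cache the DocSet.
    private BitSet docSet(String fq, IntPredicate matches) {
        return filterCache.computeIfAbsent(fq, key -> {
            BitSet bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                trace.add(key + "#" + doc);
                if (matches.test(doc)) bits.set(doc);
            }
            return bits;
        });
    }

    List<Integer> search(String qLabel, IntPredicate q, Map<String, IntPredicate> fqs) {
        BitSet allowed = new BitSet(maxDoc);
        allowed.set(0, maxDoc); // start from "all docs"
        for (Map.Entry<String, IntPredicate> fq : fqs.entrySet()) {
            allowed.and(docSet(fq.getKey(), fq.getValue())); // intersect; cached sets stay untouched
        }
        List<Integer> hits = new ArrayList<>();
        for (int doc = allowed.nextSetBit(0); doc >= 0; doc = allowed.nextSetBit(doc + 1)) {
            trace.add(qLabel + "#" + doc); // the main query only sees docs that passed every fq
            if (q.test(doc)) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] foo = {42, -42, -7, 7};
        int[] bar = {99, 75, 1000, 50};
        Map<String, IntPredicate> fqs = new LinkedHashMap<>();
        fqs.put("pos_foo", d -> foo[d] >= 0);
        fqs.put("low_bar", d -> bar[d] <= 90);
        FilterCacheSketch s = new FilterCacheSketch(foo.length);
        System.out.println(s.search("simple_sum", d -> foo[d] + bar[d] >= 0 && foo[d] + bar[d] <= 100, fqs));
        System.out.println(s.trace); // 8 filter checks, then only simple_sum#3
        s.trace.clear();
        System.out.println(s.search("max_foo", d -> foo[d] <= 999, fqs)); // filters come from the cache
        System.out.println(s.trace); // prints [max_foo#3]
    }
}
```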

In future requests, even if the main q param changes and may potentially match a different set of values/documents, the cached filter queries can still be re-used to limit the set of documents the main query has to check — as we can see in this next request using the same fq params…

http://localhost:8983/solr/gettingstarted/select?omitHeader=true&fl=id&q={!frange%20u=999}trace%28max_foo,foo_i%29&fq={!frange%20l=0}trace%28pos_foo,foo_i%29&fq={!frange%20u=90}trace%28low_bar,bar_i%29
//  q = {!frange u=999}trace(max_foo,foo_i)
// fq = {!frange l=0}trace(pos_foo,foo_i)
// fq = {!frange u=90}trace(low_bar,bar_i)

TraceValueSource$TracerValues; max_foo: exists(#3: "D") -> true
TraceValueSource$TracerValues; max_foo: floatVal(#3: "D") -> 7.0
SolrCore; [gettingstarted]  webapp=/solr path=/select params={q={!frange+u%3D999}trace(max_foo,foo_i)&omitHeader=true&fl=id&fq={!frange+l%3D0}trace(pos_foo,foo_i)&fq={!frange+u%3D90}trace(low_bar,bar_i)} hits=1 status=0 QTime=1

Non-cached Filters

Now let’s consider what happens if we add 2 optional local params to our filter queries:

  • cache=false - Tells Solr that we don't need/want this filter to be cached independently for re-use.
    • This will allow Solr to evaluate these filters at the same time it’s processing the main query


  • cost=X – Specifies an integer “hint” to Solr regarding how expensive it is to execute this filter.
    • Solr provides special treatment to some types of filters when 100 <= cost (more on this later)
    • By default, Solr assumes most filters have cost=0 (but beginning with Solr 7.2, {!frange} queries default to cost=100)
    • For these examples, we’ll explicitly specify a cost on each fq such that: 0 < cost < 100.

http://localhost:8983/solr/gettingstarted/select?omitHeader=true&fl=id&q={!frange%20l=0%20u=100}trace%28simple_sum,sum%28foo_i,bar_i%29%29&fq={!frange%20cache=false%20cost=50%20l=0}trace%28pos_foo_nocache_50,foo_i%29&fq={!frange%20cache=false%20cost=25%20u=100}trace%28low_bar_nocache_25,bar_i%29
//  q = {!frange l=0 u=100}trace(simple_sum,sum(foo_i,bar_i))
// fq = {!frange cache=false cost=50 l=0}trace(pos_foo_nocache_50,foo_i)
// fq = {!frange cache=false cost=25 u=100}trace(low_bar_nocache_25,bar_i)

TraceValueSource$TracerValues; low_bar_nocache_25: exists(#0: "A") -> true
TraceValueSource$TracerValues; low_bar_nocache_25: floatVal(#0: "A") -> 99.0
TraceValueSource$TracerValues; pos_foo_nocache_50: exists(#0: "A") -> true
TraceValueSource$TracerValues; pos_foo_nocache_50: floatVal(#0: "A") -> 42.0
TraceValueSource$TracerValues; simple_sum: exists(#0: "A") -> true
TraceValueSource$TracerValues; simple_sum: floatVal(#0: "A") -> 141.0
TraceValueSource$TracerValues; low_bar_nocache_25: exists(#1: "B") -> true
TraceValueSource$TracerValues; low_bar_nocache_25: floatVal(#1: "B") -> 75.0
TraceValueSource$TracerValues; pos_foo_nocache_50: exists(#1: "B") -> true
TraceValueSource$TracerValues; pos_foo_nocache_50: floatVal(#1: "B") -> -42.0
TraceValueSource$TracerValues; pos_foo_nocache_50: exists(#2: "C") -> true
TraceValueSource$TracerValues; pos_foo_nocache_50: floatVal(#2: "C") -> -7.0
TraceValueSource$TracerValues; pos_foo_nocache_50: exists(#3: "D") -> true
TraceValueSource$TracerValues; pos_foo_nocache_50: floatVal(#3: "D") -> 7.0
TraceValueSource$TracerValues; low_bar_nocache_25: exists(#3: "D") -> true
TraceValueSource$TracerValues; low_bar_nocache_25: floatVal(#3: "D") -> 50.0
TraceValueSource$TracerValues; simple_sum: exists(#3: "D") -> true
TraceValueSource$TracerValues; simple_sum: floatVal(#3: "D") -> 57.0
SolrCore; [gettingstarted]  webapp=/solr path=/select params={q={!frange+l%3D0+u%3D100}trace(simple_sum,sum(foo_i,bar_i))&omitHeader=true&fl=id&fq={!frange+cache%3Dfalse+cost%3D50+l%3D0}trace(pos_foo_nocache_50,foo_i)&fq={!frange+cache%3Dfalse+cost%3D25+u%3D100}trace(low_bar_nocache_25,bar_i)} hits=1 status=0 QTime=8

Let’s again step through this in sequence and talk about what’s happening at each point:

  • Because the filters are not cached, Solr can combine them with the main q query and execute all three in one pass over the index
  • The filters are sorted according to their cost, and the lowest cost filter (low_bar_nocache_25) is asked to find the “first” document it matches:
    • Document “A” is a match for low_bar_nocache_25 (bar_i <= 100), so the next filter is consulted…
    • Document “A” is also a match for pos_foo_nocache_50 (0 <= foo_i), so all filters match and the main query can be consulted…
    • Document “A” is not a match for the main query (simple_sum)
  • The filters are then asked to find their “next” match after “A”, beginning with the lowest cost filter: low_bar_nocache_25
    • Document “B” is a match for ‘low_bar_nocache_25’, so the next filter is consulted…
    • Document “B” is not a match for the ‘pos_foo_nocache_50’ filter, so that filter keeps checking until it finds its “next” match (after “B”)
    • Document “C” is not a match for the ‘pos_foo_nocache_50’ filter, so that filter keeps checking until it finds its “next” match (after “C”)
    • Document “D” is the “next” match for the ‘pos_foo_nocache_50’ filter, so the remaining filter(s) are consulted regarding that document…
    • Document “D” is also a match for the ‘low_bar_nocache_25’ filter, so all filters match and the main query can be consulted again.
    • Document “D” is a match for the main query (simple_sum), and we have our first (and only) hit for the request
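The sequence above can be reproduced with a small model. The hypothetical LeapfrogSketch below is not Lucene’s ConjunctionDISI: each non-cached filter is a stateful iterator that remembers its position, the cheapest filter leads, a filter that rules a document out proposes its own next match as the new candidate, and the main query only checks documents every filter accepts. Each trace entry corresponds to one exists()/floatVal() pair in the log.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of non-cached fq execution as described above (NOT Lucene's ConjunctionDISI).
class LeapfrogSketch {

    // One non-cached filter, iterated lazily. Stateful: create fresh Filters for each search.
    static final class Filter {
        final String label;
        final int cost;
        final IntPredicate matches;
        int pos = -1; // -1 means "not started yet"

        Filter(String label, int cost, IntPredicate matches) {
            this.label = label;
            this.cost = cost;
            this.matches = matches;
        }

        // First matching doc >= target (maxDoc if none); scans forward only when needed.
        int advance(int target, int maxDoc, List<String> trace) {
            if (pos >= target) return pos;
            for (int doc = target; doc < maxDoc; doc++) {
                trace.add(label + ":" + doc);
                if (matches.test(doc)) return pos = doc;
            }
            return pos = maxDoc;
        }
    }

    // Assumes at least one filter; the cheapest one drives iteration.
    static List<Integer> search(int maxDoc, String qLabel, IntPredicate q,
                                List<Filter> filters, List<String> trace) {
        List<Filter> byCost = new ArrayList<>(filters);
        byCost.sort(Comparator.comparingInt(f -> f.cost)); // cheapest filter leads
        Filter lead = byCost.get(0);
        List<Integer> hits = new ArrayList<>();
        int doc = lead.advance(0, maxDoc, trace);
        while (doc < maxDoc) {
            // Ask each filter (in cost order) about doc; a miss yields a new candidate.
            boolean allAgree = false;
            while (!allAgree && doc < maxDoc) {
                allAgree = true;
                for (Filter f : byCost) {
                    int next = f.advance(doc, maxDoc, trace);
                    if (next > doc) { // f ruled doc out and found its own next match
                        doc = lead.advance(next, maxDoc, trace);
                        allAgree = false;
                        break;
                    }
                }
            }
            if (doc >= maxDoc) break;
            trace.add(qLabel + ":" + doc); // main query only sees docs every filter accepts
            if (q.test(doc)) hits.add(doc);
            doc = lead.advance(doc + 1, maxDoc, trace); // next candidate after doc
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] foo = {42, -42, -7, 7};
        int[] bar = {99, 75, 1000, 50};
        List<String> trace = new ArrayList<>();
        List<Integer> hits = search(foo.length, "simple_sum",
                d -> foo[d] + bar[d] >= 0 && foo[d] + bar[d] <= 100,
                List.of(new Filter("pos_foo_nocache_50", 50, d -> foo[d] >= 0),
                        new Filter("low_bar_nocache_25", 25, d -> bar[d] <= 100)),
                trace);
        System.out.println(hits);  // prints [3]
        System.out.println(trace); // same order as the trace() log lines above
    }
}
```

Swapping the cost values (so that pos_foo leads) changes which filter drives the scan and which documents each filter evaluates, but not the final result.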

There are two very important things to note here that may not be immediately obvious:

  1. Just because the individual fq params indicate cache=false does not mean that nothing about their results will be cached. The results of the main q in conjunction with the (non-cached) filters can still wind up in the queryResultCache, as you can see if the exact same query is re-executed…
    http://localhost:8983/solr/gettingstarted/select?omitHeader=true&fl=id&q={!frange%20l=0%20u=100}trace%28simple_sum,sum%28foo_i,bar_i%29%29&fq={!frange%20cache=false%20cost=50%20l=0}trace%28pos_foo_nocache_50,foo_i%29&fq={!frange%20cache=false%20cost=25%20u=100}trace%28low_bar_nocache_25,bar_i%29
    //  q = {!frange l=0 u=100}trace(simple_sum,sum(foo_i,bar_i))
    // fq = {!frange cache=false cost=50 l=0}trace(pos_foo_nocache_50,foo_i)
    // fq = {!frange cache=false cost=25 u=100}trace(low_bar_nocache_25,bar_i)
    
    SolrCore; [gettingstarted]  webapp=/solr path=/select params={q={!frange+l%3D0+u%3D100}trace(simple_sum,sum(foo_i,bar_i))&omitHeader=true&fl=id&fq={!frange+cache%3Dfalse+cost%3D50+l%3D0}trace(pos_foo_nocache_50,foo_i)&fq={!frange+cache%3Dfalse+cost%3D25+u%3D100}trace(low_bar_nocache_25,bar_i)} hits=1 status=0 QTime=1
    

    …we don’t get any trace() messages, because the entire “q + fqs + sort + pagination” combination was in the queryResultCache.

    (NOTE: Just as using cache=false in the local params of the fq params prevents them from being put in the filterCache, specifying cache=false on the q param can also prevent an entry for this query from being added to the queryResultCache if desired)

  2. The relative cost value of each filter does not dictate the order that they are evaluated against every document.
    • In the example above, the higher cost=50 specified on the ‘pos_foo_nocache_50’ filter did not ensure it would be executed against fewer documents than the lower cost ‘low_bar_nocache_25’ filter
      • Document “C” was checked against (and ruled out by) the (higher cost) ‘pos_foo_nocache_50’ filter without ever checking that document against the lower cost ‘low_bar_nocache_25’
    • The cost only indicates in what order each filter should be consulted to find its “next” matching document after each previously found match against the entire request
      • Relative cost values ensure that a higher cost filter will not be asked to check for its “next” match against any document that a lower cost filter has already definitively ruled out as a non-match.

    Compare the results above with the following example, where the same functions use new ‘cost’ values:

    http://localhost:8983/solr/gettingstarted/select?omitHeader=true&fl=id&q={!frange%20l=0%20u=100}trace%28simple_sum,sum%28foo_i,bar_i%29%29&fq={!frange%20cache=false%20cost=10%20l=0}trace%28pos_foo_nocache_10,foo_i%29&fq={!frange%20cache=false%20cost=80%20u=100}trace%28low_bar_nocache_80,bar_i%29
    //  q = {!frange l=0 u=100}trace(simple_sum,sum(foo_i,bar_i))
    // fq = {!frange cache=false cost=10 l=0}trace(pos_foo_nocache_10,foo_i)
    // fq = {!frange cache=false cost=80 u=100}trace(low_bar_nocache_80,bar_i)
    
    TraceValueSource$TracerValues; pos_foo_nocache_10: exists(#0: "A") -> true
    TraceValueSource$TracerValues; pos_foo_nocache_10: floatVal(#0: "A") -> 42.0
    TraceValueSource$TracerValues; low_bar_nocache_80: exists(#0: "A") -> true
    TraceValueSource$TracerValues; low_bar_nocache_80: floatVal(#0: "A") -> 99.0
    TraceValueSource$TracerValues; simple_sum: exists(#0: "A") -> true
    TraceValueSource$TracerValues; simple_sum: floatVal(#0: "A") -> 141.0
    TraceValueSource$TracerValues; pos_foo_nocache_10: exists(#1: "B") -> true
    TraceValueSource$TracerValues; pos_foo_nocache_10: floatVal(#1: "B") -> -42.0
    TraceValueSource$TracerValues; pos_foo_nocache_10: exists(#2: "C") -> true
    TraceValueSource$TracerValues; pos_foo_nocache_10: floatVal(#2: "C") -> -7.0
    TraceValueSource$TracerValues; pos_foo_nocache_10: exists(#3: "D") -> true
    TraceValueSource$TracerValues; pos_foo_nocache_10: floatVal(#3: "D") -> 7.0
    TraceValueSource$TracerValues; low_bar_nocache_80: exists(#3: "D") -> true
    TraceValueSource$TracerValues; low_bar_nocache_80: floatVal(#3: "D") -> 50.0
    TraceValueSource$TracerValues; simple_sum: exists(#3: "D") -> true
    TraceValueSource$TracerValues; simple_sum: floatVal(#3: "D") -> 57.0
    SolrCore; [gettingstarted]  webapp=/solr path=/select params={q={!frange+l%3D0+u%3D100}trace(simple_sum,sum(foo_i,bar_i))&omitHeader=true&fl=id&fq={!frange+cache%3Dfalse+cost%3D10+l%3D0}trace(pos_foo_nocache_10,foo_i)&fq={!frange+cache%3Dfalse+cost%3D80+u%3D100}trace(low_bar_nocache_80,bar_i)} hits=1 status=0 QTime=3
    

    The overall flow is fairly similar to the last example:

    • Because the filters are not cached, Solr can combine them with the main query and execute all three in one pass over the index
    • The filters are sorted according to their cost, and the lowest cost filter (pos_foo_nocache_10) is asked to find the “first” document it matches:
      • Document “A” is a match for pos_foo_nocache_10 (0 <= foo_i), so the next filter is consulted…
      • Document “A” is a match for low_bar_nocache_80 (bar_i <= 100), so all filters match and the main query can be consulted…
      • Document “A” is not a match for the main query (simple_sum)
    • The filters are then asked to find their “next” match after “A”, beginning with the lowest cost filter: pos_foo_nocache_10
      • Document “B” is not a match for the ‘pos_foo_nocache_10’ filter, so that filter keeps checking until it finds its “next” match (after “B”)
      • Document “C” is not a match for the ‘pos_foo_nocache_10’ filter, so that filter keeps checking until it finds its “next” match (after “C”)
      • Document “D” is the “next” match for the ‘pos_foo_nocache_10’ filter, so the remaining filter(s) are consulted regarding that document…
      • Document “D” is also a match for the ‘low_bar_nocache_80’ filter, so all filters match and the main query can be consulted again.
      • Document “D” is a match for the main query, and we have our first (and only) hit for the request

The key thing to note in these examples is that even though we’ve given Solr a “hint” at the relative cost of these filters, the underlying Scoring APIs in Lucene depend on being able to ask each Query to find the “next match after doc#X”. Once a “low cost” filter has been asked to do this, the document it identifies will be used as the input when asking a “higher cost” filter to find its “next match”, and if the higher cost filter matches very few documents, it may have to “scan over” more total documents in the segment than the lower cost filter.
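A tiny hypothetical experiment under the same simplified model makes this concrete. In CostHintSketch (invented names and data, not Solr code), a cheap filter that matches every one of 1,000 documents leads, while a “higher cost” filter matches only the last document; each iterator counts how many documents it evaluates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical experiment: a "high cost" but very selective filter can end up checking
// far more docs than the "low cost" filter that leads the conjunction.
class CostHintSketch {

    static final class Iter {
        final IntPredicate matches;
        int pos = -1;
        int checks = 0; // how many docs this filter was asked to evaluate

        Iter(IntPredicate matches) {
            this.matches = matches;
        }

        // First matching doc >= target (maxDoc if none), counting every doc it evaluates.
        int advance(int target, int maxDoc) {
            if (pos >= target) return pos;
            for (int doc = target; doc < maxDoc; doc++) {
                checks++;
                if (matches.test(doc)) return pos = doc;
            }
            return pos = maxDoc;
        }
    }

    // Two-filter leapfrog: "lead" (lower cost hint) proposes docs, "other" confirms or jumps ahead.
    static List<Integer> conjunction(Iter lead, Iter other, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        int doc = lead.advance(0, maxDoc);
        while (doc < maxDoc) {
            int next = other.advance(doc, maxDoc);
            if (next == doc) {
                hits.add(doc);                    // both filters match doc
                doc = lead.advance(doc + 1, maxDoc);
            } else {
                doc = lead.advance(next, maxDoc); // lead jumps to other's candidate
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Iter cheap = new Iter(doc -> true);           // lower cost hint: matches everything
        Iter expensive = new Iter(doc -> doc == 999); // higher cost hint: matches one doc
        System.out.println(conjunction(cheap, expensive, 1000));    // prints [999]
        System.out.println(cheap.checks + " vs " + expensive.checks); // prints 2 vs 1000
    }
}
```

In this toy index the higher cost filter ends up evaluating every document, while the lower cost lead is consulted only twice.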

Post Filtering

There is a small handful of Queries available in Solr (notably {!frange} and {!collapse}) which, in addition to supporting the normal Lucene iterative scoring APIs, also implement a special “PostFilter” API.

When a Solr request includes a filter that is cache=false and has a cost >= 100, Solr will check if the underlying Query implementation supports the PostFilter API; if it does, Solr will automatically use this API, ensuring that these post filters will only be consulted about a potential matching document after:

  • It has already been confirmed to be a match for all regular (non-post) fq filters
  • It has already been confirmed to be a match for the main q Query
  • It has already been confirmed to be a match for any lower cost post-filters

(This overall user experience, i.e. the special treatment of cost >= 100 rather than any sort of special postFilter=true syntax, is focused on letting users indicate how “expensive” they expect the various filters to be, while letting Solr worry about the best way to handle those expensive filters depending on how they are implemented internally, without requiring the user to know in advance “Does this query support post filtering?”)
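The decision described above boils down to a small routing rule, restated below as a hypothetical Java sketch (FilterRouting, Strategy and FilterSpec are invented names, not Solr classes). One assumption goes slightly beyond this article: a cache=false filter with cost >= 100 whose Query does not implement PostFilter is treated as an ordinary non-cached filter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical restatement of how filters are handled depending on cache/cost/PostFilter support.
class FilterRouting {
    enum Strategy { CACHED_DOCSET, NON_CACHED_CONJUNCTION, POST_FILTER }

    static final class FilterSpec {
        final String label;
        final boolean cache;              // the fq's cache local param (defaults to true)
        final int cost;                   // the fq's cost local param
        final boolean supportsPostFilter; // does the underlying Query implement PostFilter?

        FilterSpec(String label, boolean cache, int cost, boolean supportsPostFilter) {
            this.label = label;
            this.cache = cache;
            this.cost = cost;
            this.supportsPostFilter = supportsPostFilter;
        }
    }

    static Strategy route(FilterSpec f) {
        if (f.cache) return Strategy.CACHED_DOCSET; // evaluated up front, cached
        if (f.cost >= 100 && f.supportsPostFilter) return Strategy.POST_FILTER;
        return Strategy.NON_CACHED_CONJUNCTION;     // ASSUMPTION for cost >= 100 without PostFilter
    }

    // Groups fq labels by strategy; within each group, lower cost comes first.
    static Map<Strategy, List<String>> plan(List<FilterSpec> fqs) {
        List<FilterSpec> byCost = new ArrayList<>(fqs);
        byCost.sort(Comparator.comparingInt(f -> f.cost));
        Map<Strategy, List<String>> result = new EnumMap<>(Strategy.class);
        for (Strategy s : Strategy.values()) result.put(s, new ArrayList<>());
        for (FilterSpec f : byCost) result.get(route(f)).add(f.label);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(plan(List.of(
                new FilterSpec("pos_foo_nocache_50", false, 50, true),
                new FilterSpec("low_bar_postfq_200", false, 200, true))));
        // prints {CACHED_DOCSET=[], NON_CACHED_CONJUNCTION=[pos_foo_nocache_50], POST_FILTER=[low_bar_postfq_200]}
    }
}
```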

For advanced Solr users who want to write custom filtering plugins (particularly security related filtering that may need to consult external data sources or enforce complex rules), the PostFilter API can be a great way to ensure that expensive operations are only executed if absolutely necessary.

Let’s reconsider our earlier example of non-cached filter queries, but this time we’ll use cost=200 on the bar_i <= 100 filter condition so it will be used as a post filter…

http://localhost:8983/solr/gettingstarted/select?omitHeader=true&fl=id&q={!frange%20l=0%20u=100}trace%28simple_sum,sum%28foo_i,bar_i%29%29&fq={!frange%20cache=false%20cost=50%20l=0}trace%28pos_foo_nocache_50,foo_i%29&fq={!frange%20cache=false%20cost=200%20u=100}trace%28low_bar_postfq_200,bar_i%29
//  q = {!frange l=0 u=100}trace(simple_sum,sum(foo_i,bar_i))
// fq = {!frange cache=false cost=50 l=0}trace(pos_foo_nocache_50,foo_i)
// fq = {!frange cache=false cost=200 u=100}trace(low_bar_postfq_200,bar_i)

TraceValueSource$TracerValues; pos_foo_nocache_50: exists(#0: "A") -> true
TraceValueSource$TracerValues; pos_foo_nocache_50: floatVal(#0: "A") -> 42.0
TraceValueSource$TracerValues; simple_sum: exists(#0: "A") -> true
TraceValueSource$TracerValues; simple_sum: floatVal(#0: "A") -> 141.0
TraceValueSource$TracerValues; pos_foo_nocache_50: exists(#1: "B") -> true
TraceValueSource$TracerValues; pos_foo_nocache_50: floatVal(#1: "B") -> -42.0
TraceValueSource$TracerValues; pos_foo_nocache_50: exists(#2: "C") -> true
TraceValueSource$TracerValues; pos_foo_nocache_50: floatVal(#2: "C") -> -7.0
TraceValueSource$TracerValues; pos_foo_nocache_50: exists(#3: "D") -> true
TraceValueSource$TracerValues; pos_foo_nocache_50: floatVal(#3: "D") -> 7.0
TraceValueSource$TracerValues; simple_sum: exists(#3: "D") -> true
TraceValueSource$TracerValues; simple_sum: floatVal(#3: "D") -> 57.0
TraceValueSource$TracerValues; low_bar_postfq_200: exists(#3: "D") -> true
TraceValueSource$TracerValues; low_bar_postfq_200: floatVal(#3: "D") -> 50.0
SolrCore; [gettingstarted]  webapp=/solr path=/select params={q={!frange+l%3D0+u%3D100}trace(simple_sum,sum(foo_i,bar_i))&omitHeader=true&fl=id&fq={!frange+cache%3Dfalse+cost%3D50+l%3D0}trace(pos_foo_nocache_50,foo_i)&fq={!frange+cache%3Dfalse+cost%3D200+u%3D100}trace(low_bar_postfq_200,bar_i)} hits=1 status=0 QTime=4

Here we see a much different execution flow from the previous examples:

  • The lone non-cached (non-post) filter (pos_foo_nocache_50) is initially consulted to find the “first” document it matches
    • Document “A” is a match for pos_foo_nocache_50 (0 <= foo_i), so all “regular” filters match, and the main query can be consulted…
    • Document “A” is not a match for the main query (simple_sum), so we stop considering “A”
    • The post filter (low_bar_postfq_200) is never consulted regarding “A”
  • The lone non-post filter is again asked to find its “next” match after “A”
    • Document “B” is not a match for the ‘pos_foo_nocache_50’ filter, so that filter keeps checking until it finds its “next” match (after “B”)
    • Document “C” is not a match for the ‘pos_foo_nocache_50’ filter, so that filter keeps checking until it finds its “next” match (after “C”)
    • Document “D” is the “next” match for the ‘pos_foo_nocache_50’ filter; since there are no other “regular” filters, the main query is consulted again
    • Document “D” is also a match for the main query
    • After all other conditions have been satisfied, document “D” is then checked against the post filter (low_bar_postfq_200); since it matches, we have our first (and only) hit for the request
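This flow can be modeled by checking the regular filters, then the main query, then the post filters, in that order, for each candidate document. The hypothetical PostFilterSketch below is not Solr’s PostFilter/DelegatingCollector machinery; it assumes at least one regular filter to drive iteration (as in this example), and each trace entry corresponds to one exists()/floatVal() pair in the log above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of regular filters + main query + post filters (NOT Solr's PostFilter code).
class PostFilterSketch {

    // Stateful iterator over one filter; create fresh Filters for each search.
    static final class Filter {
        final String label;
        final int cost;
        final IntPredicate matches;
        int pos = -1;

        Filter(String label, int cost, IntPredicate matches) {
            this.label = label;
            this.cost = cost;
            this.matches = matches;
        }

        // First matching doc >= target (maxDoc if none), logging every doc it evaluates.
        int advance(int target, int maxDoc, List<String> trace) {
            if (pos >= target) return pos;
            for (int doc = target; doc < maxDoc; doc++) {
                trace.add(label + ":" + doc);
                if (matches.test(doc)) return pos = doc;
            }
            return pos = maxDoc;
        }
    }

    // Assumes at least one regular filter; the cheapest one drives iteration.
    static List<Integer> search(int maxDoc, String qLabel, IntPredicate q, List<Filter> regular,
                                List<Filter> postFilters, List<String> trace) {
        List<Filter> byCost = new ArrayList<>(regular);
        byCost.sort(Comparator.comparingInt(f -> f.cost));
        List<Filter> post = new ArrayList<>(postFilters);
        post.sort(Comparator.comparingInt(f -> f.cost));
        Filter lead = byCost.get(0);
        List<Integer> hits = new ArrayList<>();
        int doc = lead.advance(0, maxDoc, trace);
        while (doc < maxDoc) {
            boolean allAgree = false;
            while (!allAgree && doc < maxDoc) { // leapfrog until every regular filter agrees
                allAgree = true;
                for (Filter f : byCost) {
                    int next = f.advance(doc, maxDoc, trace);
                    if (next > doc) {
                        doc = lead.advance(next, maxDoc, trace);
                        allAgree = false;
                        break;
                    }
                }
            }
            if (doc >= maxDoc) break;
            trace.add(qLabel + ":" + doc);
            if (q.test(doc) && passesPostFilters(doc, post, trace)) hits.add(doc);
            doc = lead.advance(doc + 1, maxDoc, trace);
        }
        return hits;
    }

    // Post filters only ever see docs that matched every regular filter AND the main query.
    private static boolean passesPostFilters(int doc, List<Filter> post, List<String> trace) {
        for (Filter p : post) { // cheapest post filter first
            trace.add(p.label + ":" + doc);
            if (!p.matches.test(doc)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] foo = {42, -42, -7, 7};
        int[] bar = {99, 75, 1000, 50};
        List<String> trace = new ArrayList<>();
        System.out.println(search(foo.length, "simple_sum",
                d -> foo[d] + bar[d] >= 0 && foo[d] + bar[d] <= 100,
                List.of(new Filter("pos_foo_nocache_50", 50, d -> foo[d] >= 0)),
                List.of(new Filter("low_bar_postfq_200", 200, d -> bar[d] <= 100)),
                trace)); // prints [3]
        System.out.println(trace);
    }
}
```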

In these examples, the functions we’ve used in our filters have been relatively simple, but if you wanted to filter on multiple complex math functions over many fields, you can see how specifying a “cost” relative to the complexity of each function could be advantageous to ensure that the “simpler” functions are checked first.

In Conclusion…

Hopefully the examples I’ve walked through are helpful for folks trying to wrap their heads around how/why filter queries behave in various situations, and specifically how {!frange} queries work, so you can consider some of the trade-offs of tweaking the cache and cost params of your various filters.

Even for me, with ~12 years of Solr experience, running through these examples made me realize I had a misconception about how/when FunctionRangeQuery could be optimized (ultimately leading to SOLR-11641, which should make {!frange cache=false ...} much faster by default in future Solr versions).

About Hoss
