Space Vatican

Ramblings of a curious coder

Elasticsearch Native Scripts for Dummies

One of the cool things about elasticsearch is the ability to provide scripts that calculate custom ordering or that filter based on application-specific logic. Out of the box elasticsearch supports mvel, and there are also plugins that support python and javascript. I imagine that it would be pretty simple to provide a jruby one too.

You can also use so-called native scripts, written in java. These are faster than the other alternatives and may also be handier if you need to integrate with some existing java code to calculate your scores. There is some info out there on how to build these, but it presupposes a certain familiarity with java and its environment. If you’re anything like me then you can bumble through java syntax readily enough, but classpaths, jars etc. are a bit of a mystery. So here’s how I got a native script running, with instructions that (hopefully) presuppose almost no knowledge of java. I’m no java wizard - I may well be doing something dumb - but this is working well enough for us in production.

This example does the same as one of the examples from the documentation - the score is set to a function of one of the doc’s fields and two parameters.

The only thing that elasticsearch requires is that you provide a class that implements the NativeScriptFactory interface. All that class has to do is implement a newScript method that returns an instance of something implementing the ExecutableScript interface, which in turn has methods like runAsDouble and runAsLong (thank you java for your conciseness and absence of ceremony). This being java, you’re supposed to put this in a package named in a reverse-DNS style. My factory class looks like this:

package org.spacevatican.elasticsearchexample;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

import java.util.Map;

public class CustomScriptFactory implements NativeScriptFactory {

  @Override public ExecutableScript newScript (@Nullable Map<String,Object> params){
    return new CustomScript(params);
  }
}

This is basically 100% boilerplate. The newScript method just creates an instance of my script class passing through the params object. That params object is just the params option you can specify when you invoke a custom script.
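Since newScript is marked @Nullable for params, and a JSON number such as 2 may well arrive as an Integer rather than a Double, it can be worth reading parameters defensively. The following is only a sketch of one way to do that: ScriptParams and doubleParam are hypothetical names of my own, not part of elasticsearch, and the Integer-versus-Double behaviour is an assumption about how the request body gets deserialized.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of elasticsearch) for reading numeric script params.
final class ScriptParams {
    private ScriptParams() {}

    // Returns params.get(key) as a double, or fallback when params is null or the key is absent.
    static double doubleParam(Map<String, Object> params, String key, double fallback) {
        if (params == null) {
            return fallback;
        }
        Object value = params.get(key);
        if (value == null) {
            return fallback;
        }
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException(
                key + " must be a number, got " + value.getClass().getName());
        }
        // Number covers Integer, Long and Double alike.
        return ((Number) value).doubleValue();
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<String, Object>();
        params.put("base", 2);   // an Integer, as "base": 2 might be parsed (assumption)
        params.put("exp", 3.0);  // a Double
        System.out.println(doubleParam(params, "base", 1.0)); // prints 2.0
        System.out.println(doubleParam(params, "exp", 1.0));  // prints 3.0
        System.out.println(doubleParam(null, "exp", 1.0));    // prints 1.0
    }
}
```

A script constructor could call a helper like this instead of casting each value directly, at the cost of having to pick a sensible fallback.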

The script class is where the calculations actually happen

package org.spacevatican.elasticsearchexample;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.AbstractDoubleSearchScript;

import java.util.Map;

public class CustomScript extends AbstractDoubleSearchScript {

    double base;
    double exp;

    public CustomScript(@Nullable Map<String,Object> params){
      // Cast via Number so this also works if a param arrives as an Integer or Long (e.g. "base": 2)
      base = ((Number)params.get("base")).doubleValue();
      exp = ((Number)params.get("exp")).doubleValue();
    }

    @Override
    public double runAsDouble() {
      double a = doc().numeric("a").getDoubleValue();

      return a / Math.pow(base, exp);
    }
}

I’m writing a script that returns a double score, so I’ve derived from AbstractDoubleSearchScript and implemented runAsDouble.

In the constructor I’m just stashing the parameters I’m interested in. The actual computation happens in runAsDouble. This part is pretty similar to what you’d do in an mvel script, just with a little more ceremony.

Building it

Here’s where I got confused initially - I know little of java packaging conventions and most of the material I found covering native scripts glossed over this bit. I ended up with the following directory structure

elasticsearch-root
  lib/
    elasticsearch-0.19.3.jar
    ...
  config/
    elasticsearch.yml
    ...
  elasticsearch-example
    org/
      spacevatican/
        elasticsearchexample/
          CustomScript.java
          CustomScriptFactory.java
    Rakefile
  ...

elasticsearch-root is just wherever you’ve installed elasticsearch. I haven’t shown all the files/directories that make up elasticsearch itself. The org/spacevatican/elasticsearchexample set of directories mirrors the org.spacevatican.elasticsearchexample package naming.

The Rakefile defines a build task that builds a jar from these files. If you’ve put your elasticsearch install somewhere different relative to the Rakefile, you’ll need to change the classpath.

require 'fileutils'
task :compile do
  source = FileList['**/*.java']
  sh "javac #{source.collect {|s| "'#{s}'" }.join(' ')}  -classpath ../lib/elasticsearch-*.jar"
end

task :clean do
  FileUtils::Verbose.rm Dir.glob("**/*.class")
end

task :package do
  objects = FileList['**/*.class']
  sh "jar cf MyNativeScript.jar #{objects.collect {|s| "'#{s}'" }.join(' ')}"
end

task :build => [:clean, :compile, :package]

All this Rakefile does is compile all the *.java files we can find, adding the elasticsearch jar to the classpath (since we’re extending classes provided by elasticsearch). If you wanted to use other java libs, you’d need to add them to this classpath. The package task then just pops them all in a jar file (the jar command behaves not that differently to tar in basic use). You then need to place this jar file somewhere where elasticsearch will find and load it. The easiest thing is probably to copy it into elasticsearch’s lib folder.
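If you want to double check that the jar really contains your class files under the package directories, you can list its entries. Running jar tf MyNativeScript.jar does this from the command line; the sketch below does roughly the same thing in java using the standard java.util.jar classes. JarListing and classEntries are hypothetical names, and the default path assumes you run it from the folder containing the jar.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the .class entries inside a jar file.
final class JarListing {
    private JarListing() {}

    static List<String> classEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<String>();
        JarFile jar = new JarFile(jarPath);
        try {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    names.add(name);
                }
            }
        } finally {
            jar.close();
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the jar built by the Rakefile is in the current directory.
        String path = args.length > 0 ? args[0] : "MyNativeScript.jar";
        for (String name : classEntries(path)) {
            System.out.println(name);
        }
    }
}
```

For the layout above you would hope to see entries such as org/spacevatican/elasticsearchexample/CustomScriptFactory.class. If the entries lack the org/spacevatican/... prefix, the jar was probably built from the wrong directory.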

Installing it

Now that we’ve got our native script built, the last thing we need to do is tell elasticsearch how to use it. Add something like this to your elasticsearch configuration (by default this is the config/elasticsearch.yml file in your elasticsearch directory)

script.native:
  mynativescript.type: org.spacevatican.elasticsearchexample.CustomScriptFactory

This tells elasticsearch that you’re adding a native script called mynativescript and that when you ask it to use mynativescript it should use the org.spacevatican.elasticsearchexample.CustomScriptFactory class to create new script instances. The name you use here is completely arbitrary - it doesn’t need to match any of your class names, file names etc.

Restart elasticsearch and you should be able to invoke the script. If it can’t find the class referenced here, elasticsearch should fail to start up.
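If elasticsearch does refuse to start, one way to narrow things down is to check whether the factory class can be loaded at all from a given classpath. This is just a sketch of such a pre-flight check: ClassCheck and isLoadable are hypothetical names, and it only tells you whether the class is visible on the classpath you pass to java, not whether elasticsearch itself will pick up the jar.

```java
// Hypothetical pre-flight check: can a class be loaded from the current classpath?
final class ClassCheck {
    private ClassCheck() {}

    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // e.g. the class was found but something it depends on is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
            ? args[0]
            : "org.spacevatican.elasticsearchexample.CustomScriptFactory";
        System.out.println(name + (isLoadable(name) ? " found" : " NOT found on the classpath"));
    }
}
```

Assuming the directory layout from earlier, you could compile this next to the Rakefile and run it with something like java -cp ".:MyNativeScript.jar:../lib/*" ClassCheck. The class name must match the value in elasticsearch.yml exactly, package included.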

Using it

First, let’s stick some entries in the index

curl -XPUT 'http://localhost:9200/example/doc/1' -d '{
    "name" : "doc 1",
    "a" : 4
}'

curl -XPUT 'http://localhost:9200/example/doc/2' -d '{
    "name" : "doc 2",
    "a" : 16
}'

To use a script you specify the name used in the yaml file (not the name of your script class, factory class, jar file or anything like that). You also need to specify "lang":"native" or else elasticsearch will try to interpret your script name as an mvel expression.

curl -XGET 'http://localhost:9200/example/doc/_search' -d '{
  "query" :{
     "custom_score": {
       "query" : { "match_all": {}},
       "script" : "mynativescript",
       "params" :{
          "base": 2.0,
          "exp": 3.0
       },
       "lang": "native"
     }
  }
}'

And you should see that the scores for the two hits are 0.5 and 2.0 as you would expect.
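Those numbers come straight from the formula in runAsDouble: with base 2 and exp 3 the divisor is 2^3 = 8, so the documents score 4 / 8 and 16 / 8. Here is the same arithmetic as a standalone sketch with no elasticsearch dependency (ScoreCheck is a hypothetical name used purely for illustration):

```java
// Standalone re-creation of the scoring formula, for checking the expected scores.
final class ScoreCheck {
    private ScoreCheck() {}

    // Same expression as CustomScript.runAsDouble: a / base^exp
    static double score(double a, double base, double exp) {
        return a / Math.pow(base, exp);
    }

    public static void main(String[] args) {
        System.out.println(score(4, 2.0, 3.0));   // doc 1: prints 0.5
        System.out.println(score(16, 2.0, 3.0));  // doc 2: prints 2.0
    }
}
```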

The code for this mini native script is on github. If you try and use it, it assumes that you’ve checked out the repo into the folder that contains your elasticsearch install.