elasticsearch ingest pipeline — tips and tricks #2

devops terminal
3 min read · Apr 4, 2021

In the previous blog, we learned that there is a special processor named “pipeline”, which acts like a function that other ingest pipelines can invoke. Today we will look at some pipeline-processor techniques based on conditional switching logic :)

setting a value based on a source field

As an example, suppose we have a field named “categoryValue”. If its value equals “plant”, then the field “categoryCode” should be set to “A”. The full logic matrix is:

  • categoryValue = “plant”, categoryCode = “A”
  • categoryValue = “animal”, categoryCode = “B”

The corresponding pipeline can be written as follows:
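A minimal sketch of such a pipeline, using one “set” processor per case (the pipeline id “pipCategory” is an assumed name for illustration):

```json
PUT _ingest/pipeline/pipCategory
{
  "description": "sets categoryCode based on categoryValue",
  "processors": [
    {
      "set": {
        "if": "ctx.categoryValue == 'plant'",
        "field": "categoryCode",
        "value": "A"
      }
    },
    {
      "set": {
        "if": "ctx.categoryValue == 'animal'",
        "field": "categoryCode",
        "value": "B"
      }
    }
  ]
}
```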

The switching logic is based on the “set” processor’s “if” clause; the variable “ctx” is the document context, which provides access to the fields of the document instance.

Quite simple, isn’t it?

providing parameters to the pipeline

If you read the official documentation of the pipeline processor, you won’t find any mention of how to provide a parameter to the pipeline. In fact, though, there is a workaround (albeit a bit ugly).

We first create a pipeline named “pipMultiply2”, which simply multiplies the value provided in the field “paramValue” by 2. Do note that a field-existence check is done through:

if (ctx.paramValue != null) …

The multiplication result is set to the field “finalValue”. We also remove the parameter field afterwards by:

ctx.remove('paramValue')
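Putting the pieces together, the “pipMultiply2” pipeline could be sketched as follows; a script processor is assumed here, since the exact processor used is not shown:

```json
PUT _ingest/pipeline/pipMultiply2
{
  "description": "multiplies paramValue by 2 into finalValue",
  "processors": [
    {
      "script": {
        "source": "if (ctx.paramValue != null) { ctx.finalValue = ctx.paramValue * 2; ctx.remove('paramValue'); }"
      }
    }
  ]
}
```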

Now it is time to simulate the pipeline, providing a parameter for testing:
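A sketch of the simulate request: a “set” processor plants the parameter field on the document, then the “pipeline” processor invokes “pipMultiply2”. The input value 100 is an assumption (any number works; it simply gets doubled):

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      { "set": { "field": "paramValue", "value": 100 } },
      { "pipeline": { "name": "pipMultiply2" } }
    ]
  },
  "docs": [
    { "_source": {} }
  ]
}
```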

The resulting value would be exactly “200”. As described, this approach works but is a bit ugly, since we need to set a field (e.g. “paramValue”) on the target document before running the multiply-by-2 pipeline; the “paramValue” field also needs to be removed afterwards (if necessary).

providing a parameter to the pipeline — 2

We already know how to use the “ugly” approach to pass a parameter to a pipeline; however, if you are not a fan of that approach, there is another workaround as follows:

We create a stored script within the elasticsearch cluster. The logic of this script is simple — multiply the parameter’s value by 2. Do note that an existence check is applied through:

if (params['paramValue'] != null) …
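Creating the stored script might look like the following sketch (the script id “scriptMultiply2” is an assumed name):

```json
PUT _scripts/scriptMultiply2
{
  "script": {
    "lang": "painless",
    "source": "if (params['paramValue'] != null) { ctx['finalValue'] = params['paramValue'] * 2; }"
  }
}
```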

Now let’s test our script within a pipeline:
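A sketch of the simulate call: a “script” processor references the stored script by id and passes the parameter through “params”. The input value 12 is an assumption:

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "id": "scriptMultiply2",
          "params": { "paramValue": 12 }
        }
      }
    ]
  },
  "docs": [
    { "_source": {} }
  ]
}
```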

The final result would involve a field “finalValue” with a value of “24”. Technically, this approach is not employing the pipeline-processor; however, it works like a charm and still keeps the re-usability (though we are abstracting the re-usability through a script instead of through a pipeline-processor… I know it sounds confusing :))))

Closings

Awesome~ We have learnt something new again~ In this blog, we have conquered the following:

  • set values for the document based on the values of existing field(s) — using the “if” clause.
  • a workaround to provide parameters to a pipeline-processor.
  • another workaround to provide parameters to a pipeline — using the script approach.

Good luck and happy data-ingesting :)


devops terminal

a java / golang / flutter developer, a big data scientist, a father :)