creating notifications through Alerting / Watcher X-Pack plugin on Elastic Cloud — Part 1

X-Pack Watcher / Alerting (remember this pair of eyes? :) )

If you are new to Elastic Cloud, do read my previous blog “using Elastic Cloud as your data repository and how to make your 1st data ingestion through filebeat”.

This is the 1st blog in the Watcher / Alerting series; we will focus on the following:

  • using the Kibana UI to create a threshold watch AND
  • an advanced watch

In the next blog, we will cover the Watcher input types (SIMPLE, SEARCH, HTTP and CHAIN) and give a short intro to some of the ACTIONS we can trigger as notifications. Stay tuned :D

what is Watcher / Alerting?

from the official documentation:

Watcher is an Elasticsearch feature that you can use to create actions based on conditions, which are periodically evaluated using queries on your data. Watches are helpful for analyzing mission-critical and business-critical streaming data. For example, you might watch application logs for performance outages or audit access logs for security threats.

In a simple sentence: we set up thresholds / indicators, and when data meets these criteria, a corresponding action is triggered to notify the right person or even the right system.

PS. Since we set up the thresholds / indicators manually, we might need to review these values from time to time.

the watcher UI

Let’s start with the quickest way to set up a watch: Kibana’s watcher UI.

Access the UI through: management → Elasticsearch section → watcher

The 1st time you enter the page, you will see something like this:

threshold or advanced watch?

We are given 2 choices: threshold OR advanced watch. For simplicity, let’s start with the “threshold” watch.

threshold watch page

The above is the threshold watch creation page. Point 1 is the name of your watch. Note that this name is for human recognition only; behind the scenes, a UUID is generated as the ID of your watch, as we will see a bit later. Point 2 is where this watch looks for data; in our case, we should already have some data indices available in the cluster (no? then check out my previous blog or have a look at this github link :) ).

Point 3 is the timestamp field of the chosen index(es). Note that the watch won’t work without a timestamp, so you would need to add the missing timestamp field first. Again, do have a look at this github link: the _add_ts pipeline adds the current timestamp to each document.

Point 4 is the trigger schedule; in this case the default interval is every minute. Point 5 is the condition that fires an action. In Watcher, an ACTION is the notification mechanism carried out when the condition is met. An action could log the message to another index, log it to the console logs, email it out, or even make an HTTP call to a remote server (technically a webhook).
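The _add_ts pipeline itself is not reproduced in this post, so the following is only a minimal sketch of what such an ingest pipeline could look like, written as a Python dict. The target field name "@timestamp" and the description text are assumptions and the real pipeline may differ; the set processor and the {{_ingest.timestamp}} value are standard Elasticsearch ingest features.

```python
import json

# Hypothetical body for an ingest pipeline such as "_add_ts".
# Assumption: the timestamp goes into "@timestamp"; the real pipeline
# behind the github link may use a different field name.
add_ts_pipeline = {
    "description": "add the ingest time to each document",
    "processors": [
        {
            "set": {
                "field": "@timestamp",
                # ingest metadata: the time the document entered the pipeline
                "value": "{{_ingest.timestamp}}",
            }
        }
    ],
}

# This JSON could be sent as the body of PUT _ingest/pipeline/_add_ts
print(json.dumps(add_ts_pipeline, indent=2))
```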

Let’s dive into the conditions and actions part a bit:

options available

Conditions are broken into several pieces here:

  • when: the metric used to decide whether your data matches the condition; count, average, sum etc. are available
  • over: describes how to get the value evaluated for the above metric; the available options are overall (all data without any filtering or grouping) or top (the top n VALUES of a field, where each VALUE has its own count; for example, the top 3 country names might give US: 1000, CN: 780, IN: 680)
  • is above: the numeric threshold / indicator; for example, when the count of documents is above 10000, an action should be carried out (like emailing somebody). Available options are: above, above or equals, below, below or equals and is-between.
  • for the last: the time range from which the value is collected; it makes sense to look at data within the last n minutes or hours instead of all data since day 1, so this is where we set up the query range criteria (the default is the past 5 minutes).
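To make the four pieces concrete, here is a small Python model of how a “count” condition could be evaluated. It only illustrates the semantics described above and is not Watcher’s actual implementation; the function name evaluate_count, the sample documents, and the choice to treat “is between” as inclusive on both ends are all assumptions.

```python
from collections import Counter

# Comparators named after the options in the threshold watch UI.
# Assumption: "is between" includes both bounds.
COMPARATORS = {
    "above": lambda value, bounds: value > bounds[0],
    "above or equals": lambda value, bounds: value >= bounds[0],
    "below": lambda value, bounds: value < bounds[0],
    "below or equals": lambda value, bounds: value <= bounds[0],
    "is between": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def evaluate_count(docs, comparator, bounds, group_field=None, top_n=None):
    """Return {group: condition_met} for the "count" metric.

    docs are assumed to be limited to the "for the last" window already.
    """
    check = COMPARATORS[comparator]
    if group_field is None:
        # "overall": a single count over all documents
        return {"overall": check(len(docs), bounds)}
    # "top": count per field value and keep the n largest groups
    counts = Counter(doc[group_field] for doc in docs)
    return {value: check(count, bounds) for value, count in counts.most_common(top_n)}


# Hypothetical sample data: 3 x US, 2 x CN, 1 x IN
docs = [{"country": "US"}] * 3 + [{"country": "CN"}] * 2 + [{"country": "IN"}]
print(evaluate_count(docs, "above", [5]))
print(evaluate_count(docs, "above or equals", [2], group_field="country", top_n=2))
```

With “top”, each group is checked against the threshold on its own, so a single watch run can find the condition met for some values and not for others.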

Next are the actions to trigger:

actions

The watcher UI provides a subset of the available actions (quite handy for most use cases): email, logging, Slack, webhook, index, PagerDuty and Jira. For simplicity, we will pick the logging action, which logs a message to our server logs.

logging action

The logging action lets us design a message template; we can even test it by clicking the “Log a sample message” button.

Now shift to the Elastic Cloud portal and locate your deployment; under it you should be able to see the Logs section:

Deployment → Elasticsearch → Logs

Our sample log is visible at the top of the list~ Great, it works!

Now shift back to the watcher UI and click the “create alert” button… all set. All we need to do now is wait for roughly a minute and see if the logging action is triggered.

The page refreshes automatically to reflect the changes. Point 1 is the watch we just created; note the generated UUID used as its ID, while our “test_only” value sits under the name field. Point 2 shows the last / current state of our watch; as long as it is not “failed”, the action has been executed successfully. Click on the watch ID and you will be brought to the history page:

Point 1 shows the history of executions plus their trigger timestamps. Alright, all seems well and valid, but how do we validate it? Just go back to our Elastic Cloud Logs UI and you should be able to spot a few entries of the logged message.

Note that the log is indeed produced once per minute (as configured) :)

Housekeeping issues

Congratulations~ You have just created a simple threshold watch using the watcher UI. This 1st trial isn’t really that useful to us, and writing a log entry every minute is not a good thing either. Let’s do some housekeeping: go back to the watcher UI and click on the target watch (in our case, the test_only watch). On the top right-hand side of the history page, we can now deactivate or even delete the watch. I suggest deactivating in general, since you never know when you will need the watch again. And that’s it~ So easy!

the dreaded Advanced Watch…

Hey! Are you ready for some upgraded challenges? Let’s try out the advanced watch, shall we? (no? sorry… no choice for you, let’s go :) )

Back in the watcher UI, click the “create” button on the top right corner and pick “create advanced watch”… and TADAAAAAA~

Surprise!!! Isn’t it simple and elegant? Woo, no more fancy UI, just a good old textbox; that’s what most developers need :) Yep, that’s the advanced watch UI. Point 1 is the human-recognisable name of the watch and a generated ID. Point 2 is where most of the watch’s components live, including the trigger schedule, conditions and actions. Since you need to code these things… Point 3 assists you by pointing to the syntax documentation (no joke, the documentation helps a lot here).

The above is what I typed in; you can see that we have several sections here:

  • trigger: the trigger schedule, which is set to 1 minute
  • input: how to fetch the data for evaluation; yep… it is bulky, and you can guess I am running a match_all query to get back all data from the imdb_movie index
  • condition: defines when the actions should be triggered; in this case, if the total number of documents is over “2”, trigger the actions
  • actions: defines the notification actions carried out when the above condition is met; again in this case, just a simple log to our cluster
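For readers who cannot see the screenshot, here is a rough sketch of a watch with these four sections, built as a Python dict and printed as JSON that could be pasted into the textbox. It is not necessarily the exact watch I typed in: the action name log_hits and the log text are assumptions, while the index name, the match_all query, the 1 minute interval and the “over 2” condition follow the list above.

```python
import json

# A sketch of an advanced watch with the four sections described above.
# Assumptions: the action id "log_hits" and the wording of the log text.
advanced_watch = {
    # trigger: run once every minute
    "trigger": {"schedule": {"interval": "1m"}},
    # input: a match_all search against the imdb_movie index
    "input": {
        "search": {
            "request": {
                "indices": ["imdb_movie"],
                "body": {"query": {"match_all": {}}},
            }
        }
    },
    # condition: fire the actions when more than 2 documents are found
    "condition": {"compare": {"ctx.payload.hits.total": {"gt": 2}}},
    # actions: write a simple message to the cluster logs
    "actions": {
        "log_hits": {
            "logging": {
                "text": "imdb_movie returned {{ctx.payload.hits.total}} documents"
            }
        }
    },
}

print(json.dumps(advanced_watch, indent=2))
```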

Save and create it and again… wait for a minute and check if the corresponding action is executed correctly~

Great! We did it again. No sweat at all, was it? OK, OK, it is quite challenging, isn’t it? That’s why there is an easter egg on the UI… the “simulate” tab. It lets us simulate a watch and gives us a chance to debug its logic quickly.

1st!!! We need to finish the watch contents before a simulation can be done; makes sense, right? The simulate tab then offers a few overrides:

  • trigger: our trigger may be scheduled at a per-minute interval; for debugging purposes we probably want it to be shorter, so override the “trigger” section, in this case hard-set to every 5 seconds instead of every minute
  • conditions: not every run triggers actions, only the runs that meet the conditions do; switch on the “ignore condition” slider and the actions will trigger in any situation
  • actions: pick the action(s) we want to simulate (in this case the logging action)
  • input: if you want to test the watch on specific use cases (e.g. a different data category triggering a different logging message), you can provide your own document contents for the simulation!

Finally, click “Simulate Watch” and we are all good. Remember this is only a simulation, hence no REAL logging is applied to our cluster~~

We can verify the results based on the JSON response. Yep… it is not perfect, but still handy for the meantime :)
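The same kind of overrides can be expressed as a request body for Watcher’s execute watch API (POST _watcher/watch/<watch_id>/_execute). The sketch below builds such a body in Python. How exactly the simulate tab maps onto these fields is my assumption; the action id log_hits and the alternative payload are hypothetical values for illustration.

```python
import json

# A sketch of an execute watch API request body.
# Assumptions: the action id "log_hits" and the alternative payload.
simulate_request = {
    # pretend the watch was triggered right now
    "trigger_data": {"triggered_time": "now", "scheduled_time": "now"},
    # run the actions even if the condition is not met
    "ignore_condition": True,
    # simulate the logging action instead of really executing it
    "action_modes": {"log_hits": "simulate"},
    # our own payload, used instead of running the watch's input
    "alternative_input": {"hits": {"total": 5}},
    # do not store this run in the watch history
    "record_execution": False,
}

print(json.dumps(simulate_request, indent=2))
```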

Conclusion

Cool~ We have gone through quite a journey here:

  • we used the watcher UI to create a threshold watch
  • we also created an advanced watch
  • we now know where to read our logs on Elastic Cloud

In the coming blog, we will use the “dev tools” to write more advanced watches. The UI is good, but sometimes a textbox is more efficient :))))

devops terminal

a java / golang / flutter developer, a big data scientist, a father :)