Thursday, March 2, 2017

[Zabbix] Simple Anomaly / Outlier Detection with Tukey's Range Test

We use Tukey's range test to define lower and upper bounds and treat values outside them as outliers. A trigger then compares the monitored item against those bounds, which gives us a dynamic trigger that adapts to the data.

The range is defined as [Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1)], where Q1 and Q3 are the first and third quartiles and Q3 - Q1 is the interquartile range (IQR). The conventional k factor is 1.5 (the script below sets k=2.0); adjust it as needed: the bigger k, the further out the borders.
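
To make the formula concrete, here is a small sketch that computes the two borders with sort and awk only. The function name tukey_fences is made up for this illustration, and the quartiles are linearly interpolated between sorted values, which may differ slightly from what other tools report.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script below): reads one value per
# line on stdin and prints "lower upper" for the given k (default 1.5).
tukey_fences() {
    local k="${1:-1.5}"
    sort -n | awk -v k="$k" '
        { v[NR] = $1 }                       # store the sorted values
        function quantile(p,    h, lo, frac) {
            h = (NR - 1) * p + 1             # 1-based fractional position
            lo = int(h); frac = h - lo
            if (lo >= NR) return v[NR]
            return v[lo] + frac * (v[lo + 1] - v[lo])
        }
        END {
            if (NR == 0) exit 1              # no data, no borders
            q1 = quantile(0.25); q3 = quantile(0.75); iqr = q3 - q1
            printf "%g %g\n", q1 - k * iqr, q3 + k * iqr
        }'
}

# Eight samples, one of them (50) far away from the rest
printf '%s\n' 10 12 11 13 12 14 11 50 | tukey_fences 1.5
# prints: 7.625 16.625
```

With these samples Q1 is 11 and Q3 is 13.25, so the IQR is 2.25 and the value 50 lies well above the upper border.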

Here is an example of how it can look. The monitored data is the blue graph, the borders are red and green. There are 3 outlier events which could trigger an alarm.



Dependencies: datamash, jq, curl, bc, zabbix_sender

Place the following script in /usr/lib/zabbix/externalscripts/ on your Zabbix server:

#!/bin/bash

# Usage:   anomalydetection.sh <itemid> <host> <upper limit key> <lower limit key>
# Example: anomalydetection.sh 23432 testhost b.upper b.lower
# Requires datamash, curl, jq, bc and zabbix_sender

ITEMID=$1       # item to be analyzed
HOST=$2
TARGETITEM1=$3  # trapper item key for the upper limit
TARGETITEM2=$4  # trapper item key for the lower limit

LIMIT='36' # item refreshes every 5 min, so 36*5 min = 3 hour time period
# We want the data from the same time 7 days ago: a window of
# LIMIT*refresh time (3 hours) centered on that moment, i.e. +/- 90 minutes.
DATE=$(date +%s --date="7 days ago 90 minutes ago")
DATE2=$(date +%s --date="7 days ago 90 minutes")

# CONSTANT VARIABLES
ZABBIX_USER='APIUSER'   # create a user with API access and put its name here
ZABBIX_PASS='xxxxxxxxx' # put that user's password here
API='https://domain.tld/api_jsonrpc.php'

# Authenticate with the Zabbix API (-k skips TLS certificate verification)
authenticate() {
    curl -k -s -H 'Content-Type: application/json-rpc' -d "{\"jsonrpc\":\"2.0\",\"method\":\"user.login\",\"params\":{\"user\":\"${ZABBIX_USER}\",\"password\":\"${ZABBIX_PASS}\"},\"auth\":null,\"id\":0}" "$API"
}
AUTH_TOKEN=$(authenticate | jq -r .result)

# Fetch last week's values, one per line.
# "history":"3" selects numeric unsigned history; use "0" if the monitored item is numeric float.
curl -k -s -H 'Content-Type: application/json-rpc' -d "{\"jsonrpc\":\"2.0\",\"method\":\"history.get\",\"params\":{\"output\":\"extend\",\"history\":\"3\",\"itemids\":\"$ITEMID\",\"time_from\":\"$DATE\",\"time_till\":\"$DATE2\",\"sortfield\":\"clock\",\"sortorder\":\"DESC\",\"limit\":\"$LIMIT\"},\"auth\":\"$AUTH_TOKEN\",\"id\":1}" "$API" | jq -r '.result[].value' > /tmp/bandvalues

# Quartiles and interquartile range of column 1
iqr=$(/usr/bin/datamash iqr 1 < /tmp/bandvalues)
q1=$(/usr/bin/datamash q1 1 < /tmp/bandvalues)
q3=$(/usr/bin/datamash q3 1 < /tmp/bandvalues)
k=2.0 # 1.5 for standard outliers, 3.0 for far out, adjust as needed
# Quoted so the shell cannot glob-expand the "*"
lower=$(echo "$q1-$k*$iqr" | bc)
upper=$(echo "$q3+$k*$iqr" | bc)

# Push both borders into the trapper items
zabbix_sender -z 127.0.0.1 -p 10051 -s "$HOST" -k "$TARGETITEM1" -o "$upper"
zabbix_sender -z 127.0.0.1 -p 10051 -s "$HOST" -k "$TARGETITEM2" -o "$lower"
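
The history window the script asks for is easier to see as plain arithmetic. The sketch below uses a made-up function name, history_window, and assumes fixed 86400-second days, so around daylight saving changes it can differ from what date --date="7 days ago" returns.

```shell
#!/bin/bash
# Hypothetical helper: prints "time_from time_till" (epoch seconds) for a
# window centered <days> days before <now>, reaching <half_minutes> minutes
# to each side. Assumption: every day has exactly 86400 seconds.
history_window() {
    local now="$1" days="${2:-7}" half_minutes="${3:-90}"
    local center=$(( now - days * 86400 ))
    echo "$(( center - half_minutes * 60 )) $(( center + half_minutes * 60 ))"
}

# Window for the current moment, using the defaults from the script
history_window "$(date +%s)"
```

With a 5 minute refresh interval, the 180 minutes between the two timestamps hold 36 values, which is where LIMIT='36' comes from.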

You need to create 3 items (all of type numeric float!):
1 External Check which calls the script with a key looking like this:
anomalydetection.sh["36014","examplehost","b.upper","b.lower"]
The general form is:
anomalydetection.sh["itemidofmonitoreditem","hostnameofitem","trapperitemupperlimit","trapperitemlowerlimit"]


And 2 items of type Trapper, which in my example are the upper border and the lower border, with the keys b.upper and b.lower respectively.

Your trigger definition has to look like this:
{examplehost:itemtobemonitored.last()}<{examplehost:b.lower.last()} or {examplehost:itemtobemonitored.last()}>{examplehost:b.upper.last()}
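
To sanity-check the borders outside of Zabbix, the same condition can be mimicked in the shell. This is only a sketch: is_outlier is a made-up helper, and awk does the floating point comparison because bash itself only handles integers.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) when value is below lower
# or above upper, which is the condition the trigger expression tests.
is_outlier() {
    awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v < lo || v > hi) }'
}

# Usage: is_outlier <value> <lower> <upper>
if is_outlier 50 7.625 16.625; then
    echo "outlier"
else
    echo "within range"
fi
```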

Fair warning: this only works well with data streams that are not too volatile.
