Sunday, March 12, 2017

[Python] Deleting files by date from directory with a few million files

Recent problem: a single directory with 5.3 million files. Using find did not work because it filled up memory. I tried everything I could think of, but in the end had to fall back on Python to iterate over the directory, using https://github.com/benhoyt/scandir
The following code will do the job:

import scandir
import time
import os

path = '/tmp/test'
old = time.time() - 28 * 24 * 60 * 60  # 28 days

for entry in scandir.scandir(path):
    try:
        if os.path.getctime(os.path.join(path, entry.name)) < old:
            os.unlink(os.path.join(path, entry.name))
    except OSError:
        pass  # file may have been removed in the meantime

scandir is part of the standard library as os.scandir() since Python 3.5. If you are on an older version, just download scandir.py and put it in the same directory as this code.
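For Python 3.5 and newer, a minimal sketch using the built-in os.scandir() looks like this (the path, the 28-day age and the helper name are placeholders of my choosing):

```python
import os
import time

def delete_old_files(path, max_age_days=28):
    """Delete regular files in *path* whose ctime is older than *max_age_days*."""
    cutoff = time.time() - max_age_days * 24 * 60 * 60
    for entry in os.scandir(path):
        try:
            # entry.stat() reuses data from the directory scan where possible,
            # avoiding a separate stat() call per file on most platforms
            if entry.is_file() and entry.stat().st_ctime < cutoff:
                os.unlink(entry.path)
        except OSError:
            pass  # file may have been removed in the meantime
```

Call it as delete_old_files('/tmp/test'). The advantage over os.listdir() is the same as with the scandir module: entries are yielded lazily instead of building one huge list in memory.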

Thursday, March 2, 2017

[Zabbix] Simple Anomaly / Outlier Detection with Tukey's Range Test

We are using Tukey's range test to define lower and upper value borders and flag outliers in our data. A trigger function built on those borders then gives us a dynamic trigger that adapts to the data.

The range is defined as [Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1)], where Q3 - Q1 is the interquartile range (IQR). We are using the default k factor of 1.5, but you can adjust it as needed: the bigger k, the further out the borders.
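To illustrate the formula, here is a small worked example in plain Python (the sample values are invented; in the actual setup the quartiles come from datamash):

```python
# Tukey's range test: values outside [Q1 - k*IQR, Q3 + k*IQR] are outliers.
data = sorted([10, 12, 11, 13, 12, 11, 50])  # 50 is the obvious outlier

def quartile(xs, q):
    # simple linear-interpolation quantile over a sorted list
    idx = (len(xs) - 1) * q
    lo, hi = int(idx), min(int(idx) + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (idx - lo)

q1, q3 = quartile(data, 0.25), quartile(data, 0.75)
iqr = q3 - q1
k = 1.5
lower, upper = q1 - k * iqr, q3 + k * iqr
outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # prints [50]
```

With this sample, Q1 = 11 and Q3 = 12.5, so the borders come out at 8.75 and 14.75 and only the 50 is flagged.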

An example of how it can look. In this case the monitored data is the blue graph, the borders are red and green. There are 3 outlier events which could trigger an alarm.



Dependencies: datamash, jq, curl

Place the following script in /usr/lib/zabbix/externalscripts/ on your Zabbix server:

#!/bin/bash

# Use like this: anomalydetection.sh 23432 testhost bupper blower
# Script requires datamash, curl and jq

ITEMID=$1        # item to be analyzed
HOST=$2
TARGETITEM1=$3   # upper limit trapper item
TARGETITEM2=$4   # lower limit trapper item

LIMIT='36' # item refreshes every 5 min, 36*5 min = 3 hour time period
# We want a 3-hour window (LIMIT * refresh time) centered on 7 days ago
DATE=$(date +%s --date="7 days ago 90 minutes ago") # 7 days + 90 min ago
DATE2=$(date +%s --date="7 days ago 90 minutes")    # 7 days ago + 90 min

# CONSTANT VARIABLES
ZABBIX_USER='APIUSER'   # make a user with API access and put the name here
ZABBIX_PASS='xxxxxxxxx' # make a user with API access and put the password here
API='https://domain.tld/api_jsonrpc.php'

# Authenticate with Zabbix API
authenticate() {
    curl -k -s -H 'Content-Type: application/json-rpc' -d "{\"jsonrpc\":\"2.0\",\"method\":\"user.login\",\"params\":{\"user\":\"${ZABBIX_USER}\",\"password\":\"${ZABBIX_PASS}\"},\"auth\":null,\"id\":0}" "$API"
}
AUTH_TOKEN=$(authenticate | jq -r .result)

# Fetch the historical values of the monitored item into a temp file
curl -k -s -H 'Content-Type: application/json-rpc' -d "{\"jsonrpc\":\"2.0\",\"method\":\"history.get\",\"params\":{\"output\":\"extend\",\"history\":\"3\",\"itemids\":\"$ITEMID\",\"time_from\":\"$DATE\",\"time_till\":\"$DATE2\",\"sortfield\":\"clock\",\"sortorder\":\"DESC\",\"limit\":\"$LIMIT\"},\"auth\":\"$AUTH_TOKEN\",\"id\":1}" "$API" | jq -r '.result[].value' > /tmp/bandvalues

# Tukey's range test: [Q1 - k*IQR, Q3 + k*IQR]
iqr=$(/usr/bin/datamash iqr 1 < /tmp/bandvalues)
q1=$(/usr/bin/datamash q1 1 < /tmp/bandvalues)
q3=$(/usr/bin/datamash q3 1 < /tmp/bandvalues)
k=2.0 # 1.5 for standard outliers, 3.0 for far out, adjust as needed
lower=$(echo "$q1-$k*$iqr" | bc)
upper=$(echo "$q3+$k*$iqr" | bc)

# Push the borders into the two trapper items
zabbix_sender -z 127.0.0.1 -p 10051 -s "$HOST" -k "$TARGETITEM1" -o "$upper"
zabbix_sender -z 127.0.0.1 -p 10051 -s "$HOST" -k "$TARGETITEM2" -o "$lower"

You need to create 3 items (all of type numeric float!):
1 External Check which calls the script, with a key looking like this:
anomalydetection.sh["36014","examplehost","b.upper","b.lower"]
i.e. anomalydetection.sh["itemidofmonitoreditem","hostnameofitem","trapperitemupperlimit","trapperitemlowerlimit"]


And 2 items of type Trapper, which in my example are the upper border and the lower border, with the keys b.upper and b.lower respectively.

Your trigger definition has to look like this:
{examplehost:itemtobemonitored.last()}<{examplehost:b.lower.last()} or {examplehost:itemtobemonitored.last()}>{examplehost:b.upper.last()}

Fair warning: this only works well on data streams that are not too volatile.