Thursday, September 7, 2017

[MySQL] Debian 8 Jessie - How to bootstrap a new Galera Cluster

Despite the docs saying to use "systemctl start mysql --wsrep-new-cluster" or "service mysql start --wsrep-new-cluster", the only way that works is: service mysql bootstrap
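
A minimal sketch of the whole procedure (the wsrep_cluster_size check is standard Galera; the rest assumes a stock Debian 8 setup):

# on the FIRST node only: bootstrap the new cluster
service mysql bootstrap
# on every further node, a normal start joins the running cluster
service mysql start
# verify the cluster size grows as nodes join
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"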

Wednesday, June 21, 2017

[Logstash] How to process Amavis logs via filebeat and logstash for elasticsearch

First, activate JSON logging for amavis:

in rsyslog.conf add:
$MaxMessageSize 32k
in amavisd.conf add: 
$logline_maxlen = ( 32*1024 ) - 50; # 32k max message size, keep 50 bytes for syslog
$log_templ = <<'EOD';
[:report_json]
EOD

Amavis logs to file via syslog, so each line consists of a syslog prefix followed by the JSON message.
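A quick way to verify the format (log path assumed; adjust to wherever amavis logs on your system):

# print the JSON part of the newest amavis line, pretty-printed by jq
grep amavis /var/log/mail.log | tail -n 1 | sed 's/^[^{]*//' | jq .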
For Filebeat, add the following to the prospector that reads your mail log (the prospector block and path here are an example, adjust to your setup):

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/mail.log
  fields:
    tags: ['json']
  fields_under_root: true
You do not need the json tag per se, but I am using it because I have both JSON and non-JSON logs. (I can't use hilite.me here, it will break the page :D; in case you have issues use this: https://pastebin.com/raw/DtRJ5mDX )

In logstash, parse the syslog prefix with grok so that the message field only contains the JSON:
filter {
  if "json" in [tags] {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:logtime} %{SYSLOGHOST} %{DATA:program}(?:\[%{POSINT}\])?: (?:\((.*?)\))? %{GREEDYDATA:message}" }
      overwrite => "message"
    }
  }
}
Now your logstash config needs the following to parse the remaining JSON and use it as the event source. You can add it to the filter above or keep it in a separate config:

    if "json" in [tags] { 
    json {
      source => "message"
       }   
    }
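
Before restarting logstash you can syntax-check the whole pipeline; the path below is the Debian package default, adjust if yours differs:

# -t / --config.test_and_exit validates the config without starting the pipeline
/usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/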

Tuesday, June 20, 2017

[Logstash] How to match @timestamp with syslog timestamp

If you have a time/date field in your data, e.g. the syslog time, and you want to match @timestamp with it, do the following.
Add to filter {}:

date {
    match => [ "logtime", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ] #syslog pattern
    target => "@timestamp"
    timezone => "UTC" #set the timezone to the one your logs are originally in
}

The 'logtime' field is my syslog date field. Note the double space in "MMM  d HH:mm:ss": syslog pads single-digit days with a space, which is why both patterns are needed.
Logstash Version: 5.4.1-1

Tuesday, June 6, 2017

[Ejabberd] Log extraction from MySQL to email

A log extraction script for ejabberd chat logs stored in MySQL.

import MySQLdb
import sys
import collections
import smtplib
from email.mime.text import MIMEText

# open a database connection
# be sure to change the host IP address, username, password and database name to match your own
connection = MySQLdb.connect(host="localhost", user="root", passwd="DBPASSWORD", db="ejabberd")
# prepare a cursor object using the cursor() method
cursor = connection.cursor()
# fetch the last 24 hours of messages
cursor.execute("select created_at,username,bare_peer,txt from archive where (created_at BETWEEN NOW() - INTERVAL 1 DAY AND NOW())")
#cursor.execute("select created_at,username,bare_peer,txt from archive where (created_at BETWEEN '2017-05-30 13:00:00' AND '2017-05-30 14:00:00')")
data = cursor.fetchall()

conversations = collections.defaultdict(list)
raw = collections.defaultdict(list)

# ejabberd may store every message twice (once per direction); if the third and
# fourth rows carry the same text, skip the duplicates by stepping in twos
# note: this heuristic can misfire when there are only a few messages (e.g. early morning)
rangeswitch = 1
if len(data) > 3 and data[2][3] == data[3][3]:
    rangeswitch = 2

# key = recipient, value = all messages sent to them
for item in xrange(0, len(data), rangeswitch):
    recipient = data[item][2]
    raw[recipient].append(data[item])

# group messages into one conversation per user pair, regardless of direction
for recipient in raw:
    for message in xrange(0, len(raw[recipient])):
        sender = str(raw[recipient][message][1])
        if (recipient.split("@")[0] + sender) in conversations:
            convokey = recipient.split("@")[0] + sender
        elif (sender + recipient.split("@")[0]) in conversations:
            convokey = sender + recipient.split("@")[0]
        else:
            convokey = recipient.split("@")[0] + sender
        conversations[convokey].append(raw[recipient][message])

# sort every conversation by timestamp
for user, messages in conversations.items():
    messages.sort(key=lambda tup: tup[0])

# send each conversation by email
me = "root@localhost"
text = ""
# debug: list all conversation keys
for key in conversations:
    print(key)

for conversation in conversations:
    for message in xrange(0, len(conversations[conversation])):
        text += ' '.join(str(field) for field in conversations[conversation][message][0:4]) + '\n'
    msg = MIMEText(text, 'plain')
    # find the two participants: the first two distinct bare_peer values
    rec1 = rec2 = str(conversations[conversation][0][2])
    for line in xrange(0, len(conversations[conversation]) - 1):
        if str(conversations[conversation][line][2]) != str(conversations[conversation][line + 1][2]):
            rec1 = str(conversations[conversation][line][2])
            rec2 = str(conversations[conversation][line + 1][2])
            break
    msg['Subject'] = "Chat log between " + rec1.split("@")[0] + " and " + rec2.split("@")[0]
    msg['From'] = me
    msg['To'] = rec1 + ',' + rec2
    s = smtplib.SMTP('localhost')
    s.sendmail(me, 'your@mailserver.com', msg.as_string())
    text = ""
    break  # remove after testing
s.quit()

# close the cursor object and the connection
cursor.close()
connection.close()
# exit the program
sys.exit()
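
If you want the extraction mailed out daily, a cron entry like this would do (the script path is hypothetical, adjust to where you saved it):

# crontab: mail the previous day's conversations every morning at 06:00
0 6 * * * /usr/bin/python /opt/scripts/ejabberd_chatlog_mail.py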

Tuesday, April 4, 2017

[Python] How to anonymize logs - removing email addresses and replacing them with SHA1 hashes

This script removes all email addresses from a log file and replaces them with their SHA1 hashes:



import hashlib
import re

thefile = open('/var/log/z-push/z-push-hashed.log', 'w')
with open('/var/log/z-push/z-push.log') as f:
    content = f.readlines()
content = [x.strip() for x in content]

for x in xrange(0, len(content)):
    mails = re.findall(r'[\w\.-]+@[\w\.-]+', content[x])
    if not mails:
        thefile.write(content[x] + '\n')
        continue
    line = content[x]
    # replace every address on the line, not just the first one
    for mail in mails:
        line = line.replace(mail, hashlib.sha1(mail).hexdigest())
    thefile.write(line + '\n')
thefile.close()
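
If you later need to trace a specific user through the anonymized log, you can compute the digest of their address on the shell (the address is an example). The -n matters because the script hashes the bare string without a trailing newline:

# prints the same SHA1 hex digest the script writes into the log
echo -n 'user@example.com' | sha1sum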

Sunday, March 12, 2017

[Python] Deleting files by date from directory with a few million files

Recent problem: a single directory with 5.3 million files. Using find did not work because it filled up memory. I tried everything I could think of, but in the end had to go with python to iterate over the directory. I used https://github.com/benhoyt/scandir
The following code will do the job:

import scandir
import time
import os

old = time.time() - 28 * 24 * 60 * 60  # 28 days

for entry in scandir.scandir('/tmp/test'):
    try:
        # compare the inode change time against the cutoff
        if os.path.getctime(os.path.join('/tmp/test', entry.name)) < old:
            os.unlink(os.path.join('/tmp/test', entry.name))
    except OSError:
        # a file may vanish between listing and stat; ignore it
        pass

scandir is part of the standard library (as os.scandir()) since Python 3.5. If you don't have it, just download scandir.py and put it in the same directory as this code.

Thursday, March 2, 2017

[Zabbix] Simple Anomaly / Outlier Detection with Tukey's Range Test

We are using Tukey's range test to define lower and upper value borders for finding outliers in our data. We'll be using a trigger function with those values to get a dynamic trigger that adapts to the data.

The range is defined as [Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1)], where Q3 - Q1 is the interquartile range. The standard k factor is 1.5 (the script below uses 2.0); the bigger k, the further out the borders.
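
A toy run of the calculation with datamash, the same tool the script below uses (the sample values are made up; 40 is the outlier):

printf '%s\n' 10 12 11 13 12 40 11 > /tmp/sample
q1=$(datamash q1 1 < /tmp/sample)
q3=$(datamash q3 1 < /tmp/sample)
iqr=$(datamash iqr 1 < /tmp/sample)
echo "lower=$(echo "$q1-1.5*$iqr" | bc) upper=$(echo "$q3+1.5*$iqr" | bc)"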

Here is an example of how it can look. The monitored data is the blue graph, the borders are red and green. There are three outlier events which could trigger an alarm.



Dependencies: datamash, jq, curl

Place the following script in /usr/lib/zabbix/externalscripts/ on your zabbix server:

#!/bin/bash

# use like this: anomalydetection.sh 23432 testhost bupper blower
# script requires datamash, curl and jq

ITEMID=$1      # item to be analyzed
HOST=$2
TARGETITEM1=$3 # upper limit
TARGETITEM2=$4 # lower limit

LIMIT='36' # item refreshes every 5 min, 36*5 = 3 hour time period
# we want a 3-hour window centered on this time of day 7 days ago (+/- 90 minutes)
DATE=$(date +%s --date="7 days ago 90 minutes ago")
DATE2=$(date +%s --date="7 days ago 90 minutes")

# CONSTANT VARIABLES
ZABBIX_USER='APIUSER'   # make a user with API access and put its name here
ZABBIX_PASS='xxxxxxxxx' # and its password here
API='https://domain.tld/api_jsonrpc.php'

# authenticate with the Zabbix API and extract the session token
authenticate() {
    curl -k -s -H 'Content-Type: application/json-rpc' -d "{\"jsonrpc\":\"2.0\",\"method\":\"user.login\",\"params\":{\"user\":\"${ZABBIX_USER}\",\"password\":\"${ZABBIX_PASS}\"},\"auth\":null,\"id\":0}" "$API"
}
AUTH_TOKEN=$(authenticate | jq -r .result)

# fetch the historic values of the monitored item into a scratch file
curl -k -s -H 'Content-Type: application/json-rpc' -d "{\"jsonrpc\":\"2.0\",\"method\":\"history.get\",\"params\":{\"output\":\"extend\",\"history\":\"3\",\"itemids\":\"$ITEMID\",\"time_from\":\"$DATE\",\"time_till\":\"$DATE2\",\"sortfield\":\"clock\",\"sortorder\":\"DESC\",\"limit\":\"$LIMIT\"},\"auth\":\"$AUTH_TOKEN\",\"id\":1}" "$API" | jq -r '.result[].value' > /tmp/bandvalues

iqr=$(/usr/bin/datamash iqr 1 < /tmp/bandvalues)
q1=$(/usr/bin/datamash q1 1 < /tmp/bandvalues)
q3=$(/usr/bin/datamash q3 1 < /tmp/bandvalues)
k=2.0 # 1.5 for standard outliers, 3.0 for far out, adjust as needed
lower=$(echo "$q1-$k*$iqr" | bc)
upper=$(echo "$q3+$k*$iqr" | bc)

# push the new borders into the two trapper items
zabbix_sender -z 127.0.0.1 -p 10051 -s "$HOST" -k "$TARGETITEM1" -o "$upper"
zabbix_sender -z 127.0.0.1 -p 10051 -s "$HOST" -k "$TARGETITEM2" -o "$lower"

You need to create 3 items (all of type numeric float!):
1 external check which calls the script with a key looking like this:
anomalydetection.sh["36014","examplehost","b.upper","b.lower"]
The general form is anomalydetection.sh["itemidofmonitoreditem","hostnameofitem","trapperitemupperlimit","trapperitemlowerlimit"].


And 2 items of type trapper, which in my example are the upper and lower border, with keys b.upper and b.lower respectively.

Your trigger definition has to look like this:
{examplehost:itemtobemonitored.last()}<{examplehost:b.lower.last()} or {examplehost:itemtobemonitored.last()}>{examplehost:b.upper.last()}

Fair warning: this only works well on data streams that are not too volatile.