Archive for the ‘ELK’ Category

Courier Fetch Error: unhandled courier request error: Authorization Exception in Chrome/Safari on Kibana 4.5.0

August 22nd, 2016 No comments

Getting this error in your Kibana?

You need to increase your max header size as default netty is only 8KB.   You can change the value in your elasticsearch.yml file.

Add the following line (or uncomment it if it is already there).

http.max_header_size: 32kb


Fixing ‘plugin:elasticsearch [document_already_exists_exception] [config][4.5.1]: document already exists’

June 11th, 2016 No comments

Substitute in the version ‘4.5.1’ with the version you are upgrading to. So far I’ve seen it since Kibana 4.1.x to 4.5.1.

It seem that if you upgrade Kibana, there is a timing bug in how Kibana note its current version. You will get lots of these errors in Kibana logs:

log [08:08:30.649] [error][status][plugin:elasticsearch] Status changed from green to red - [document_already_exists_exception] [config][4.5.1]: document already exists, with: {"shard":"0","index":".kibana"}

These came from me upgrading version 4.5.0 to 4.5.1. I’ve seen same thing when I went from 4.1.4 to 4.5.0.

The fix is to delete the config record in your .kibana index. Don’t worry, it gets recreated again. No loss as far as I know.

curl -XDELETE elasticsearchserver:9200/.kibana/config/4.5.1

The Kibana bug is documented here: kibana issues #5519.

If deleting record does not work, you will also need to refresh your kibana index, e.g. this will flush the data!!!!

curl -XPOST elasticsearchserver:9200/.kibana/_refresh

Categories: Elasticsearch, ELK, Tech Tags: ,

HOW TO add search-guard-ssl to Elasticsearch

March 21st, 2016 1 comment

If you have a need to encrypt communication between your Elasticsearch nodes, but do not (yet) need the complicated ACL provided from either Shield (Elastic commercial product) or Search-Guard (open source), then you can use Search-Guard-SSL (open source).

I am going to show you how to add Search-Guard-SSL (SG-SSL for short) to Elasticsearch. There are a few requirements.

SG-SSL requires Elasticsearch version 2.0.x or newer. Make sure you are using the correct version!

First, download the correct version (zip) file from here.

Second, verify the integrity of your downloaded file.

$ curl -o search-guard-ssl-
$ curl -o search-guard-ssl-

$ gpg --verify search-guard-ssl- search-guard-ssl-

Third, you need to use a cert — either generate your own; or one that you have purchased/generated by your Corp IT — I am not going to go into it here.

Fourth, decide where your trust store and cert are going to reside and configure elasticsearch.yml as appropriate.

Below is just the configuration specific to SG-SSL that need to be added to your elasticsearch.yml. Edit it as appropriate and add it to your Elasticsearch config.

# NOTE: Here, I am only using transport (node to node) encryption.
# I am NOT using HTTP encryption as I want to be able to use the REST API without
# requiring HTTPS. I have HTTP (port 9200) bind to localhost only. You may need to
# turn it on depending on your security policy.
searchh.guard.ssl.transport.enabled: true
searchguard.ssl.transport.keystore_type: PKCS12
searchguard.ssl.transport.keystore_filepath: /export/apps/my-elk-cluster/var/identity.p12
# Alias name (default: first alias which could be found)
#searchguard.ssl.transport.keystore_alias: my_alias
# passwords here are not really in use. Java has a bug where password-less keystores don't work.
searchguard.ssl.transport.keystore_password: my-keystore-password
searchguard.ssl.transport.truststore_type: JKS
searchguard.ssl.transport.truststore_filepath: /etc/pki/certs/cacerts
# Alias name (default: first alias which could be found)
#searchguard.ssl.transport.truststore_alias: my_alias
searchguard.ssl.transport.truststore_password: changeit
searchguard.ssl.transport.truststore_alias: my-alias
searchguard.ssl.transport.enforce_hostname_verification: true
searchguard.ssl.transport.resolve_hostname: true
searchguard.ssl.transport.enable_openssl_if_available: false

# Enable or disable rest layer security - https, (default: false)
searchguard.ssl.http.enabled: false
# JKS or PKCS12 (default: JKS)
#searchguard.ssl.http.keystore_type: PKCS12
# Relative path to the keystore file (this stores the server certificates), must be placed under the config/ d
#searchguard.ssl.http.keystore_filepath: keystore_https_node1.jks
# Alias name (default: first alias which could be found)
#searchguard.ssl.http.keystore_alias: my_alias
# Keystore password (default: changeit)
#searchguard.ssl.http.keystore_password: changeit
# Do the clients (typically the browser or the proxy) have to authenticate themself to the http server, defaul
t is false
#searchguard.ssl.http.enforce_clientauth: false
# JKS or PKCS12 (default: JKS)
#searchguard.ssl.http.truststore_type: PKCS12
# Relative path to the truststore file (this stores the client certificates), must be placed under the config/
#searchguard.ssl.http.truststore_filepath: truststore_https.jks
# Alias name (default: first alias which could be found)
#searchguard.ssl.http.truststore_alias: my_alias
# Truststore password (default: changeit)
#searchguard.ssl.http.truststore_password: changeit
# Use native Open SSL instead of JDK SSL if available (default: true)
searchguard.ssl.http.enable_openssl_if_available: false

That’s it. Now deploy to all your nodes in cluster and the nodes should be communicating over SSL. The above SG-SSL config only turn on SSL for node(transport) but leave REST (HTTP) un-encrypted. This is because I have my ES nodes bind HTTP (9200) to localhost, you have to be able to login to my ES nodes to access the REST port.


ES 2.0 and newer has the JDK security policy manager on by default. This will prevent SG-SSL from reading your truststore and certs if it is not located in ES config directory tree.

You will need to provide your own security policy file to give ES read permission to these files.

Here is how to do that:.

First is that you must tell the JDK you want to use your security policy mapping.


$ cat /export/apps/my-elk-instance/var/java.policy
/* this lets ES mess with a folder in a strange place. */
grant {
permission "/export/apps/my-elk-instance/-", "read";
permission "/etc/pki/certs/*", "read";

More documentation can be found here: modules-scripting-security

Next post will show you have to get Tribe ES node working with SG-SSL.

Categories: Elasticsearch, ELK, Tech Tags: , ,

Kibana 4 with tribe node MasterNotDiscoveredException

December 19th, 2015 No comments

I use tribe nodes quite a lot at $work. It’s how we federate disparate ELK clusters and able to search across them. There are many reasons to have distinct ELK clusters in each data center and/or region.

Some of these are:

1. Elasticsearch does not work well when there is network latencies, which is guaranteed when your nodes are located geographically distant places. You could spend a lot of money to get fast network connection, or you can just have only local clusters. (Me? I pick saving money and avoiding head aches :-)).

2. It can get insanely expensive to create an ES cluster that span data centers/regions. The network bandwidth requirement, the data charges, the care and feeding of such a latency sensitive cluster…. OMG!

3. I don’t really think a 3rd reason is needed.

Although tribe nodes are great for federating ES clusters, there are some quirks in setting them up and caring for them (not as bad as ES clusters that span datacenter though).

One big gotcha for many people who are setting up tribe nodes for the first time is that tribe node can not create index. Tribe can only update, modify an existing index. What this mean is that if you point Kibana at a tribe node, you must first make sure you Kibana index is already created in one of the downstream ES cluster. Otherwise, you will have to create it yourself.

Otherwise, the first time you create an index pattern and tried to save it, you will get an error similar to the subject of this post.


The error message is wrong and misleading. It has nothing to do with Master node. It has everything to do with tribe node not able to create (PUT) a Kibana index.

Personally, I prefer to make the Kibana index that I use with tribe to have its own unique name. So I run a dedicated Kibana instance pointing to the dedicated tribe (client) node.

Here are the steps I do to get a tribe node and its associated Kibana ready for use.

1. Configure the tribe node to know all the ES clusters I want to federate data from.

tribe.elasticsearch.yml: toplevel_tribe ${HOSTNAME}
node.master: false false
  DC1-appservice: logging-DC1
     network.publish_host: ${HOSTNAME}
  DC2-appservice: logging-DC2
     network.publish_host: ${HOSTNAME}
   DC3.....etc to DCNN

  my-es-dedicated-config-cluster: es-config-CORP
     network.publish_host: ${HOSTNAME}

 on_conflict: prefer_my-es-dedicated-config-cluster

2. Now pre-create the Kibana index in my-ES-dedicated-config-cluster. This is a small cluster in my admin/corp data center that is only for housing configurations, Kibana dashboards, etc.

3. A simpler and more correct way is to temporary point Kibana to the dedicated ES cluster (instead of the tribe).

Do this via this setting in your kibana.yml file:

# The Elasticsearch instance to use for all your queries.
elasticsearch.url: “http://ES-node:9200”

Start Kibana, let it create the index.  Then stop it, change the setting back to point to your tribe node.

Doing it this way ensure that your kibana is correct.

curl command for pre-creating kibana (3 and 4) index:

curl -s -XPUT "http://localhost:9200/kibana3-int/" -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 2 },
"mappings" : { "temp" : { "properties" : { "dashboard" : { "type" : "string" }, "group" : { "type" : "string" }, "title" : { "type" : "string" }, "user" : { "type" : "string" } } }, "dashboard" : { "properties" : { "dashboard" : { "type" : "string" }, "group" : { "type" : "string" }, "title" : { "type" : "string" }, "user" : { "type" : "string" } } } }'

# Kibana4
curl -s -XPUT "http://localhost:9200/TRIBENAME-kibana4" -d '{ "index.mapper.dynamic" : true, "settings" : { "number_of_shards" : 1, "number_of_replicas" : 0 },"mappings" : {"search" : {"_timestamp" : { },"properties" : {"columns" : {"type" : "string"},"description" : {"type" : "string"},"hits" : {"type" : "long"},"kibanaSavedObjectMeta" : {"properties" : {"searchSourceJSON" : {"type" : "string"}}},"sort" : {"type" : "string"},"title" : {"type" : "string"},"version" : {"type" : "long"}}},"dashboard" : {"_timestamp" : { },"properties" : {"description" : {"type" : "string"},"hits" : {"type" : "long"},"kibanaSavedObjectMeta" : {"properties" : {"searchSourceJSON" : {"type" : "string"}}},"optionsJSON" : {"type" : "string"},"panelsJSON" : {"type" : "string"},"timeRestore" : {"type" : "boolean"},"title" : {"type" : "string"},"uiStateJSON" : {"type" : "string"},"version" : {"type" : "long"}}},"visualization" : {"_timestamp" : { },"properties" : {"description" : {"type" : "string"},"kibanaSavedObjectMeta" : {"properties" : {"searchSourceJSON" : {"type" : "string"}}},"savedSearchId" : {"type" : "string"},"title" : {"type" : "string"},"uiStateJSON" : {"type" : "string"},"version" : {"type" : "long"},"visState" : {"type" : "string"}}},"config" : {"_timestamp" : { },"properties" : {"buildNum" : {"type" : "long"},"defaultIndex" : {"type" : "string"}}},"index-pattern" : {"_timestamp" : { },"properties" : {"customFormats" : {"type" : "string"},"fieldFormatMap" : {"type" : "string"},"fields" : {"type" : "string"},"intervalName" : {"type" : "string"},"timeFieldName" : {"type" : "string"},"title" : {"type" : "string"}}}}}'

Elasticsearch util to copy/reindex index(es)

August 30th, 2015 No comments

Elasticsearch (and the entire ELK stack) is pretty useful open source piece of software for analyzing large datasets.   I manage a fairly large ELK infrastructure at work — around 90+ ES clusters, 300+ TB of data.   One of things I’ve found myself having to do is copying and/or reindexing one or more index(es).   Sometime to the same ES cluster, sometime moving index(es) to another cluster.

Regardless, it is just something that is done often enough, but yet in an ad-hoc manner.   It’s not worth setting up logstash config to do this and then tearing them down.

Here is an example logstash config to do something like this.

logstash config:

input {
 elasticsearch {
   hosts => [ "host1", "host2", ..., "hostN" ]
   index => "index"
filter {
output {
 elasticsearch {

This gets old fast when there are many indices. So I wrote a tool to do this in Go. I used the elastic go library from Olivere (

I call it espipe and put it on my Github repo —

You will need to download it, and make sure you have a golang build environment setup. Then change into the source where espipe.go is located and type go build.

If you don’t have golang build environment setup and just want the binary to use, you can d/l  espipe (this is built for linux x86_64).


Simple usage:

$ ./espipe -h
Usage of ./espipe:
  -bulksize int
    	Number of docs to send to ES per chunk (default to 500) (default 500)
  -dst string
    	Destination ES cluster (default to http://localhost:9200) (default "http://localhost:9200")
  -sidx string
    	Source index(es) to copy (default to all '*') (default "logstash*")
  -src string
    	Source ES cluster (default to http://localhost:9200) (default "http://localhost:9200")
  -tidx string
    	Target index to copy (default to 'copyidx') (default "copyidx")

# the following copy all nginx-access-YYYY.MM.DD indices from local cluster to
# anothercluster and consolidated all into one index
$ ./espipe -dst http://localhost:9200 -src http://anothercluster:9200 -sidx 'nginx-access*' -tidx 'nginx-consolidated' -bulksize 1000

Monitoring Postfix and Dovecot logs in ELK

June 19th, 2015 9 comments

postfix-kibana4I’ve been using pflogsumm for the longest time to monitor my postfix logs.   When I used to manage hundreds of domains and many more mailing lists, it was important to keep an eye on my mail servers.

These days, it is just my own personal mail server for my dozens of domains.   I don’t even need to, what with Google and other low cost email services.    It’s for fun and to keep my skills sharp.

Since I have been working with ELK stack a lot lately, I have been wanting to send all my logs — nginx, syslog and postfix maillog — into ELK.  There is already existing grok patterns in logstash for nginx, apache and syslog, but none for postfix.   So I do what I always do, sit down and dived in.

To be clear, I don’t believe in re-inventing the wheel, so I did due diligence and searched for what others have done first.   There were several places that posted their grok recipes for postfix.  But none were exactly plug-n-play for me.   I’ll list them here.

whyscream postfix grok pattern on github

antispin logstash postfix grok patterns

I ended up using a modified version of antispin’s patterns.   I don’t use Amavisd, but I do use Dovecot.   So I added new patterns and modified what was there for my particular installation.

My installation is

  • Fedora 21 (now 23) x86_64
  • Postfix 2.xx
  • Dovecot 2.xx
  • Elasticsearch v1.7.3
  • logstash v1.5.5
  • Kibana 4.1.3.
  • Hardware is:
    • Dell XPS1210 laptop (3.5GB RAM and 250GB HD)
    • ASUS Eee PC 900A (Atom N270, 2GB RAM and 4GB SSD, with 80GB external USB2 drive) – this one run Fedora 21 X86 (32 bit).  Note that I have not seen any problems with mixing 32, 64 bit systems wrt ELK data.

On Fedora, postfix and dovecot logs go to syslogs and end up in /var/log/maillog.

I have logstash installed in /home/logstash. So I added in postfix pattern file in /home/logstash/patterns and called it (what else) postfix.

Also want to say that the site grokdebug really saved me a lot of time and headache.  Use it if you ever have to create new grok patterns!

Here is the content of that file.

# Syslog stuff
COMPONENT ([\w._\/%-]+)
COMPID postfix\/%{COMPONENT:component}(?:\[%{NUMBER:pid}\])?
POSTFIX (?:%{SYSLOGTIMESTAMP:timestamp}|%{TIMESTAMP_ISO8601:timestamp8601}) (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{COMPID}:
# POSTFIX %{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{COMPID}: %{QUEUEID:queueid}
# POSTFIX_MESSAGE %{SYSLOGTIMESTAMP:timestamp} %{IPORHOST:host} %{DATA:program}/%{DATA:subprog}\[%{NUMBER:pid}\]: %{POSTFIX_QUEUEID:queueid}:

# Milter
HELO (?:\[%{IP:helo}\]|%{HOST:helo}|%{DATA:helo})

MILTERCONNECT %{QUEUEID:qid}: milter-reject: CONNECT from %{RELAY:relay}: %{GREEDYDATA:milter_reason}; proto=%{WORD:proto}
MILTERUNKNOWN %{QUEUEID:qid}: milter-reject: UNKNOWN from %{RELAY:relay}: %{GREEDYDATA:milter_reason}; proto=%{WORD:proto}
MILTEREHLO %{QUEUEID:qid}: milter-reject: EHLO from %{RELAY:relay}: %{GREEDYDATA:milter_reason}; proto=%{WORD:proto} helo=<%{HELO}>
MILTERMAIL %{QUEUEID:qid}: milter-reject: MAIL from %{RELAY:relay}: %{GREEDYDATA:milter_reason}; from=<%{EMAILADDRESS:from}> proto=%{WORD:proto} helo=<%{HELO}>
MILTERHELO %{QUEUEID:qid}: milter-reject: HELO from %{RELAY:relay}: %{GREEDYDATA:milter_reason}; proto=%{WORD:proto} helo=<%{HELO}>
MILTERRCPT %{QUEUEID:qid}: milter-reject: RCPT from %{RELAY:relay}: %{GREEDYDATA:milter_reason}; from=<%{EMAILADDRESS:from}> to=<%{EMAILADDRESS:to}> proto=%{WORD:proto} helo=<%{HELO}>
MILTERENDOFMESSAGE %{QUEUEID:qid}: milter-reject: END-OF-MESSAGE from %{RELAY:relay}: %{GREEDYDATA:milter_reason}; from=<%{EMAILADDRESS:from}> to=<%{EMAILADDRESS:to}> proto=%{WORD:proto} helo=<%{HELO}>

# Postfix stuff
EMAILADDRESSPART [a-zA-Z0-9_.+-=:~]+
RELAY (?:%{HOSTNAME:relayhost}(?:\[%{IP:relayip}\](?::[0-9]+(.[0-9]+)?)?)?)
#RELAY (?:%{HOSTNAME:relayhost}(?:\[%{IP:relayip}\](?:%{POSREAL:relayport})))
POSREAL [0-9]+(.[0-9]+)?
STATUS sent|deferred|bounced|expired
PERMERROR 5[0-9]{2}
MESSAGELEVEL reject|warning|error|fatal|panic

POSTFIXACTION discard|dunno|filter|hold|ignore|info|prepend|redirect|replace|reject|warn

# postfix/smtp and postfix/lmtp, postfix/local and postfix/error
# Jun 17 04:41:52 dir postfix/smtp[14434]: CE4FC560C0D: to=, relay=localhost[]:2525, delay=0.32, delays=0.05/0.01/0.19/0.07, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 1B6864661B2F)
POSTFIXSMTPRELAY %{QUEUEID:qid}: to=<%{DATA:to}>,(?:\sorig_to=<%{DATA:orig_to}>,)? relay=%{RELAY},(?: delay=%{POSREAL:delay},)?(?: delays=%{DATA:delays}?,)?(?: conn_use=%{POSREAL:conn_use},)?( %{WORD}=%{DATA},)+? dsn=%{DSN:dsn}, status=%{STATUS:result} %{GREEDYDATA:reason}
POSTFIXSMTP5XX %{QUEUEID:qid}: to=<%{EMAILADDRESS:to}>,(?:\sorig_to=<%{EMAILADDRESS:orig_to}>,)? relay=%{RELAY}, (%{WORD}=%{DATA},)+ dsn=%{DSN:dsn}, status=%{STATUS:result} \(host %{HOSTNAME}\[%{IP}\] said: %{PERMERROR:responsecode} %{DATA:smtp_response} \(in reply to %{DATA:command} command\)\)
POSTFIXSMTPREFUSAL %{QUEUEID:qid}: host %{RELAY} refused to talk to me: %{GREEDYDATA:reason}
POSTFIXSMTPLOSTCONNECTION %{QUEUEID:qid}: lost connection with %{RELAY} while %{GREEDYDATA:reason}
POSTFIXSMTPTIMEOUT %{QUEUEID:qid}: conversation with %{RELAY} timed out while %{GREEDYDATA:reason}

# postfix/smtpd
POSTFIXSMTPDCONNECTS (?:dis)?connect from %{RELAY}
POSTFIXSMTPDACTIONS %{QUEUEID:qid}: %{POSTFIXACTION:postfix_action}: %{DATA:command} from %{RELAY}: %{PERMERROR:responsecode} %{DSN:dsn} %{DATA}: %{DATA:reason}; from=<%{EMAILADDRESS:from}> to=<%{EMAILADDRESS:to}> proto=%{DATA:proto} helo=<%{HELO}>
#POSTFIXSMTPDACTIONS %{QUEUEID:qid}: %{POSTFIXACTION:postfix_action}: %{DATA:command} from %{RELAY}: %{DATA:smtp_response}: %{DATA:reason}; from=<%{EMAILADDRESS:from}> to=<%{EMAILADDRESS:to}> proto=%{DATA:proto} helo=<%{HELO}>
POSTFIXSMTPDTIMEOUTS timeout after %{DATA:command} from %{RELAY}
POSTFIXSMTPDLOGIN %{QUEUEID:qid}: client=%{DATA:client}, sasl_method=%{DATA:saslmethod}, sasl_username=%{GREEDYDATA:saslusername}
POSTFIXSMTPDWARNING warning:( %{IP}: | hostname %{HOSTNAME} )?%{GREEDYDATA:reason}
# Jun  3 16:40:28 dir postfix/smtpd[16526]: improper command pipelining after HELO from[]: QUIT\r\n
POSTFIXSMTPDLOSTCONNECTION (?:lost connection after %{DATA:smtp_response} from %{RELAY}|improper command pipelining after HELO from %{GREEDYDATA:reason})

# postfix/cleanup
POSTFIXCLEANUPMESSAGE %{QUEUEID:qid}: (resent-)?message-id=(<)?%{GREEDYDATA:messageid}(>)?

# postfix/bounce
POSTFIXBOUNCE %{QUEUEID:qid}: sender (non-)?delivery( status)? notification: %{QUEUEID:bouncequeueid}

# postfix/qmgr and postfix/pickup
# Jun 15 14:33:26 dir postfix/qmgr[1282]: 76A5C560C09: from=<>, size=21928, nrcpt=1 (queue active)
POSTFIXQMGR %{QUEUEID:qid}: (?:removed|from=<(?:%{DATA:from})?>(?:, size=%{NUMBER:size}, nrcpt=%{NUMBER:nrcpt} \(%{GREEDYDATA:queuestatus}\))?)

# postfix/anvil
# May 19 19:33:17 dir postfix/scache[8102]: statistics: domain lookup hits=0 miss=1 success=0%
#POSTFIXANVIL statistics:( %{DATA:anvilstatistic})?( for %{DATA:remotehost})?( at )?%{SYSLOGTIMESTAMP:timestamp}
POSTFIXANVIL statistics: %{GREEDYDATA:reason}

# postfix/trivial-rewrite
POSTFIXREWRITE warning: do not list domain %{DATA:domain} in BOTH mydestination and virtual_alias_domains

USER_AGENT User-Agent|X-Mailer
RECIPIENTS <%{EMAILADDRESS:recipient}>(,<%{GREEDYDATA:recipientlist}>)?
ORIGIN (%{DATA:originating_net} )\[%{IP:relay}\](:%{NUMBER}) \[%{IP:originip}\]
AMAVIS %{SYSLOGBASE} \(%{DATA}\) %{WORD:action} %{WORD:ccat} \{%{GREEDYDATA:policybank}\}, %{ORIGIN} <(%{EMAILADDRESS:from})> -> %{GREEDYDATA}, Queue-ID: %{QUEUEID}, Message-ID: <%{DATA:messageid}>%{GREEDYDATA:rest_of_message}

#AMAVISDNEW %{SYSLOGBASE} \(%{DATA:amavisdid}\) %{WORD:action} %{WORD:ccat} %{GREEDYDATA:policybank}, (%{GREEDYDATA:origin_net}) \[%{IP:relayip}\](:%{POSINT}) \[%{IP:originip}\] <(%{EMAILADDRESS:from})?> -> %{RECIPIENTS:recipients}, Queue-ID:%{QUEUEID}, Message-ID: <%{DATA:messageid}>,( mail_id: %{DATA:mail_id},)? Hits: %{NUMBER:hits:float}, size: %{NUMBER:size:int},( queued_as: %{QUEUEID:qid},)? Subject: "%{DATA:subject}", From: %{DATA:from},( %{USER_AGENT}: %{DATA:user_agent},)? Tests: \[%{DATA:TESTS}\],( shortcircuit=%{WORD:shortcircuit},)?( autolearn=%{WORD:autolearn},)? %{POSINT:elapsedtime} ms

#AMAVISDNEW %{SYSLOGBASE} \(%{DATA:amavisdid}\) %{WORD:action} %{WORD:ccat} %{GREEDYDATA:policybank}, \[%{RELAY:relayip}\] \[%{IP:originip}\] <(%{EMAILADDRESS:from})?> -> %{RECIPIENTS:recipients}, Message-ID: <%{DATA:messageid}>,( mail_id: %{DATA:mail_id},)? Hits: %{NUMBER:hits:float}, size: %{NUMBER:size:int},( queued_as: %{QUEUEID:qid},)? Subject: "%{DATA:subject}", From: %{DATA:from},( %{USER_AGENT}: %{DATA:user_agent},)? Tests: \[%{DATA:TESTS}\],( shortcircuit=%{WORD:shortcircuit},)?( autolearn=%{WORD:autolearn},)? %{POSINT:elapsedtime} ms

# Dovecot
# Jun 17 21:30:16 dir dovecot: imap(tin): Disconnected: Logged out in=397 out=45702
# Jun 15 09:26:18 dir dovecot: imap(tin): Connection closed in=352 out=1726
# Jun 19 01:19:29 dir dovecot: imap(pnguyen): Connection closed in=0 out=362
#DOVEID dovecot: %{DATA:component}(?:\(%{DATA:user}\))?(:)?
DOVEIMAP imap\(%{DATA:user}\): %{DATA:reason} in=%{NUMBER:inbytes} out=%{NUMBER:outbytes}

# May 21 21:58:12 dir dovecot: master: Warning: /home/alex is no longer mounted. See
# Jun  5 16:13:31 dir dovecot: anvil: Warning: Killed with signal 15 (by pid=1 uid=0 code=kill)
DOVECMD anvil|auth|config|log|master
# DOVEMISC %{(anvil|auth|config|log|master):command}: %{GREEDYDATA:reason}

DOVELOGIN imap-login: %{DATA:action}:(?: user=<(%{DATA:user})?>, (method=%{DATA:loginmethod}, )?rip=%{IP:rip}, lip=%{IP:lip},( mpid=%{NUMBER:mpid},( %{DATA:sectype},)?| %{DATA:securesession},)? session=<%{DATA:session}>| %{GREEDYDATA:reason})

DOVELDA lda\((%{DATA:user})?\):( %{DATA:action}:)? msgid=(?:<%{DATA:mesgid}@%{DATA:domain}>|%{DATA:mesgid}):( saved mail to| stored mail into mailbox) .*?%{DATA:folder}.*?

DOVEAUTH auth-worker\(%{NUMBER:pid}\): pam\((?:%{USERNAME:user}|%{EMAILADDRESS:user}),%{IP:ip}\): %{GREEDYDATA:reason}




Here is the logstash.conf file, which uses the file input plugin and elasticsearch output plugin, along with the grok filter to make use of our patterns. Note that after analyzing the default mapping of incoming data, I decided to create my own customized template and override the default logstash mapping. You can leave as is, I just happen to want more control over my data mappings. The custom mapping is included below.

input {
  file {
    path => "/var/log/maillog*"
    exclude => "*.gz"
    start_position => "beginning"
    type => "maillog"
filter {
  if [type] == "maillog" {
    grok {
      patterns_dir => ["/home/logstash/config/patterns"]
      match => { "message" => ["%{PF}", "%{DOVECOT}" ] }
    date {
      match => [ "timestamp", "MMM dd HH:mm:ss" ]
  # I wanted to monitor metrics and health of logstash
  metrics {
    meter => "events"
    add_tag => "metric"
output {
  if [type] == "maillog" {
    elasticsearch {
      index => "maillog-%{+YYYY.MM.dd}"
      host => "localhost"
      port => "9200"
      protocol => "http"
      flush_size => 1000
      # the next 4 lines are for explicit index mapping
      manage_template => true
      template_overwrite => true
      template => "/home/logstash/config/templates/maillog.json"
      template_name => "maillog"
  if "metric" in [tags] {
    stdout {
      codec => line {
        format => "rate: %{events.rate_1m}"

My customized mapping.

    "template" : "maillog-*",
    "order" : 1,
    "settings" : {
        "number_of_shards" : 2,
        "index.refresh_interval" : "90s"
    "mappings" : {
        "maillog" : {
            "properties" : {
                "reason" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "saslusername" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "postfix_action" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "relayip" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "messageid" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "pid" : { "index": "not_analyzed", "doc_values": true, "type" : "long" },
                "remote" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "type" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "qid" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "local" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "result" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "path" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "file" : { "index": "not_analyzed", "type" : "string" },
                "queuestatus" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "smtp_response" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "@version" : { "type" : "string" },
                "host" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "client" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "from" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "timestamp" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "nrcpt" : { "index": "not_analyzed", "doc_values": true, "type" : "long" },
                "responsecode" : { "index": "not_analyzed", "doc_values": true, "type" : "long" },
                "offset" : { "index": "not_analyzed", "doc_values": true, "type" : "long" },
                "relayhost" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "logsource" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "message" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "orig_to" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "command" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "tags" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "helo" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "saslmethod" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "component" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "@timestamp" : { "format" : "dateOptionalTime", "type" : "date" },
                "remotehost" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "size" : { "index": "not_analyzed", "doc_values": true, "type" : "long" },
                "anvilstatistic" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "proto" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "bouncequeueid" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "to" : { "index": "not_analyzed", "doc_values": true, "type" : "string" },
                "dsn" : { "index": "not_analyzed", "doc_values": true, "type" : "string" }

ELK Operational Tips

February 22nd, 2015 No comments

I’ve been running ELK clusters for over a year now, and want to share tips and tricks that I’ve found to be useful.

Feel free to post questions and corrections. I’ll try to answer and update when possible.


  • Split brained – this is when you have more than one node in your cluster becoming master.
    • It is best to avoid ever having this happen.   Use the rule of thumb, e.g. if you have N nodes, the number of nodes that can be master is N/2 + 1.   Even better, set aside a dedicated pool of master nodes (I recommend minimum of 3 master capable nodes).
    • If split brained does happen, you want to stop one of the master node ASAP.   Depending on whether you have replicas or not, it could be easy fix, or you might end up having to re-index if your indices has gotten out of sync by having the replica promoted to primary and new index data sent to it.
  • Failed node(s) – one or more failed nodes.  There are many scenarios, from failing hardware to outages causing data corruption, etc.
  • Planned maintenance – several scenarios.
  • Indexing take too long.
  • Recovery take too long.
  • Search/query take too long.




logstash-forwarder TLS handshake errors

July 3rd, 2014 3 comments

I started using logstash-forwarder to send logs from my cloud hosted servers to my ELK server for analysis.   Since it’s just a simple setup, I used the self-gen cert as described on logstash-forwarder’s github page.

Unfortunately, using the example generated a cert that is only good for 30 days.   So suddenly my kibana graph show no data for my cloud servers…. ???  After some digging, I found errors like this in the log.

 logstash-forwarder[4367]: 2014/07/01 23:24:08.559691 Failed to tls handshake with x509: certificate has expired or is not yet valid

openssl x509 -in logstash-forwarder.crt -noout -text  show that the Validity period was only 30 days.  D’oh! 🙂

So I generated a new set, this time for 10 years.  Why not, it’s for my use and if I am still using it 10 years from now…

openssl req -x509 -batch -nodes -newkey rsa:2048 -days 3560 -keyout logstash-forwarder.key -out logstash-forwarder.crt


Update 2014-07-28

Tried to bring up another server with logstash-forwarder.  Except I used latest logstash-forwarder (git pull today 2014/07/25) and started getting this error when starting up LS.

Failed to tls handshake with x509: certificate is valid for , not

After a bit of debugging, comparing certs (exact same MD5 as the ones on working servers), I went googling and bingo!

I see people blaming Go v1.3 TLS changes, but I am still using the same Go v1.2.1 that I built the currently working logstash-forwarder.   And as a matter of fact, copying logstash-forwarder from existing working servers over to the new one and it works just fine!   So I do not think that it’s Go, but something in the latest commits to logstash-forwarder that broke TLS.

 Update 2014-08-17

Turned out to be my self-gen cert ;-P   I created a new one, using properly filled out openssl.cnf and a wildcard domain.  That works fine with latest trunk and built using go v1.2.1.   I’ll update to go v1.3 soon.