ELK Operational Tips
I’ve been running ELK clusters for over a year now, and want to share tips and tricks that I’ve found to be useful.
Feel free to post questions and corrections. I’ll try to answer and update when possible.
- Split brain – this is when more than one node in your cluster becomes master.
- It is best to avoid ever having this happen. As a rule of thumb, if you have N master-eligible nodes, require a quorum of N/2 + 1 of them (rounding N/2 down) to be present before a master can be elected. Even better, set aside a dedicated pool of master nodes (I recommend a minimum of 3 master-eligible nodes).
- If split brain does happen, you want to stop one of the master nodes ASAP. Depending on whether you have replicas or not, it could be an easy fix, or you might end up having to re-index if your indices have gotten out of sync because a replica was promoted to primary and new index data was sent to it.
- Failed node(s) – one or more failed nodes. There are many scenarios, from failing hardware to outages causing data corruption, etc.
- Planned maintenance – several scenarios.
- Indexing takes too long.
- Recovery takes too long.
- Search/query takes too long.
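
To make the split-brain rule of thumb concrete, here is a minimal sketch of the quorum arithmetic. The function name `minimum_master_nodes` is my own choice for illustration. As far as I know, in Elasticsearch versions before 7.0 the resulting value is what you would put in the `discovery.zen.minimum_master_nodes` setting, while newer versions manage the quorum themselves, so check this against the version you run.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the quorum for a pool of master-eligible nodes: floor(N / 2) + 1.

    Hypothetical helper for illustration only; it is not part of any
    Elasticsearch client library.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division rounds N / 2 down, so 3 nodes need 2 and 5 nodes need 3.
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 7):
        print(f"{n} master-eligible node(s) -> quorum of {minimum_master_nodes(n)}")
```

Note that with only 2 master-eligible nodes the quorum is 2, so losing either node leaves the cluster unable to elect a master. That is one reason I recommend at least 3.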