MongoDB skew monitoring
We use MongoDB at Fotopedia for a variety of things. Last week, one of our mongod instances got stuck in its master-slave replication process, and it took us some time to detect the issue. Fortunately, mongod servers expose their internal replication state, so once we looked we were quickly able to find out what was going on.
As a result, we have improved our Nagios probe to detect large skew (define "large" however you wish) in the replication process. At the same time, we have been working on a Munin probe to graph the evolution of the skew over time and see whether any patterns emerge.
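To make "skew" concrete: for a replica set, the `replSetGetStatus` admin command reports a state string and the date of the last applied operation (`optimeDate`) for each member. Here is a minimal sketch of how a probe might turn that reply into a lag figure, assuming the reply has already been fetched into a dict by whatever driver you use. The function name `replication_skew`, the host names and the sample data are made up for illustration and are not taken from our probe.

```python
from datetime import datetime, timedelta


def replication_skew(status, member_name):
    """Return how far `member_name` lags behind the primary, in seconds.

    `status` is a dict shaped like the reply of MongoDB's replSetGetStatus
    command. Raises ValueError if the primary or the member is missing.
    """
    members = status.get("members", [])
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    member = next((m for m in members if m.get("name") == member_name), None)
    if primary is None:
        raise ValueError("no PRIMARY in replica set status")
    if member is None:
        raise ValueError("member %s not found" % member_name)
    skew = primary["optimeDate"] - member["optimeDate"]
    # Clamp at zero: a member cannot meaningfully be "ahead" of the primary.
    return max(skew.total_seconds(), 0.0)


# Hypothetical sample reply, trimmed to the fields used above.
now = datetime(2011, 1, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "db1.example.com:27100", "stateStr": "PRIMARY",
         "optimeDate": now},
        {"name": "db2.example.com:27100", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=42)},
    ]
}
print(replication_skew(sample_status, "db2.example.com:27100"))
```

A stuck slave shows up here as a number that keeps growing from one check to the next.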
The probe I'm introducing today does both the Nagios monitoring and the Munin reporting:
- You can use it as a standalone Nagios probe:
./check_mongo_replica_member myservername.fqdn.com 27100
It will then behave as a regular Nagios probe, reporting a wrong status in the replica set, or a WARNING or CRITICAL state if the slave skew is above the corresponding threshold.
- You can also use it as a Munin probe by symlinking the script as mongolag-{your mongo port number} in the /etc/munin/plugins/ folder:
./mongolag-27100 config
# returns the configuration
./mongolag-27100
# returns the Munin data
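The Nagios mode described above boils down to comparing the skew against two thresholds and exiting with the matching plugin return code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). Here is a rough sketch of that mapping. The function names `nagios_state` and `report`, the default thresholds and the message format are assumptions chosen for illustration, not the values our probe uses.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def nagios_state(skew_seconds, warning=60, critical=300):
    """Map a replication skew (in seconds) to a Nagios return code.

    The default thresholds are placeholders: pick whatever "large" means
    for you. A skew of None stands for "could not be measured".
    """
    if skew_seconds is None:
        return UNKNOWN
    if skew_seconds > critical:
        return CRITICAL
    if skew_seconds > warning:
        return WARNING
    return OK


def report(skew_seconds, warning=60, critical=300):
    """Build the exit code and the one-line status message Nagios displays."""
    code = nagios_state(skew_seconds, warning, critical)
    if skew_seconds is None:
        return code, "%s - replication skew unavailable" % LABELS[code]
    return code, "%s - replication skew is %ds" % (LABELS[code], skew_seconds)


code, message = report(120)
print(message)
# A real probe would finish with sys.exit(code) so that Nagios sees the state.
```

Checking the CRITICAL threshold first matters: a skew above both thresholds should be reported as CRITICAL, not WARNING.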
The Munin data can be used to collect the skew and graph it over time.
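For readers who have not written a Munin plugin before: Munin calls the plugin with `config` to learn what the graph looks like, and with no argument to fetch the current values, each printed as `field.value N`. The sketch below shows that protocol, with the port read from the link name as in mongolag-27100. The field name `skew`, the graph title and the category are assumptions for illustration and may not match what our probe prints.

```python
import os


def port_from_name(script_name):
    """Extract the port from a link name such as 'mongolag-27100'."""
    base = os.path.basename(script_name)
    prefix, _, port = base.rpartition("-")
    if prefix != "mongolag" or not port.isdigit():
        raise ValueError("expected a name like mongolag-<port>, got %r" % base)
    return int(port)


def munin_config(port):
    """Lines printed when Munin calls the plugin with the 'config' argument."""
    return [
        "graph_title MongoDB replication skew (port %d)" % port,
        "graph_vlabel seconds",
        "graph_category mongodb",  # hypothetical category name
        "skew.label skew",
    ]


def munin_values(skew_seconds):
    """Lines printed on a plain call; 'U' is Munin's marker for unknown."""
    value = "U" if skew_seconds is None else "%d" % skew_seconds
    return ["skew.value %s" % value]


def run(script_name, args, skew_seconds):
    """Dispatch on the first argument the way a Munin plugin does."""
    port = port_from_name(script_name)
    if args and args[0] == "config":
        return munin_config(port)
    return munin_values(skew_seconds)


print("\n".join(run("/etc/munin/plugins/mongolag-27100", ["config"], 42)))
print("\n".join(run("/etc/munin/plugins/mongolag-27100", [], 42)))
```

Encoding the port in the link name lets one script serve several mongod instances on the same host: one symlink per port, one graph per symlink.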
Here is the source code of the probe:
Feel free to fork and improve!

