Wednesday, August 31, 2011

MongoDB skew monitoring

We use MongoDB at Fotopedia for a variety of things. Last week, one of our mongod instances got stuck in its master-slave replication process, and it took us some time to detect the issue. Fortunately, mongod servers expose their internal replication state, and we were quickly able to find out what was going on.

As a result, we have improved our Nagios probe to detect large skew in the replication process (where "large" is whatever threshold suits you). At the same time, we have been working on a Munin probe to graph the skew over time and see whether any pattern emerges.
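To make "skew" concrete, here is a minimal sketch of how it could be computed. It assumes a replica set and a status document shaped like the reply to the `replSetGetStatus` command (a `members` list whose entries carry `name`, `stateStr` and `optimeDate`). The function name `replication_skew` and the sample host names are made up for illustration, and the actual probe may compute the skew differently.

```python
from datetime import datetime, timedelta


def replication_skew(status, member_name):
    """Return how far `member_name` lags behind the primary, in seconds.

    `status` is a dict shaped like a replSetGetStatus reply (assumption).
    Returns None when no primary is visible or the member is not listed.
    """
    members = status.get("members", [])
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    member = next((m for m in members if m.get("name") == member_name), None)
    if primary is None or member is None:
        return None
    lag = primary["optimeDate"] - member["optimeDate"]
    # Clamp small negative values caused by timing between the two readings.
    return max(lag.total_seconds(), 0.0)


# Hand-written sample document, not real output from a server.
now = datetime(2011, 8, 31, 12, 0, 0)
sample_status = {
    "set": "example",
    "members": [
        {"name": "db1.example.com:27100", "stateStr": "PRIMARY",
         "optimeDate": now},
        {"name": "db2.example.com:27100", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=90)},
    ],
}

print(replication_skew(sample_status, "db2.example.com:27100"))
```

With a driver such as pymongo, a document of this kind can be obtained by running the `replSetGetStatus` command against the `admin` database; the sketch works on a plain dict so that it can be tried without a server.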

The probe I'm introducing today does both the Nagios monitoring and the Munin reporting:


  • You can use it as a standalone Nagios probe:

./check_mongo_replica_member myservername.fqdn.com 27100

It will then behave as a regular Nagios probe: it reports an unexpected member status in the replica set, and returns WARNING or CRITICAL when the slave skew exceeds the corresponding threshold.


  • You can also link this script to a file named mongolag-{your mongo port number} in the /etc/munin/plugins/ folder and use it as a Munin probe:

./mongolag-27100 config
# returns the configuration
./mongolag-27100
# returns the Munin data
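The two modes above can be sketched in a few lines. This is not the probe's actual source, only an illustration of the idea: the name the script is invoked under selects the behaviour, Nagios mode maps the skew to the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and Munin mode prints either the graph configuration or the current value. The thresholds (60 and 300 seconds), the field name `skew` and all function names are assumptions chosen for the example.

```python
import os

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def nagios_result(skew_seconds, warn=60, crit=300):
    """Map a skew in seconds (None if unknown) to (exit_code, message)."""
    if skew_seconds is None:
        return UNKNOWN, "UNKNOWN - could not determine replication skew"
    if skew_seconds >= crit:
        return CRITICAL, "CRITICAL - replication skew is %ds" % skew_seconds
    if skew_seconds >= warn:
        return WARNING, "WARNING - replication skew is %ds" % skew_seconds
    return OK, "OK - replication skew is %ds" % skew_seconds


def munin_port(script_name):
    """Extract the port from a link name such as 'mongolag-27100'."""
    prefix = "mongolag-"
    base = os.path.basename(script_name)
    if base.startswith(prefix) and base[len(prefix):].isdigit():
        return int(base[len(prefix):])
    return None  # not invoked through a mongolag-<port> link


def munin_config(port):
    """Text printed when Munin calls the plugin with 'config'."""
    return "\n".join([
        "graph_title MongoDB replication skew (port %d)" % port,
        "graph_vlabel seconds",
        "graph_category mongodb",
        "skew.label skew",
    ])


def munin_values(skew_seconds):
    """Text printed when Munin fetches data; 'U' means unknown to Munin."""
    value = "U" if skew_seconds is None else "%d" % skew_seconds
    return "skew.value %s" % value


def main(argv, skew_seconds):
    """Dispatch on the invoked name; returns the process exit code."""
    port = munin_port(argv[0])
    if port is not None:  # Munin mode
        if len(argv) > 1 and argv[1] == "config":
            print(munin_config(port))
        else:
            print(munin_values(skew_seconds))
        return OK
    code, message = nagios_result(skew_seconds)  # Nagios mode
    print(message)
    return code
```

In a real script, `main(sys.argv, skew)` would be called with the measured skew and its return value passed to `sys.exit`, so that Nagios sees the right status.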

The Munin data can be used to collect the skew and graph it over time:




Here is the source code of the probe:
Feel free to fork and improve it!
