Last modified: 2011-08-08 09:32:48 UTC
Like toolserver's replag bot. It would get its data from the API:
action=query&meta=siteinfo&siprop=dbrepllag

Commands would look somewhat like this:

[#wikimedia-tech] <Krinkle>: @replag
[#wikimedia-tech] <wmfreplag>: [s1] db26: 6; [s5] db14: 1, db35: 1

[#wikimedia-tech] <Krinkle>: @replag all
[#wikimedia-tech] <wmfreplag>: [s1] db36: 0, db32: 0, db12: 0, db26: 0, db38: 0; [s2] db13: 0, db30: 0, db24: 0; [s4] db31: 0, db22: 0, db33: 0;
[#wikimedia-tech] <wmfreplag>: [s5] db23: 0, db14: 0, db35: 0; [s6] db29: 0, db21: 0, db7: 0; [s7] db37: 0, db18: 0, db16: 0;

[#wikimedia-dev] <Krinkle>: @replag s4
[#wikimedia-dev] <wmfreplag>: [s4] db31: 0, db22: 0, db33: 0

[#wikimedia-dev] <Krinkle>: @replag db36
[#wikimedia-dev] <wmfreplag>: db36: 0 (s1)

[#wikimedia-dev] <Krinkle>: @replag commonswiki
[#wikimedia-dev] <wmfreplag>: [commonswiki: s4] db31: 0, db22: 0, db33: 0

Info such as dbserver numbers, server cluster numbers and wikidb names will be periodically fetched from Wikimedia's conf/db.php [1].

This is basically a reminder for myself right now, although I haven't started on this yet, so if anyone feels like it, go ahead and assign it to yourself :-)

-- Krinkle

[1] http://noc.wikimedia.org/conf/highlight.php?file=db.php
    http://noc.wikimedia.org/conf/db.php.txt
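For illustration only, here is a minimal Python sketch of the API lookup described above. It is not the bot's actual code. The function names are hypothetical. The extra `sishowalldb` parameter (which asks the API to list every database server rather than only the most lagged one) and the `query.dbrepllag` response layout are my reading of the MediaWiki siteinfo API, not something stated in this report.

```python
import json
import urllib.parse
import urllib.request


def replag_url(api_base):
    """Build the siteinfo query URL for one wiki's api.php (api_base is caller-supplied)."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "dbrepllag",
        "sishowalldb": 1,  # assumption: list all servers, not just the most lagged
        "format": "json",
    }
    return api_base + "?" + urllib.parse.urlencode(params)


def parse_replag(payload):
    """Turn a decoded API response into {host: lag_in_seconds}."""
    return {row["host"]: row["lag"] for row in payload["query"]["dbrepllag"]}


def format_cluster(cluster, lags):
    """Render one cluster in the '[s4] db31: 0, db22: 0' style proposed above."""
    hosts = ", ".join(f"{host}: {lag}" for host, lag in lags.items())
    return f"[{cluster}] {hosts}"


def fetch_replag(api_base):
    """Perform the HTTP request; needs network access, so it is not called here."""
    with urllib.request.urlopen(replag_url(api_base), timeout=10) as resp:
        return parse_replag(json.load(resp))
```

Keeping the parsing and formatting separate from the HTTP call means the output format can be checked against canned responses without touching the live API.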
Can we do this in a saner way for, say, 'all', rather than just hitting an API page on each cluster...?
(In reply to comment #1)
> Can we do this in a saner way for say all, rather than just hitting an API page
> on each cluster...?

Based on the info from db.php, it would only have to make 1, 2 or 7 HTTP requests, depending on the IRC command.

Note that I do not intend to create a bot that warns when replag is too high (in other words, it would not make any requests while idling), since that is probably something that should be caught server-side and would indicate a larger issue. Although it could of course check 'all' silently once every 15 minutes and report anything out of the ordinary; not that big a deal.
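A rough Python sketch of how a command argument could be mapped to the clusters that need querying, assuming one API request per cluster. The two lookup tables are hypothetical stand-ins for data derived from db.php. The hosts and the commonswiki-to-s4 mapping are taken from the sample output in the original report, and only three clusters are listed to keep the example short.

```python
# Hypothetical tables standing in for data derived from conf/db.php.
CLUSTER_HOSTS = {
    "s1": ["db36", "db32", "db12", "db26", "db38"],
    "s4": ["db31", "db22", "db33"],
    "s5": ["db23", "db14", "db35"],
}
CLUSTER_BY_WIKI = {"commonswiki": "s4"}


def clusters_for(arg):
    """Return the clusters to query for an '@replag <arg>' command."""
    if arg == "all":
        return sorted(CLUSTER_HOSTS)
    if arg in CLUSTER_HOSTS:  # a cluster name such as "s4"
        return [arg]
    if arg in CLUSTER_BY_WIKI:  # a wiki database name such as "commonswiki"
        return [CLUSTER_BY_WIKI[arg]]
    for cluster, hosts in CLUSTER_HOSTS.items():  # a database host such as "db36"
        if arg in hosts:
            return [cluster]
    raise KeyError(f"unknown cluster, host or wiki: {arg}")


def request_count(arg):
    """Number of HTTP requests needed, at one request per cluster."""
    return len(clusters_for(arg))
```

With this sample data `request_count("all")` is 3. With all of s1 through s7 loaded it would be the 7 mentioned above.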
Stupid question (I'm just curious) - if it's not going to check repeatedly in case things go wrong, what's the use case for knowing the replag? If it's big enough to make a difference, I'd imagine that'd fall in the category of something gone wrong.
(In reply to Bawolff comment #3)
> Stupid question (I'm just curious) - if its not going to check repetitively in
> case things go wrong, whats the use case for knowing the replag?

(In reply to Krinkle comment #2)
> Although it could ofcourse check 'all' silently once every 15 minutes and
> report anything out of the ordinary, not that big a deal.

Okay, it *will* check periodically!
(In reply to comment #4)
> (In reply to Bawolff comment #3)
> > Stupid question (I'm just curious) - if its not going to check repetitively in
> > case things go wrong, whats the use case for knowing the replag?
>
> (In reply to Krinkle comment #2)
> > Although it could ofcourse check 'all' silently once every 15 minutes and
> > report anything out of the ordinary, not that big a deal.
>
> Okay, it *will* check periodically!

Um, doesn't the nagios bot already report this in channel if it goes too high?
A basic start has been made. Booted it for a test run in #wikimedia-dev, #wikimedia-tech, #wmfDbBot.

Account: wmfDbBot

Right now it doesn't do the periodic checks and nagging yet. It is on-demand only, to see whether it is wanted or not.

Currently supported commands:
  @info <id>
  @replag <id>

id:
- cluster (s1-s7; @info also supports 'DEFAULT')
- dbhost (e.g. db18)
- dbname (e.g. enwiki, dewiktionary; @info also supports 'centralauth')

"@replag" without arguments will check all hosts and only return those that have a replag higher than 1 second (or alternatively, "No replag").

"@replag all" will check all clusters and return all their dbhosts + lag counts.
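To make the difference between "@replag" and "@replag all" concrete, here is a small Python sketch of the filtering rule as described above. It is an illustration, not the bot's source. The function name, the `show_all` flag and the input layout (cluster name mapped to host/lag pairs) are assumptions.

```python
LAG_THRESHOLD = 1  # seconds; plain "@replag" reports only hosts lagging more than this


def summarize(lags_by_cluster, show_all=False):
    """Build the reply line for '@replag' (show_all=False) or '@replag all' (show_all=True)."""
    parts = []
    for cluster, lags in lags_by_cluster.items():
        shown = {
            host: lag
            for host, lag in lags.items()
            if show_all or lag > LAG_THRESHOLD
        }
        if shown:
            hosts = ", ".join(f"{host}: {lag}" for host, lag in shown.items())
            parts.append(f"[{cluster}] {hosts}")
    return "; ".join(parts) if parts else "No replag"
```

Under this rule a host with a lag of exactly 1 second is not reported by plain "@replag", since the description says "higher than 1 second".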
(In reply to comment #5)
> (In reply to Bawolff comment #3)
> > Stupid question (I'm just curious) - if its not going to check repetitively in
> > case things go wrong, whats the use case for knowing the replag?
>
> Um, doesn't the nagios bot already report this in channel if it goes too high?

I have never seen it do that. Can someone verify this?
AFAIK I'm sure it doesn't...
Marking as fixed. It's been running for a while and works nicely.

Source code for bot:
https://svn.toolserver.org/svnroot/krinkle/trunk/Kribo/

wmf-replag backend + bridge to Kribo-bot:
https://svn.toolserver.org/svnroot/krinkle/trunk/Kribo%20(plugins)/wmfDbBot_KriboBridge/