Last modified: 2014-09-04 16:05:54 UTC
http://en.wikipedia.beta.wmflabs.org/wiki/Special:GlobalRenameProgress/ZFilipin_%28WMF%29 had a bunch of jobs queued that were just sitting there. I pushed them through manually just now (24+ hours later) with runJobs.php.

legoktm@deployment-bastion:~$ mwscript showJobs.php --wiki=enwiki --group
refreshLinks: 3 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
htmlCacheUpdate: 10 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
enotifNotify: 11 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
cirrusSearchLinksUpdate: 2 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
cirrusSearchLinksUpdatePrioritized: 756 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
renameUser: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
updateBetaFeaturesUserCounts: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
ParsoidCacheUpdateJobOnEdit: 715 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
ParsoidCacheUpdateJobOnDependencyChange: 1305 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
EchoNotificationJob: 1311 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
Flow\Jobs\WatchTitle: 1422 queued; 0 claimed (0 active, 0 abandoned); 0 delayed

That looks high.
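For anyone who wants to check this kind of backlog without eyeballing it, here is a minimal Python sketch that parses the `showJobs.php --group` text shown above and lists job types at or above a queued-count threshold. The function names and the threshold are hypothetical, and the line format is assumed from the output pasted in this comment only.

```python
import re

# Line format assumed from the pasted `showJobs.php --group` output, e.g.
# "refreshLinks: 3 queued; 0 claimed (0 active, 0 abandoned); 0 delayed"
GROUP_LINE = re.compile(
    r"^(?P<type>\S+): (?P<queued>\d+) queued; (?P<claimed>\d+) claimed "
    r"\((?P<active>\d+) active, (?P<abandoned>\d+) abandoned\); "
    r"(?P<delayed>\d+) delayed$"
)


def parse_show_jobs(text):
    """Return {job_type: {"queued": int, "claimed": int, ...}} for matching lines."""
    stats = {}
    for line in text.splitlines():
        match = GROUP_LINE.match(line.strip())
        if match is None:
            continue  # skip prompts, blank lines, and anything unrecognized
        fields = match.groupdict()
        job_type = fields.pop("type")
        stats[job_type] = {name: int(value) for name, value in fields.items()}
    return stats


def backlog(stats, threshold):
    """Return {job_type: queued} for job types with queued >= threshold.

    `threshold` is a hypothetical cutoff chosen by the caller.
    """
    return {
        job_type: counts["queued"]
        for job_type, counts in stats.items()
        if counts["queued"] >= threshold
    }
```

Feeding it the output above with a threshold of, say, 1000 would single out the Parsoid dependency-change, Echo, and Flow queues.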
Manually started jobrunner service on deployment-jobrunner01.

$ mwscript showJobs.php --wiki=enwiki --group
Flow\Jobs\WatchTitle: 0 queued; 1422 claimed (1422 active, 0 abandoned); 0 delayed

Lots of log messages from the runner like this now:

2014-09-04T01:31:06+0000: Runner loop 0 process in slot 4 gave status '255': nice -19 php /usr/local/apache/common/multiversion/MWScript.php runJobs.php --wiki='incubatorwiki' --type='LocalRenameUserJob' --maxtime='60' --memory-limit='300M' --result=json
/usr/local/apache/common-local/wikiversions-labs.cdb has no version entry for `incubatorwiki`.
$ tail -5000 /var/log/mediawiki/jobrunner.log|grep 'Fatal error'|grep 'has no version entry'|awk '{print $9}'|sort|uniq -c
52 `afwikibooks`.
51 `afwikiquote`.
54 `afwiktionary`.
52 `akwiki`.
49 `alswiki`.
37 `alswikibooks`.
46 `alswiktionary`.
43 `amwiki`.
48 `amwikiquote`.
64 `arwikibooks`.
51 `arwikiquote`.
40 `arwikiversity`.
66 `incubatorwiki`.
38 `mkwikibooks`.
54 `nlwiki`.
46 `nlwikibooks`.
43 `nlwikiquote`.
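The same tally can be sketched in Python, which avoids depending on the wiki name being the ninth whitespace-separated field. This is only an illustration: the function name is hypothetical, and the regular expression assumes the "Fatal error: ... has no version entry for `wiki`" wording seen in the runner log in this report.

```python
import re
from collections import Counter

# Matches only the "Fatal error:" line, mirroring the first grep in the
# pipeline, so each failed run is counted once.
MISSING_WIKI = re.compile(
    r"Fatal error: \S+ has no version entry for `([^`]+)`"
)


def count_missing_wikis(lines):
    """Count 'has no version entry' fatals per wiki name."""
    counts = Counter()
    for line in lines:
        match = MISSING_WIKI.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

One way to use it is to pass the last few thousand lines of the log as an iterable and print `sorted(counts.items())`.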
Tried to clean up bad jobs manually:

redis 127.0.0.1:6379> keys *:jobqueue:LocalRenameUserJob:l-*
1) "nlwikiquote:jobqueue:LocalRenameUserJob:l-unclaimed"
2) "amwikiquote:jobqueue:LocalRenameUserJob:l-unclaimed"
3) "arwikibooks:jobqueue:LocalRenameUserJob:l-unclaimed"
4) "amwiki:jobqueue:LocalRenameUserJob:l-unclaimed"
5) "afwikiquote:jobqueue:LocalRenameUserJob:l-unclaimed"
6) "arwikiversity:jobqueue:LocalRenameUserJob:l-unclaimed"
7) "akwiki:jobqueue:LocalRenameUserJob:l-unclaimed"
8) "nlwiki:jobqueue:LocalRenameUserJob:l-unclaimed"
9) "nlwikibooks:jobqueue:LocalRenameUserJob:l-unclaimed"
10) "afwiktionary:jobqueue:LocalRenameUserJob:l-unclaimed"
11) "alswiktionary:jobqueue:LocalRenameUserJob:l-unclaimed"
12) "arwikiquote:jobqueue:LocalRenameUserJob:l-unclaimed"
13) "incubatorwiki:jobqueue:LocalRenameUserJob:l-unclaimed"
14) "afwikibooks:jobqueue:LocalRenameUserJob:l-unclaimed"
15) "mkwikibooks:jobqueue:LocalRenameUserJob:l-unclaimed"
16) "alswikibooks:jobqueue:LocalRenameUserJob:l-unclaimed"
17) "alswiki:jobqueue:LocalRenameUserJob:l-unclaimed"
redis 127.0.0.1:6379> del "nlwikiquote:jobqueue:LocalRenameUserJob:l-unclaimed"
...
redis 127.0.0.1:6379> del "alswiki:jobqueue:LocalRenameUserJob:l-unclaimed"
redis 127.0.0.1:6379> keys *:jobqueue:LocalRenameUserJob:l-*
(empty list or set)
redis 127.0.0.1:6379> save
OK

But deployment-jobrunner01 is still seeing them?

2014-09-04T15:53:41+0000: Runner loop 0 process in slot 4 gave status '255': nice -19 php /usr/local/apache/common/multiversion/MWScript.php runJobs.php --wiki='incubatorwiki' --type='LocalRenameUserJob' --maxtime='60' --memory-limit='300M' --result=json
/usr/local/apache/common-local/wikiversions-labs.cdb has no version entry for `incubatorwiki`.
Fatal error: /usr/local/apache/common-local/wikiversions-labs.cdb has no version entry for `incubatorwiki`. in /srv/common-local/multiversion/MWMultiVersion.php on line 358
W00t figured it out. There is a special hash for the new jobrunner that tracks what queues to try and process:

redis 127.0.0.1:6379> hkeys "jobqueue:aggregator:h-ready-queues:v2"
1) "webVideoTranscode/commonswiki"
2) "LocalRenameUserJob/afwikiquote"
3) "LocalRenameUserJob/afwiktionary"
4) "LocalRenameUserJob/akwiki"
5) "LocalRenameUserJob/alswiki"
6) "LocalRenameUserJob/alswikibooks"
7) "LocalRenameUserJob/alswiktionary"
8) "LocalRenameUserJob/amwiki"
9) "LocalRenameUserJob/amwikiquote"
10) "LocalRenameUserJob/arwikibooks"
11) "LocalRenameUserJob/arwikiquote"
12) "LocalRenameUserJob/arwikiversity"
13) "LocalRenameUserJob/incubatorwiki"
14) "LocalRenameUserJob/mkwikibooks"
15) "LocalRenameUserJob/nlwiki"
16) "LocalRenameUserJob/nlwikibooks"
17) "LocalRenameUserJob/nlwikiquote"
18) "gwtoolsetUploadMediafileJob/commonswiki"
19) "gwtoolsetUploadMetadataJob/commonswiki"
20) "cirrusSearchLinksUpdate/commonswiki"
21) "globalUsageCachePurge/commonswiki"
22) "cirrusSearchLinksUpdatePrioritized/enwiki"
redis 127.0.0.1:6379> hdel "jobqueue:aggregator:h-ready-queues:v2" "LocalRenameUserJob/afwikiquote" ...
(integer) 16

Restarted runner on deployment-jobrunner01 and log is not filling with junk now.
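In case this comes up again, picking out the stale hash fields can be sketched in Python rather than done by eye. This is a rough illustration only: the `JobType/wiki` field layout is taken from the `hkeys` output above, while the function name and the idea of passing in a set of wikis that actually exist on the cluster are assumptions.

```python
def stale_ready_queues(ready_fields, known_wikis):
    """Return ready-queue hash fields whose wiki is not in known_wikis.

    ready_fields: iterable of "JobType/wiki" strings, as listed by hkeys.
    known_wikis:  set of wiki names that exist (hypothetical input, for
                  example read from whatever wiki list the cluster uses).
    """
    stale = []
    for field in ready_fields:
        # Split on the last "/" so a job type containing "/" stays intact.
        job_type, sep, wiki = field.rpartition("/")
        if not sep:
            continue  # not in "JobType/wiki" form; leave it alone
        if wiki not in known_wikis:
            stale.append(field)
    return stale
```

The returned fields are the candidates for the `hdel` on `jobqueue:aggregator:h-ready-queues:v2` that was done by hand above. Fields that do not match the expected layout are deliberately skipped instead of being flagged.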