Last modified: 2014-10-07 00:24:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T73686, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 71686 - stat1001's apache not running (stats.wikimedia.org, datasets.wikimedia.org not available) on 2014-10-05
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: ops
Depends on:
Blocks:
Reported: 2014-10-05 23:44 UTC by christian
Modified: 2014-10-07 00:24 UTC (History)
CC List: 9 users

Description christian 2014-10-05 23:44:30 UTC
From 2014-10-05's SAL [1]

  20:08 Nemo_bis: 22.03 < Ainali> It was just noticed on svwp village pump that http://stats.wikimedia.org is down


I checked, and apache is currently not running on stat1001 (although it should be).
Hence, all its configured sites are unavailable.
This includes

  stats.wikimedia.org
  datasets.wikimedia.org


stat1001's dmesg showed 6 messages about limn-reportcard respawning too fast
every 20 minutes (puppet run?) until 2014-10-04 17:45.

Might be that things broke around that time.

Icinga shows CRITICAL for the "puppet last run" service.
(But the service is currently muted. Anyone know why?)


[1] https://wikitech.wikimedia.org/wiki/Server_Admin_Log
Comment 1 christian 2014-10-06 00:07:55 UTC
This ticket needs Ops power. I filed RT #8554 for it.
Comment 2 Nemo 2014-10-06 05:49:59 UTC
https://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=Miscellaneous+eqiad&h=stat1001.wikimedia.org&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=ALLGROUPS shows that at some point 1 GB memory was freed and then traffic dropped.

Hoo concluded that apache2 died and puppet doesn't configure the machine to restart it.
Comment 3 Gerrit Notification Bot 2014-10-06 08:51:18 UTC
Change 164914 had a related patch set uploaded by QChris:
End stats.wikimedia.org certificate in newline

https://gerrit.wikimedia.org/r/164914
Comment 4 Gerrit Notification Bot 2014-10-06 08:55:50 UTC
Change 164914 merged by Filippo Giunchedi:
End stats.wikimedia.org certificate in newline

https://gerrit.wikimedia.org/r/164914
Comment 5 christian 2014-10-06 09:05:51 UTC
godog restarted apache on stat1001.

  https://stats.wikimedia.org/
  https://datasets.wikimedia.org/

are working again.

It seems certificate chaining choked on stats.wikimedia.org's
certificate not ending in a newline. Stop-gap fix is in comment #3.
But godog and _joe_ said this setting should be caught by the
certificate chaining itself, which makes sense.
The RT ticket has been updated accordingly.

Thanks godog and _joe_!
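
The failure mode described above can be sketched as follows. This is only an illustration, assuming the chain is built by plain concatenation of PEM files; the ticket does not say how the chaining on stat1001 is actually implemented. All function names and the placeholder payloads are hypothetical and are not real certificates.

```python
# Sketch only: assumes chaining is a plain concatenation of PEM files.
# The payloads below are placeholders, not real certificate data.
END_MARKER = "-----END CERTIFICATE-----"
BEGIN_MARKER = "-----BEGIN CERTIFICATE-----"


def make_pem(payload: str, trailing_newline: bool = True) -> str:
    """Build a PEM-shaped text block around a placeholder payload."""
    text = f"{BEGIN_MARKER}\n{payload}\n{END_MARKER}"
    return text + "\n" if trailing_newline else text


def naive_chain(parts):
    """Concatenate the parts verbatim, like a plain `cat` of the files."""
    return "".join(parts)


def safe_chain(parts):
    """Concatenate the parts, adding a newline to any part that lacks one."""
    return "".join(p if p.endswith("\n") else p + "\n" for p in parts)


def has_fused_markers(chain: str) -> bool:
    """Report whether an END marker shares a line with a BEGIN marker."""
    return any(
        END_MARKER in line and BEGIN_MARKER in line
        for line in chain.splitlines()
    )


# Hypothetical inputs: the server certificate lacks its final newline.
server = make_pem("c2VydmVyLXBsYWNlaG9sZGVy", trailing_newline=False)
intermediate = make_pem("aW50ZXJtZWRpYXRlLXBsYWNlaG9sZGVy")

print(has_fused_markers(naive_chain([server, intermediate])))  # True
print(has_fused_markers(safe_chain([server, intermediate])))   # False
```

With verbatim concatenation, the END line of the first file and the BEGIN line of the next end up on one line, which a PEM parser is likely to reject. Normalizing the trailing newline inside the chaining step, as godog and _joe_ suggested, would make the result independent of how each input file ends.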
Comment 6 nuria 2014-10-06 23:44:52 UTC
Created attachment 16683 [details]
Page requests that match "undefined" from the October sample logs so far.
Comment 7 nuria 2014-10-06 23:47:27 UTC
Please ignore prior assignment.
