Last modified: 2014-07-22 01:04:14 UTC
I don't know for how many weeks or months this has been broken, but the logs have been full of failures since at least July 14.

Info: Retrieving plugin
Error: Could not set 'file' on ensure: cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Error: Could not set 'file' on ensure: cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Wrapped exception:
cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Error: /File[/var/lib/puppet/lib/puppet/parser/functions/floor.rb]/ensure: change from absent to file failed: Could not set 'file' on ensure: cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/physicalcorecount.rb
Info: Loading facts in /var/lib/puppet/lib/facter/apt.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/default_gateway.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/meminbytes.rb
Info: Loading facts in /var/lib/puppet/lib/facter/ec2id.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_config_dir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/projectgid.rb
Info: Caching catalog for i-000003cb.eqiad.wmflabs
Error: Could not retrieve catalog from remote server: cannot generate tempfile `/var/lib/puppet/client_data/catalog/i-000003cb.eqiad.wmflabs.json20140719-26226-15hd19i-9'
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not save last run local report: cannot generate tempfile `/var/lib/puppet/state/last_run_summary.yaml20140719-26226-n9zv3g-9'
/var is full. Someone thought it would be a good idea to allocate only 2 GB to /var for labs instances. Once Ubuntu is installed there are only a few hundred megabytes left free :-/
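For reference, a quick way to confirm the symptom on an affected instance would be something like the following sketch (paths are taken from this report; the exact `du` depth is an arbitrary choice):

```shell
# Confirm /var is out of space: "Use%" close to 100% matches the
# "cannot generate tempfile" errors puppet is reporting.
df -h /var

# Find the biggest offenders under /var, two directory levels deep.
# -x stays on the /var filesystem; sort -rh puts the largest first.
du -xh --max-depth=2 /var 2>/dev/null | sort -rh | head -n 10
```

On the instances above this pointed straight at /var/log/diamond.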
integration-slave1001.eqiad.wmflabs$ du -h /var/log/diamond
1.1G    /var/log/diamond
integration-slave1002$ du -h /var/log/diamond
1.2G    /var/log/diamond
integration-slave1003:~$ du -h /var/log/diamond
1.1G    /var/log/diamond

Basically the diamond logs have never been rotated; the first entry in the log dates back to May 22nd.
Cleared out /var/log/diamond/diamond.log on the three slaves + on puppetmaster. We would need an RT ticket to figure out why the diamond logs are not logrotated, and whether it affects other instances / production.
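If rotation were the chosen fix, a stanza along these lines would do it. This is only a sketch of what an /etc/logrotate.d/diamond entry could look like, not the change that actually landed (per bug 66458 the real fix was removing the noisy archive handler):

```
# Hypothetical /etc/logrotate.d/diamond -- illustration only.
/var/log/diamond/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}
```

copytruncate avoids having to signal the diamond daemon to reopen its log file after rotation.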
(In reply to Antoine "hashar" Musso from comment #3)
> Cleared out /var/log/diamond/diamond.log on the three slaves + on
> puppetmaster.

Has that improved the puppet situation?

> We would need an RT ticket to figure out why the diamond logs are not
> logrotated, and whether it affects other instances / production.

https://rt.wikimedia.org/Ticket/Display.html?id=7945
What is the _latest_ timestamp for these logs? My guess is they are orphaned and can be removed.
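A quick way to answer that question would be to compare the file's mtime with the last line actually written, e.g. (GNU coreutils `stat` assumed, as on these Ubuntu instances):

```shell
# Show the last-modified time and name of the log file...
stat -c '%y %n' /var/log/diamond/diamond.log

# ...and the most recent entry actually written to it. If both are
# current, the file is live, not orphaned.
tail -n 1 /var/log/diamond/diamond.log
```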
Looks like the underlying issue has already been fixed, thanks Chase.

https://bugzilla.wikimedia.org/show_bug.cgi?id=66458

Can we get confirmation that puppet is running successfully again?
(In reply to Chase from comment #5)
> What is the _latest_ timestamp for these logs? My guess is they are
> orphaned and can be removed.

I haven't looked at the last timestamp, but the files were definitely still being written to. We use our own puppetmaster, which is rebased manually.

YuviPanda commented on bug 66458 that:

  It does log, but only logs errors. We killed the archive handler that
  logged all the *metrics* being sent, which was causing the huge log files.

So I guess that was fixed by a puppet change, but since most instances were/are broken, the fix never landed. I have to verify all instances now.
The logs are smaller now :-) Thank you!