Last modified: 2014-07-22 01:04:14 UTC
I don't know for how many weeks or months this has been broken, but the logs have been full of failures since at least July 14.

Info: Retrieving plugin
Error: Could not set 'file' on ensure: cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Error: Could not set 'file' on ensure: cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Wrapped exception:
cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Error: /File[/var/lib/puppet/lib/puppet/parser/functions/floor.rb]/ensure: change from absent to file failed: Could not set 'file' on ensure: cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/physicalcorecount.rb
Info: Loading facts in /var/lib/puppet/lib/facter/apt.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/default_gateway.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/meminbytes.rb
Info: Loading facts in /var/lib/puppet/lib/facter/ec2id.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_config_dir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/projectgid.rb
Info: Caching catalog for i-000003cb.eqiad.wmflabs
Error: Could not retrieve catalog from remote server: cannot generate tempfile `/var/lib/puppet/client_data/catalog/i-000003cb.eqiad.wmflabs.json20140719-26226-15hd19i-9'
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not save last run local report: cannot generate tempfile `/var/lib/puppet/state/last_run_summary.yaml20140719-26226-n9zv3g-9'
/var is full. Someone thought it would be a good idea to allocate only 2 GB to /var for labs instances. Once Ubuntu is installed there are only a few hundred megabytes left free :-/
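For reference, a quick way to confirm the symptom on an affected instance would be something like the following sketch (paths are taken from this report; the exact `du` depth is an arbitrary choice):

```shell
# Confirm /var is out of space: "Use%" close to 100% matches the
# "cannot generate tempfile" errors puppet is reporting.
df -h /var

# Find the biggest offenders under /var, two directory levels deep.
# -x stays on the /var filesystem; sort -rh puts the largest first.
du -xh --max-depth=2 /var 2>/dev/null | sort -rh | head -n 10
```

On the instances above this pointed straight at /var/log/diamond.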
integration-slave1001.eqiad.wmflabs$ du -h /var/log/diamond
1.1G    /var/log/diamond
integration-slave1002$ du -h /var/log/diamond
1.2G    /var/log/diamond
integration-slave1003:~$ du -h /var/log/diamond
1.1G    /var/log/diamond

Basically the diamond logs have never been rotated; the first entry in the log dates back to May 22nd.
Cleared out /var/log/diamond/diamond.log on the three slaves + on puppetmaster. We would need an RT ticket to figure out why the diamond logs are not logrotated, and whether it affects other instances / production.
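If rotation were the chosen fix, a stanza along these lines would do it. This is only a sketch of what an /etc/logrotate.d/diamond entry could look like, not the change that actually landed (per bug 66458 the real fix was removing the noisy archive handler):

```
# Hypothetical /etc/logrotate.d/diamond -- illustration only.
/var/log/diamond/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}
```

copytruncate avoids having to signal the diamond daemon to reopen its log file after rotation.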
(In reply to Antoine "hashar" Musso from comment #3)
> Cleared out /var/log/diamond/diamond.log on the three slaves + on
> puppetmaster.

Has that improved the puppet situation?

> We would need an RT ticket to figure out why the diamond logs are not
> logrotated, and whether it affects other instances / production.

https://rt.wikimedia.org/Ticket/Display.html?id=7945
What is the _latest_ timestamp for these logs? My guess is they are orphaned and can be removed.
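A quick way to answer that question would be to compare the file's mtime with the last line actually written, e.g. (GNU coreutils `stat` assumed, as on these Ubuntu instances):

```shell
# Show the last-modified time and name of the log file...
stat -c '%y %n' /var/log/diamond/diamond.log

# ...and the most recent entry actually written to it. If both are
# current, the file is live, not orphaned.
tail -n 1 /var/log/diamond/diamond.log
```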
Looks like the underlying issue has already been fixed, thanks Chase.

https://bugzilla.wikimedia.org/show_bug.cgi?id=66458

Can we get confirmation that puppet is running successfully again?
(In reply to Chase from comment #5)
> What is the _latest_ timestamp for these logs? My guess is they are
> orphaned and can be removed.

I haven't looked at the last timestamp, but the files were definitely still being written to. We use our own puppetmaster, which is rebased manually.

YuviPanda commented on bug 66458 that:

  It does log, but only logs errors. We killed the archive handler that
  logged all the *metrics* being sent, which was causing the huge log files.

So I guess that was fixed by a puppet change, but since most instances were/are broken, the fix never landed. I have to verify all instances now.
The logs are smaller now :-) Thank you!