Last modified: 2010-02-02 10:26:43 UTC
Test case attached. Steps to reproduce:

1. Open eval.php and create an OggHandler object.
2. Set your memory limit below 50M.
3. Call $OggHandler->getMetadata( null, '/path/to/test/case' );

Result: PHP dies with OOM. This is occurring on Wikimedia sites for *some* files with uncached metadata.

I did some research and debugging, and it always seems to die in _decodePageHeader, in File/Ogg.php. It tries to list the streams (of which, in theory, there should only be 5 or 6), storing the data as it goes, then runs through the streams to generate aggregate data.

Using COUNT_RECURSIVE and no memory_limit, I counted the number of pieces of stream information stored in _streamList for the test case, and for the featured media of the day, which happened to be [[File:Eichmann_trial_news_story.ogg]]:

> $h = new OggHandler; $m = $h->getMetadata( null, '/Users/andrew/En-The_Raven-wikisource.ogg' )
Class File_Ogg not found; skipped loading
Memory used: 50356180
Size of _streamList is 398175

> $h = new OggHandler; $m = $h->getMetadata( null, '/Users/andrew/Eichmann_trial_news_story.ogg' );
Class File_Ogg not found; skipped loading
Memory used: 7901476
Size of _streamList is 10662

RECOMMENDED RESOLUTION: It makes the most sense to resolve this by aggregating whatever data needs to be aggregated as the stream list is generated, rather than at the end.
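To illustrate the recommended resolution, here is a minimal sketch (in Python rather than the actual File_Ogg PHP code, and with the page-tuple shape invented for illustration) of keeping only running per-stream aggregates instead of storing every page header and aggregating at the end:

```python
# Sketch of the proposed fix: memory stays O(number of streams)
# rather than O(number of pages), because each incoming page only
# updates a small running aggregate for its stream.

def aggregate_pages(pages):
    """pages: iterable of (serial_no, granule_pos, page_size) tuples,
    as a streaming Ogg page parser might yield them (hypothetical shape)."""
    streams = {}  # serial_no -> running aggregate
    for serial, granule, size in pages:
        agg = streams.setdefault(
            serial, {"pages": 0, "bytes": 0, "last_granule": 0})
        agg["pages"] += 1          # page count per stream
        agg["bytes"] += size       # total payload size per stream
        agg["last_granule"] = max(agg["last_granule"], granule)
    return streams
```

A _streamList that grows to hundreds of thousands of entries (as in the test case above) would shrink to one small dict per logical stream under this scheme.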
Adding the attachment failed. The test case is available at http://en.wikisource.org/wiki/Media:En-The_Raven-wikisource.ogg
OOM can also happen within exif_read_data for JPEGs with lengthy EXIF data.
*** Bug 19870 has been marked as a duplicate of this bug. ***
*** Bug 20801 has been marked as a duplicate of this bug. ***
Bumping this up from an enhancement...
*** Bug 20811 has been marked as a duplicate of this bug. ***
I'm still experiencing the same problem described in bug 20811, also with a DjVu file (it's 40 MB, this one: http://www.archive.org/details/VocabolarioAccademiciCruscaEdi3Vol3).
(In reply to comment #4)
> *** Bug 20801 has been marked as a duplicate of this bug. ***

On this bug, note that even Special:WhatLinksHere/File:... fails: http://meta.wikimedia.org/wiki/Special:WhatLinksHere/Image:Screencast_-_Spam_blacklist_introduction_and_COIBot_reports_-_small.ogg

No metadata should need to be loaded here at all, not even duration, which is apparently "needed" for the image description page. The same goes for pages that link to large files: they don't need file metadata, so they shouldn't try to fetch it. As well, if this metadata is so expensive to get that we run out of memory, then it should be stored so it only needs to be computed once, on upload.
(In reply to comment #8)
> As well, if this metadata is so expensive to get we run out of memory, then it
> should be stored so it only needs to be done once on upload.

It is stored, but it obviously can't be if the processing failed.
+mdale in case he can help :)
What about using an external program for that, if one is available? That would provide finer-grained memory control, and it wouldn't kill the whole page.
I recommend we use the ffmpeg2theora --info command. It outputs the data as JSON and seeks to the end of the file to get the duration, so it is much faster than an oggz-info-type command that does a linear scan of the file and outputs unstructured data that would have to be parsed. Also, ffmpeg2theora is a static binary, so it should be easier to deploy. I will create a patch.
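As an illustration of the approach, a minimal sketch (in Python rather than the MediaWiki PHP code, and with the JSON field name "duration" assumed rather than taken from ffmpeg2theora's actual output) of shelling out to the tool and parsing its structured result:

```python
import json
import subprocess

def get_video_info(path, sample_output=None):
    """Run `ffmpeg2theora --info <path>` and parse its JSON output.
    The "duration" field used in the usage example below is an
    assumption; check the real output of your ffmpeg2theora build.
    sample_output lets callers/tests bypass the external binary."""
    if sample_output is None:
        sample_output = subprocess.check_output(
            ["ffmpeg2theora", "--info", path], text=True)
    return json.loads(sample_output)

# Usage with a canned sample, so no binary is needed here
# (the path "clip.ogv" is hypothetical):
info = get_video_info("clip.ogv", sample_output='{"duration": 12.5}')
```

Because the heavy parsing happens in a separate process, a failure or memory blowup there cannot take down the PHP request itself.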
I created a patch to call out to ffmpeg2theora in r57933. But ffmpeg2theora does not list the offset time, so we have to "fix" the ffmpeg Ogg demuxer to know about stream offsets, or use a different tool. Regardless, we should fix the PHP fallback solution to be less memory-heavy.
jan has patched ffmpeg2theora, freed has deployed it, and I will shortly push the updated ffmpeg2theora time-grabbing code to deployment.
Created attachment 6748 [details] patch to use ffmpeg2theora for metadata

Here is a patch for the wmf-deployment branch. I never got clarity from anyone on whether we can push this out or not.
Ehm, can you apply the patch? I haven't been able to upload a file on Commons for two months now...
Yeah, it would be good to get this applied, or to review it and let me know what has to be changed.
Fixed in r60492.
We are at r61846 (https://wikitech.wikimedia.org/?diff=24985 ) but I still have the same problem described in bug 20811#c0 .
(In reply to comment #19)
> We are at r61846

This is the version of /branches/wmf-deployment, not /trunk/phase3; this doesn't mean that r60492 has been deployed yet.
(In reply to comment #20)
> (In reply to comment #19)
> > We are at r61846
> This is the version of /branches/wmf-deployment, not /trunk/phase3; this
> doesn't mean that r60492 has been deployed yet.

Thank you. Sorry. Anyway, the bug for DjVu seems resolved, at least for some files; see bug 20811#c6.