Last modified: 2014-11-21 00:23:18 UTC
Values in MIME headers which are encoded using RFC 2231 are not accepted. As the only header value which isn't ASCII (and thus might need to be encoded) is in ApiUpload, this currently only affects uploads with non-ASCII characters in the filename. Another interpretation of this bug is: why is there a 'filename' in the header of the chunk/file entry of the MIME request at all, when the server apparently ignores it? But that would just mask the underlying issue, which is that there is no way to get Unicode data in header values to the server (except by just using the encoding of the server, but afaics that is not MIME compliant and Python 3's library doesn't support it).
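For reference, here is roughly what an RFC 2231-encoded filename looks like when built with Python 3's standard library. This is a minimal sketch: `content_disposition` and `decode_extended_value` are hypothetical helper names, and UTF-8 is assumed as the charset. It only shows the header format; as discussed in this thread, PHP did not decode such `filename*` parameters when populating `$_FILES`.

```python
from email.utils import decode_rfc2231, encode_rfc2231
from urllib.parse import unquote


def content_disposition(field_name: str, filename: str) -> str:
    """Hypothetical helper: build a form-data Content-Disposition value.

    ASCII filenames use the plain quoted form; anything else uses the
    RFC 2231 extended syntax filename*=charset'language'percent-octets.
    """
    if filename.isascii():
        # Backslash-escape characters that are special inside a quoted-string.
        escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
        return f'form-data; name="{field_name}"; filename="{escaped}"'
    # encode_rfc2231 percent-encodes the UTF-8 bytes, so the result is pure ASCII.
    return f"form-data; name=\"{field_name}\"; filename*={encode_rfc2231(filename, 'utf-8')}"


def decode_extended_value(value: str) -> str:
    """Decode an RFC 2231 extended value such as utf-8''%C3%9C.jpg."""
    # decode_rfc2231 only splits off charset and language; the octets
    # are still percent-encoded and must be unquoted with that charset.
    charset, _language, octets = decode_rfc2231(value)
    return unquote(octets, encoding=charset or "utf-8")


print(content_disposition("file", "Apple.gif"))
# form-data; name="file"; filename="Apple.gif"
print(content_disposition("file", "Ü.jpg"))
# form-data; name="file"; filename*=utf-8''%C3%9C.jpg
print(decode_extended_value("utf-8''%C3%9C.jpg"))
# Ü.jpg
```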
I note that one of the header examples you gave in bug 73661,

Content-disposition:
=?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?=
=?utf-8?b?IsOcMi5qcGci?=

is not actually valid. See RFC 2047 section 5.

However, the other one,

Content-disposition: form-data; name="file"; filename*=utf-8''%C3%9C.jpg

is also not correctly recognized.

But that doesn't have anything to do with MediaWiki, as PHP itself is not correctly handling such encoded parameters when populating $_POST and $_FILES. If this gets fixed in PHP, MediaWiki should accept it fine.
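Python 3's standard `email` package makes the difference between the two examples easy to see. In this minimal sketch, decoding the RFC 2047 encoded-words yields an entire structured header value, parameters included, which RFC 2047 section 5 does not allow encoded-words to carry; the RFC 2231 form, by contrast, is parsed by `email.message.Message.get_filename()`. This only checks header syntax on the Python side; it says nothing about how PHP populates `$_FILES`.

```python
from email.header import decode_header, make_header
from email.message import Message

# Example 1: the whole Content-disposition value wrapped in RFC 2047 encoded-words.
encoded_words = ("=?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?= "
                 "=?utf-8?b?IsOcMi5qcGci?=")
# Decoding reveals structured parameters hidden inside the encoded-words.
decoded = str(make_header(decode_header(encoded_words)))
print(decoded)  # form-data; name="file"; filename="Ü2.jpg"

# Example 2: an RFC 2231 extended parameter, which the email package decodes.
msg = Message()
msg["Content-Disposition"] = "form-data; name=\"file\"; filename*=utf-8''%C3%9C.jpg"
print(msg.get_filename())  # Ü.jpg
print(msg.get_param("name", header="content-disposition"))  # file
```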
https://www.mediawiki.org/wiki/API:Upload

(In reply to Brad Jorsch from comment #1)
> I note that one of the header examples you gave in bug 73661,
>
> Content-disposition:
> =?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?=
> =?utf-8?b?IsOcMi5qcGci?=
>
> is not actually valid. See RFC 2047 section 5.

Yea, and it is noted as garbage in that bug. We should find the Python 2 bug for that.

> However, the other one,
>
> Content-disposition: form-data; name="file"; filename*=utf-8''%C3%9C.jpg
>
> is also not correctly recognized.
>
> But that doesn't have anything to do with MediaWiki, as PHP itself is not
> correctly handling such encoded parameters when populating $_POST and
> $_FILES. If this gets fixed in PHP, MediaWiki should accept it fine.

Shouldn't this be filed as a PHP bug, with this one tracked as an 'upstream' bug? I couldn't find a PHP bug about this, but it is very possible I have missed it because I'm not familiar with the terms PHP uses.

https://www.mediawiki.org/wiki/API:Upload only says the following about these fields:

file - File contents
chunk - Chunk contents

paraminfo says type 'upload'; that is all.
https://en.wikipedia.org/w/api.php?action=paraminfo&modules=upload

API:Upload suggests it should look like:

Content-Disposition: form-data; name="file"; filename="Apple.gif"

But that doesn't address non-US-ASCII filenames. It looks like we can send any value as the filename in Content-disposition.

The following is copied from my rough analysis on https://gerrit.wikimedia.org/r/#/c/174677/ (would appreciate any corrections or historical titbits from MediaWiki devs):

fwiw, this filename value is exposed to MediaWiki extensions via the WebRequestUpload method getName.
http://git.wikimedia.org/blob/mediawiki%2Fcore.git/c1826209e739d51359bcea37ff4116eed9bd971c/includes%2FWebRequest.php#L1173
($fileInfo comes from $_FILES, which is http://php.net/manual/en/reserved.variables.files.php)

Interestingly, Safari sends Unicode filenames to the server using HTML encoding (probably numeric character references of the form &#NNN;), which are decoded by Sanitizer.php:
http://git.wikimedia.org/blob/mediawiki%2Fcore.git/c1826209e739d51359bcea37ff4116eed9bd971c/includes%2FSanitizer.php#L32

The WebRequestUpload method getName does not appear to be used in the current MediaWiki codebase, but it is used (badly) by some (probably broken) MediaWiki extensions. I quickly checked the v1.16 codebase, and can't see any use of getName to be concerned about.
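Assuming the HTML encoding Safari uses is numeric character references, decoding them back to Unicode looks roughly like this. This is a sketch using Python's `html.unescape`, not MediaWiki's Sanitizer.php code, and the submitted filenames are hypothetical.

```python
import html

# Hypothetical submitted filenames: U+00DC (Ü) sent as a decimal
# and as a hexadecimal numeric character reference.
decimal_form = "&#220;2.jpg"
hex_form = "&#xDC;2.jpg"

# html.unescape resolves both numeric forms (and named entities).
print(html.unescape(decimal_form))  # Ü2.jpg
print(html.unescape(hex_form))      # Ü2.jpg
```

One consequence of this approach is that it is ambiguous: a file literally named `&#220;2.jpg` would be decoded to `Ü2.jpg` as well.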