Last modified: 2014-11-20 16:41:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T75661, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 73661 - Uploads don't allow non-ASCII characters in filename
Status: PATCH_TO_REVIEW
Product: Pywikibot
Classification: Unclassified
Component: General
Version: core-(2.0)
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Reported: 2014-11-20 16:23 UTC by Fabian
Modified: 2014-11-20 16:41 UTC (History)

Description Fabian 2014-11-20 16:23:19 UTC
Depending on the Pywikibot version, either the local file name or the target page name on the wiki must not contain non-ASCII characters. This changed with Ib751ee3f4074a60f3b53b0afe3cc2dfc3e17b2f7 in pwb 2.0: versions prior to that change won't work with non-ASCII local filenames, and versions including it won't work with non-ASCII wiki page names.

The problem is simply how the 'filename' value in the header of the file/chunk entry is encoded (not to be confused with the 'filename' entry in the MIME request). For example:

  Content-Type: image/jpeg
  MIME-Version: 1.0
  Content-disposition: form-data; name="file"; filename*=utf-8''%C3%9C.jpg
  Content-Transfer-Encoding: binary

  [… binary data …]

This would be the RFC 2231 compliant encoding of a non-ASCII character, which Python 3 uses by default. Python 2 instead applies a strange encoding to the complete line (the following may not represent exactly the same text as above, but something similar):

  Content-disposition: =?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?=   
   =?utf-8?b?IsOcMi5qcGci?=
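The two encodings above can be reproduced and inspected with Python 3's standard `email` package. This is only a sketch for illustration and not the code Pywikibot uses; the variable names are made up here. It builds the RFC 2231 form from a non-ASCII filename and decodes the Python 2 style line (RFC 2047 encoded words) to show that the whole header value was encoded, not just the filename.

```python
from email.header import decode_header, make_header
from email.message import Message

# Python 3: a non-ASCII parameter value is emitted in RFC 2231 form.
# '\u00dc' is the character "Ü" used in the example above.
part = Message()
part.add_header('Content-Disposition', 'form-data',
                name='file', filename='\u00dc.jpg')
rfc2231_value = part['Content-Disposition']
print(rfc2231_value)
# form-data; name="file"; filename*=utf-8''%C3%9C.jpg

# The Python 2 style value from the report, joined onto one line.
py2_value = ('=?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?= '
             '=?utf-8?b?IsOcMi5qcGci?=')
# Decoding the encoded words recovers the complete header value.
decoded_py2_value = str(make_header(decode_header(py2_value)))
print(decoded_py2_value)
```

The decoded Python 2 value is a complete `form-data; name="file"; filename="…"` string, which is why the server cannot find a `name` or `filename` parameter in the encoded header.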

Neither form is accepted by the MediaWiki server; the request is answered with:

  badupload_file: File upload param file is not a file upload; be sure to use multipart/form-data for your POST and include a filename in the Content-Disposition header.

Or, with Python 2:

  missingparam: One of the parameters filekey, file, url, statuskey is required

It is possible to leave it UTF-8 encoded, although that is (afaics) not compliant with the RFCs related to MIME, which say that the header may only contain US-ASCII characters.
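A minimal sketch of what "leaving it UTF-8 encoded" could look like when the multipart part is assembled by hand as bytes. The helper `build_file_part`, the boundary string and the payload are hypothetical and are not taken from Pywikibot or from the patch; whether a given server accepts the result is not verified here.

```python
def build_file_part(boundary, filename, payload,
                    mime_type='application/octet-stream'):
    """Return one multipart/form-data part as bytes (hypothetical helper).

    The filename is written as raw UTF-8 bytes instead of being
    RFC 2231 or RFC 2047 encoded.
    """
    # Escape backslashes and quotes so the quoted-string stays well-formed.
    quoted = filename.replace('\\', '\\\\').replace('"', '\\"')
    header_lines = [
        '--' + boundary,
        'Content-Disposition: form-data; name="file"; filename="%s"' % quoted,
        'Content-Type: ' + mime_type,
        '',  # blank line that separates the part headers from the body
        '',
    ]
    # Encoding the header block as UTF-8 keeps the filename unescaped.
    return '\r\n'.join(header_lines).encode('utf-8') + payload + b'\r\n'


# Example values are placeholders, not real upload data.
part = build_file_part('boundary123', '\u00dc.jpg', b'\xff\xd8\xff')
print(part)
```

Working on bytes throughout sidesteps the header re-encoding done by the `email` package, at the cost of producing a header that is not pure US-ASCII.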

Unfortunately I'm not sure what MediaWiki does with this, so I don't know if there is a better way, especially as Python 3 doesn't support 'bytes' in the header and otherwise it's not possible to get the value in there without it being re-encoded.
Comment 1 Gerrit Notification Bot 2014-11-20 16:25:47 UTC
Change 174677 had a related patch set uploaded by XZise:
[FIX] Upload: Support Unicode filenames

https://gerrit.wikimedia.org/r/174677
