Last modified: 2014-10-23 14:54:05 UTC
[...] body: '<?xml version="1.0" encoding="UTF-8"?><methodResponse><params><param><value><struct><member><name>bugs</name><value><struct><member><name>9444</name><value><struct><member><name>comments</name><value><array><data><value><struct><member><name>is_private</name><value><boolean>0</boolean></value></member><member><name>count</name><value><int>0</int></value></member><member><name>creator</name><value><string>papadako@csd.uoc.gr</string></value></member><member><name>time</name><value><dateTime.iso8601>20070329T08:11:13</dateTime.iso8601></value></member><member><name>bug_id</name><value><int>9444</int></value></member><member><name>author</name><value><string>papadako@csd.uoc.gr</string></value></member><member><name>text</name><value><string>A database error has occurred Query: SELECT\nmath_outputhash,math_html_conservativeness,math_html,math_mathml FROM math WHERE\nmath_inputhash = \'\xef\xbf\xbd\xef\xbf\xbd\xd7\xbe\xef\xbf\xbd\x1f\x11\xef\xbf\xbd\xef\xbf\xbd\x12@\x01\xcb\xb5\' LIMIT 1 Function: MathRenderer::_recall Error: 1\nERROR: invalid byte sequence for encoding "UTF8": 0xebc3d'
Traceback (most recent call last):
  File "minimal.py", line 64, in <module>
    fetch(i)
  File "minimal.py", line 49, in fetch
    com = server.Bug.comments(kwargs)['bugs'][bugid]['comments']
  File "/usr/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1297, in single_request
    return self.parse_response(response)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1467, in parse_response
    p.feed(data)
  File "/usr/lib/python2.7/xmlrpclib.py", line 557, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 22
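For reference, the same kind of ExpatError can be provoked in isolation. The response body above contains raw control bytes (\x1f, \x11, \x12, \x01), and XML 1.0 does not allow such characters in a document. The following is only a sketch under assumptions: the tiny documents and the helper try_parse are made up for illustration, it is written for Python 3 rather than the Python 2.7 seen in the traceback, and it does not prove which byte in the real response triggered the error.

```python
# Sketch only: hypothetical minimal documents, not the real Bugzilla response.
from xml.parsers import expat


def try_parse(data):
    """Return None if expat accepts the bytes, else the error message."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk
    except expat.ExpatError as err:
        return str(err)
    return None


header = b'<?xml version="1.0" encoding="UTF-8"?>'
# Plain ASCII text inside the element: expected to parse.
good = header + b"<string>math_inputhash = 'abc'</string>"
# Same element with a raw \x1f control byte: expected to be rejected.
bad = header + b"<string>math_inputhash = '\x1f'</string>"

print(try_parse(good))
print(try_parse(bad))
```

If the second call reports "not well-formed (invalid token)", that matches the message in the traceback, which would be consistent with (but does not confirm) raw control characters being the culprit.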
Upstreamed as https://bugzilla.mozilla.org/show_bug.cgi?id=1055629
We should probably drop the offending chars, e.g. via $string =~ tr/\xea-\xef/-/; somewhere before text => $self->type('string', $comment->body_full), in http://bzr.mozilla.org/bugzilla/4.4/view/head:/Bugzilla/WebService/Bug.pm#L296, I guess. This is a late, uneducated comment that might turn out to be blatantly wrong tomorrow morning.
[Mostly making comments here for myself.] One problem here is that we have not identified with certainty which chars are actually offending; we only guess. Another problem is that I cannot easily create a local testcase. Workaround in https://bugzilla.mozilla.org/show_bug.cgi?id=839023#c10 : Use $initial =~ s/([\x01-\x08\x0b\x0c\x0f-\x1f])/sprintf "\\x%02x", ord($1)/ge; http://perldoc.perl.org/perlebcdic.html#Quoted-Printable-encoding-and-decoding lists a similar example (also covering >\x80, for getting rid of non-ASCII entirely): $qp_string =~ s/([=\x00-\x1F\x80-\xFF])/sprintf("=%02X",ord($1))/ge; The above workaround is heavy-handed, though: if you replaced \x61 (letter: a), you would end up with "Wrong/unsupported datatype 'boole\\x61n' specified" in the XMLRPC response. Hence I am slightly concerned about unwanted side effects, but the character range above is nothing that should be used anyway. I tested the two-liner hack with the less commonly used letter \xc4\x8d (letter: č) in some comments, and the char replacement worked as expected in the XMLRPC response. Helpful tables for conversion: http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=string-literal
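To make the effect of the Perl substitution quoted above easier to see, here is a rough Python 3 equivalent. It is a sketch, not the code that went into Bugzilla: the function name escape_control_chars and the sample string are made up, and the character class is copied as-is from the workaround (so it leaves \x00, \x09, \x0a, \x0d and \x0e alone).

```python
import re

# Same character class as the Perl workaround quoted above (copied as-is).
CONTROL_CHARS = re.compile(r"[\x01-\x08\x0b\x0c\x0f-\x1f]")


def escape_control_chars(text):
    """Replace each matched control char with a literal \\xNN escape."""
    return CONTROL_CHARS.sub(lambda m: "\\x%02x" % ord(m.group()), text)


# Hypothetical sample: a few control chars plus the letter č (U+010D).
sample = "hash = '\x1f\x11\x12@\x01' and the letter \u010d"
print(escape_control_chars(sample))
```

Ordinary letters, including non-ASCII ones like č, fall outside the class and pass through unchanged; only the control characters are turned into visible backslash escapes.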
Change 155732 had a related patch set uploaded by Aklapper: When exporting Bugzilla tickets via Chase's script we run into an API bug with specific Unicode letters for https://bugzilla.wikimedia.org/show_bug.cgi?id=9444#c0. This is applying a hackish upstream workaround described in https://bugzilla.mozilla.org/sh https://gerrit.wikimedia.org/r/155732
Change 156100 had a related patch set uploaded by Aklapper: Work around Bugzilla XML RPC bug with special Unicode characters https://gerrit.wikimedia.org/r/156100
Change 155732 merged by Dzahn: Create copy of upstream file (for followup custom change) https://gerrit.wikimedia.org/r/155732
Change 156100 merged by Dzahn: Work around Bugzilla XML RPC bug with special Unicode characters https://gerrit.wikimedia.org/r/156100
A script querying the XML RPC API no longer bails out at ticket #9444, the XML still looks valid, and I have not experienced any other explosions or incidents yet. Closing as FIXED, crossing fingers that it stays like that.
Note: As this workaround is applied to *any* output, it also damages binary attachment data. See https://phabricator.wikimedia.org/T815