Last modified: 2013-10-06 10:29:05 UTC
Currently, zimwriter simply builds a list of MIME types and assigns a code to each one. It would be better if the MIME types were sorted before the codes are assigned and written to the file. This would ensure uniformity. For example, in zimpatch, the output file (obtained from the patch file and start_file) has different library MIME types, since the articles were fed to zimwriter in a different order.
Issue has been resolved. However, since this will be the new standard, older ZIM files will have to be regenerated in order to obtain a checksum match during zimdiff/zimpatch. https://gerrit.wikimedia.org/r/#/c/79021/
(In reply to comment #1)
> Issue has been resolved.

If the issue has really been resolved (how?) and a potential fix has been *merged* into the codebase, feel free to set RESOLVED status here. :)
(In reply to comment #2)
> (In reply to comment #1)
> > Issue has been resolved.
>
> If the issue has really been resolved (how?)

The issue has been resolved. Well, at least it worked on my system. If you find any errors, please comment.

The objective was to ensure that zimlib sorts the MIME types before writing them to the file. In the existing zimlib code, the articles obtained from the getNextArticle() function are collected and stored as dirents, in the order in which they are obtained. The problem was that the list of MIME types is created in the order in which the articles are sent to zimlib, and the LibraryMimeType code is added to each dirent immediately. The patch I wrote creates a new sorted list of MIME types just before writing to the file, and creates a mapping between the old and new LibraryMimeType codes. All the dirents are then visited again and their LibraryMimeType codes are updated using the mapping.

> and a potential fix has been *merged* into the codebase,

Well, it hasn't been merged into the codebase. I sent it for review, but it hasn't been approved yet. Tommi manages the commits, and he is on vacation.

> feel free to set RESOLVED status here. :)

I will, as soon as it is approved.
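The sort-and-remap step described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the Gerrit change: the `Dirent` struct and the function names `sortMimeTypes` and `remapDirents` are hypothetical, and the sketch assumes that every dirent carries a real MIME code that indexes into the list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a directory entry; only the MIME code matters here.
struct Dirent {
    std::uint16_t mimeCode;
};

// Sorts mimeTypes in place and returns a table in which
// oldToNew[oldCode] is the code of the same MIME type after sorting.
// Assumes the list contains no duplicates.
std::vector<std::uint16_t> sortMimeTypes(std::vector<std::string>& mimeTypes) {
    const std::vector<std::string> original = mimeTypes;
    std::sort(mimeTypes.begin(), mimeTypes.end());

    std::vector<std::uint16_t> oldToNew(original.size());
    for (std::size_t oldCode = 0; oldCode < original.size(); ++oldCode) {
        // Binary search for the new position of the old entry.
        auto it = std::lower_bound(mimeTypes.begin(), mimeTypes.end(),
                                   original[oldCode]);
        oldToNew[oldCode] = static_cast<std::uint16_t>(it - mimeTypes.begin());
    }
    return oldToNew;
}

// Second pass over the dirents: replace each old code with its new one.
void remapDirents(std::vector<Dirent>& dirents,
                  const std::vector<std::uint16_t>& oldToNew) {
    for (Dirent& d : dirents) {
        // Assumption of this sketch: the code is a valid index into the table.
        assert(d.mimeCode < oldToNew.size());
        d.mimeCode = oldToNew[d.mimeCode];
    }
}
```

With this scheme the final list depends only on the set of MIME types, not on the order in which the articles arrived, at the cost of one extra pass over all dirents.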
I see, different interpretations of words. In Bugzilla, "resolved" does not mean "I wrote a patch and it works for me", but "patch has been reviewed and merged". Hence my confusion.
@Tommi: Could you please check that everything is OK with Kiran's fix before closing this?
Code to review is here: https://gerrit.wikimedia.org/r/#/c/79021/
This code was reviewed and merged. But IMO the solution is buggy, because newly generated files have a wrong checksum. I cannot explain why, but it seems to me that these files are bigger than they should be (the checksum position in the header is no longer equal to filesize-16). In addition, I wonder whether this method of sorting the MIME types is the right one. It forces a loop through all dirents during file creation (to change the MIME type id), which is not very elegant. Maybe a better approach would be to allow a fixed MIME type list to be forced at the beginning of the file creation process. In that case, the list of MIME types would not be created dynamically during article insertion, and we would have fixed the problem we have with zimdiff/zimpatch. In any case, I think we should roll back or fix this, because zimlib is currently somewhat "broken".
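To illustrate the alternative suggested in this comment, here is a minimal sketch of a MIME type list that is fixed before any article is inserted. The class `FixedMimeTypeList` and its members are hypothetical and do not exist in zimlib; the sketch only shows that, with a list known up front, each dirent can receive its final code immediately and no second pass is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of zimlib.
class FixedMimeTypeList {
public:
    // The caller supplies the complete list before the first article is added.
    explicit FixedMimeTypeList(std::vector<std::string> mimeTypes)
        : mimeTypes_(std::move(mimeTypes)) {
        std::sort(mimeTypes_.begin(), mimeTypes_.end());
        mimeTypes_.erase(std::unique(mimeTypes_.begin(), mimeTypes_.end()),
                         mimeTypes_.end());
        // Codes from 0xfffd upwards are treated as reserved in this sketch.
        assert(mimeTypes_.size() < 0xfffd);
    }

    // Returns the final code for a MIME type; the code never changes later.
    std::uint16_t codeOf(const std::string& mimeType) const {
        auto it = std::lower_bound(mimeTypes_.begin(), mimeTypes_.end(), mimeType);
        if (it == mimeTypes_.end() || *it != mimeType)
            throw std::invalid_argument("MIME type not in the fixed list: " + mimeType);
        return static_cast<std::uint16_t>(it - mimeTypes_.begin());
    }

    const std::vector<std::string>& list() const { return mimeTypes_; }

private:
    std::vector<std::string> mimeTypes_;
};
```

Two writers constructed from the same set of MIME types would then assign identical codes regardless of article order. The trade-off is that the full set of types has to be known before writing starts.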
The problem seems to be more severe: the test program createzim-t in the test folder of zimwriter now gives a segmentation fault during the write process.
Then it is not that severe any more. If a bug is easily reproducible, it is much easier to find. The whole problem was that the remapping tried to fetch a mapped MIME type for directory entries that do not have MIME types. There are special MIME types "redirectMimeType", "linktargetMimeType" and "deletedMimeType", which have the values 0xfffd-0xffff. They must not be used when accessing the mapping vector.
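A guard of the kind described here might look like the sketch below. It is not the actual zimlib fix: the names `firstSpecialMimeType`, `hasRealMimeType` and `remapMimeCode` are hypothetical, and only the value range 0xfffd-0xffff is taken from this comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lowest of the special values mentioned above (0xfffd-0xffff).
// Which name corresponds to which value is not assumed here.
const std::uint16_t firstSpecialMimeType = 0xfffd;

// True if the code refers to an entry in the MIME type list.
bool hasRealMimeType(std::uint16_t code) {
    return code < firstSpecialMimeType;
}

// Maps an old code to its new one; special codes pass through unchanged
// and are never used as an index into the mapping vector.
std::uint16_t remapMimeCode(std::uint16_t code,
                            const std::vector<std::uint16_t>& oldToNew) {
    if (!hasRealMimeType(code))
        return code;
    assert(static_cast<std::size_t>(code) < oldToNew.size());
    return oldToNew[code];
}
```

Without the check, a code such as 0xffff would index far past the end of a mapping vector that holds only a handful of entries, which is consistent with the segmentation fault reported above.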
Nice! I confirm the problem is fixed. I have opened a new feature request to try to improve this approach to the MIME types, which still has room for improvement: https://bugzilla.wikimedia.org/show_bug.cgi?id=55363