Last modified: 2014-11-17 09:57:14 UTC
According to Roan in bug 30287 comment 9, actually enabling the uca-default collation stuff that was "fixed" for bug 164 is waiting on an Ubuntu upgrade on the apache cluster (bug 29915?). There are a few bugs which it looks like should be resolved (for Categories at least) by enabling this -- eg bug 30287 (Farsi sorting problems); others require further work (bug 29788 needs a Swedish-specific collation setting).
Closing LATER until apaches are all upgraded
Relevant dependencies as RT tickets:
http://rt.wikimedia.org/Ticket/Display.html?id=22 full update to Lucid (bug 29915)
http://rt.wikimedia.org/Ticket/Display.html?id=652 install icu & php5-intl (depends on the above)
RT #22 and 652 are done. this can probably be closed.
(In reply to comment #3)
> RT #22 and 652 are done. this can probably be closed.

Well, this still needs someone to make the changes to MediaWiki's config file and run the maintenance script.
The first letter identification code (maintenance/language/generateCollationData.php) won't work for all languages, so some wikis will have their category pages broken terribly by this change. Also, the default collation tables sort a lot of languages incorrectly, and the amount of breakage that causes will depend on the language in question. So I recommend doing this change on a language-by-language basis, after checking each language for correct collation and first-letter behaviour on a test wiki. Also, it would be nice to know in advance what percentage of sort keys will be larger than the 230 bytes allowed by the database field, and if that percentage is significant, whether there are categories on the target wikis where the order will be changed by truncation after 230 bytes.
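The truncation question above can be checked offline once sort keys are available. A minimal sketch in Python, assuming the sort keys of one category have already been exported as byte strings and that the database compares them byte by byte; the `truncation_report` helper and the sample keys are hypothetical, and only the 230-byte limit comes from the comment above:

```python
from collections import defaultdict

MAX_SORTKEY_BYTES = 230  # field size quoted in the comment above


def truncation_report(sort_keys, limit=MAX_SORTKEY_BYTES):
    """Summarise how truncation at `limit` bytes affects one category.

    sort_keys: list of bytes objects, one per category member.
    Returns the percentage of keys longer than the limit and the number
    of distinct keys that become identical to another key once cut off.
    """
    total = len(sort_keys)
    if total == 0:
        return {"percent_over_limit": 0.0, "ambiguous_keys": 0}

    over = sum(1 for key in sort_keys if len(key) > limit)

    # Group the distinct full keys by their truncated form.
    groups = defaultdict(set)
    for key in sort_keys:
        groups[key[:limit]].add(key)
    ambiguous = sum(len(full) for full in groups.values() if len(full) > 1)

    return {
        "percent_over_limit": 100.0 * over / total,
        "ambiguous_keys": ambiguous,
    }


# Made-up keys: two long keys that differ only after byte 230.
sample = [b"a" * 231 + b"x", b"a" * 231 + b"y", b"b" * 10, b"c" * 230]
print(truncation_report(sample))
```

Under byte-wise comparison, cutting keys at a fixed length can turn a strict ordering into a tie but cannot reverse it, so counting the keys that collide after truncation is enough to tell whether a category's order is at risk.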
Any progress on this? On Portuguese Wikipedia we still need to use {{DEFAULTSORT: Page Name without accents }} on any article whose title has an accent if we want it to be sorted appropriately in the categories. E.g.: https://pt.wikipedia.org/w/index.php?title=%C3%81gua_Boa&oldid=28441112&action=edit Maybe adding a note to [[mw:Roadmap]] would be appropriate?
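The manual workaround described above amounts to stripping diacritics from the title. A minimal sketch in Python of what editors are doing by hand, using only the standard `unicodedata` module; the `strip_accents` and `defaultsort_tag` helpers are hypothetical names for illustration and are not part of MediaWiki:

```python
import unicodedata


def strip_accents(title):
    """Decompose the title (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def defaultsort_tag(title):
    """Return the DEFAULTSORT tag an editor would add, or '' if none is needed."""
    plain = strip_accents(title)
    if plain == title:
        return ""
    return "{{DEFAULTSORT:%s}}" % plain


print(defaultsort_tag("Água Boa"))  # {{DEFAULTSORT:Agua Boa}}
print(repr(defaultsort_tag("Lisboa")))  # ''
```

A UCA-based collation would make this unnecessary, since accented and unaccented letters would be ordered together without a per-page override.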
Some related info: I created some collations for Chinese, which are expected to be used on zhwiki. This code requires ICU 4.8+ to run. The current php5-intl in WMF's APT repo uses libicu42, and existing wikis with uca-default (ptwiki) have sort keys generated with libicu42. Once libicu is updated, all existing uca-default sort keys will need to be rebuilt.
Btw, Meta and especially Commons may be good next targets for deploying uca-default. Both are multilingual, so using the root collation seems ideal.
'wgCategoryCollation' => array(
	'default' => 'uppercase',
	'ptwiki' => 'uca-default', # bug 35632
	'iswiktionary' => 'identity', # bug 30722
),

I'm presuming this is fixed now...
(In reply to comment #9)
> 'wgCategoryCollation' => array(
> 	'default' => 'uppercase',
> 	'ptwiki' => 'uca-default', # bug 35632
> 	'iswiktionary' => 'identity', # bug 30722
> ),
>
> I'm presuming this is fixed now...

Umm, only for ptwiki.
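To make the "only for ptwiki" point concrete, here is a simplified sketch in Python of how the quoted setting resolves for a given wiki. It assumes a lookup by exact database name with a fallback to the 'default' entry; the `collation_for` helper is hypothetical, and the real configuration loader may support more selectors than this:

```python
# Mirror of the per-wiki setting quoted above, as a plain dict.
CATEGORY_COLLATION = {
    "default": "uppercase",
    "ptwiki": "uca-default",     # bug 35632
    "iswiktionary": "identity",  # bug 30722
}


def collation_for(dbname, settings=CATEGORY_COLLATION):
    """Hypothetical lookup: exact database name, else the 'default' entry."""
    return settings.get(dbname, settings["default"])


for wiki in ("ptwiki", "enwiki", "commonswiki", "iswiktionary"):
    print(wiki, "->", collation_for(wiki))
```

With this configuration every wiki other than ptwiki and iswiktionary still falls through to 'uppercase'.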
Just to clarify this bug: we probably should *not* do this for all wikis. As Tim said above, more MediaWiki code is needed to make it work properly. However, this can (and should, IMO) be done on all English, Portuguese, and multilingual (Meta and Commons) wikis.
I guess, a rough list for this would be:

reedy@fenari:/home/wikipedia/common$ grep enw all.dblist
arbcom_enwiki
enwiki
enwikibooks
enwikinews
enwikiquote
enwikisource
enwikiversity
enwikivoyage
enwiktionary
tenwiki
wg_enwiki

reedy@fenari:/home/wikipedia/common$ grep ptw all.dblist
ptwiki
ptwikibooks
ptwikinews
ptwikiquote
ptwikisource
ptwikiversity
ptwikivoyage
ptwiktionary

+brwikimedia

reedy@fenari:/home/wikipedia/common$ cat special.dblist
advisorywiki
arbcom_dewiki
arbcom_enwiki
arbcom_fiwiki
arbcom_nlwiki
auditcomwiki
boardgovcomwiki
boardwiki
chairwiki
chapcomwiki
checkuserwiki
collabwiki
commonswiki
donatewiki
execwiki
fdcwiki
foundationwiki
grantswiki
incubatorwiki
internalwiki
mediawikiwiki
metawiki
movementroleswiki
nostalgiawiki
officewiki
otrs_wikiwiki
outreachwiki
qualitywiki
searchcomwiki
sourceswiki
spcomwiki
specieswiki
stewardwiki
strategywiki
tenwiki
test2wiki
testwiki
usabilitywiki
wg_enwiki
wikimania2005wiki
wikimania2006wiki
wikimania2007wiki
wikimania2008wiki
wikimania2009wiki
wikimania2010wiki
wikimania2011wiki
wikimania2012wiki
wikimania2013wiki
wikimaniateamwiki
wikidatawiki
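Note that the substring grep above also picks up names such as arbcom_enwiki, tenwiki and wg_enwiki, which are not English-language content projects in the usual sense. A small sketch in Python, assuming the dblist has been read into a list of database names, that separates names starting with the language code from incidental matches; the `split_matches` helper is hypothetical:

```python
def split_matches(dbnames, lang_code):
    """Split substring matches (what grep finds) into prefix matches and the rest."""
    needle = lang_code + "w"
    matched = [name for name in dbnames if needle in name]
    by_prefix = [name for name in matched if name.startswith(needle)]
    incidental = [name for name in matched if not name.startswith(needle)]
    return by_prefix, incidental


# A few names taken from the listings above.
names = ["arbcom_enwiki", "enwiki", "enwikibooks", "tenwiki", "wg_enwiki", "ptwiki"]
projects, others = split_matches(names, "en")
print(projects)  # ['enwiki', 'enwikibooks']
print(others)    # ['arbcom_enwiki', 'tenwiki', 'wg_enwiki']
```

Whether the incidental matches should get the same collation is a separate decision; the sketch only makes the distinction visible.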
Do the rest of the is projects want to become identity too?

reedy@fenari:/home/wikipedia/common$ grep isw all.dblist
iswiki
iswikibooks
iswikiquote
iswikisource
iswiktionary
(In reply to comment #13)
> Do the rest of the is projects want to become identity too?
>
> reedy@fenari:/home/wikipedia/common$ grep isw all.dblist
> iswiki
> iswikibooks
> iswikiquote
> iswikisource
> iswiktionary

I would imagine so. The language is case sensitive from what I understand. I guess we should ask.

-----

Realistically, it doesn't matter that much for a wiki like wikimania2006, since nobody is using it, although it certainly wouldn't hurt anything. For larger wikis (where it would take more than a couple of hours to run the script) we would probably want to talk to the local community, as categories will behave somewhat weirdly while the script is running (pages will be out of order). It's too bad the script doesn't go in order of cl_to instead of cl_from, as that would minimize disruption somewhat.
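The point about processing order can be illustrated with a toy simulation. The sketch below, in Python, assumes the script updates rows in fixed-size batches and counts how many categories are left half-updated (some rows with new sort keys, some with old) after each batch; the `worst_mixed_categories` helper and the sample rows are made up for illustration and do not reflect the actual maintenance script:

```python
def worst_mixed_categories(rows, order_key, batch_size):
    """rows: list of (cl_from, cl_to) pairs.

    Process the rows in the order given by order_key, batch by batch, and
    return the largest number of categories that were partially updated
    at the end of any batch.
    """
    ordered = sorted(rows, key=order_key)
    totals = {}
    for _, category in rows:
        totals[category] = totals.get(category, 0) + 1

    done = {}
    worst = 0
    for start in range(0, len(ordered), batch_size):
        for _, category in ordered[start:start + batch_size]:
            done[category] = done.get(category, 0) + 1
        mixed = sum(1 for cat, n in done.items() if n < totals[cat])
        worst = max(worst, mixed)
    return worst


# Four pages, each in the same three categories.
rows = [(page, cat) for page in (1, 2, 3, 4) for cat in ("A", "B", "C")]
print(worst_mixed_categories(rows, lambda r: r[0], 3))  # by cl_from: 3
print(worst_mixed_categories(rows, lambda r: r[1], 3))  # by cl_to: 1
```

In this toy case, going page by page leaves every category in a mixed state until the last batch, while going category by category leaves at most one category mixed at a time.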
Adjusting the summary: "Set $wgCategoryCollation to 'uca-default' and rebuild category sort keys on Wikimedia wikis deployment" -> "Change $wgCategoryCollation values to appropriate one for each Wikimedia wiki". Per bug 45443, we don't really want uca-default anywhere anymore (apart from multi-language projects like Commons or Meta), but language-specific collations.