Last modified: 2014-01-03 16:11:45 UTC
This issue was converted from https://jira.toolserver.org/browse/DBQ-191. Summary: Number of articles with files on Wikiquote Issue type: Task - A task that needs to be done. Priority: Major Status: Done Assignee: Hoo man <hoo@online.de> ------------------------------------------------------------------------------- From: Federico Leva <federicoleva@tiscali.it> Date: Mon, 16 Jul 2012 13:43:49 ------------------------------------------------------------------------------- I'd like to know how many articles have at least an embedded image on each Wikiquote, or at least it and en.quote. WikiStats gives only the total number of image links. Thanks.
------------------------------------------------------------------------------- From: Liangent <liangent@gmail.com> Date: Sat, 28 Jul 2012 22:06:56 ------------------------------------------------------------------------------- How is "article" defined?
------------------------------------------------------------------------------- From: Federico Leva <federicoleva@tiscali.it> Date: Sat, 28 Jul 2012 22:12:39 ------------------------------------------------------------------------------- Pages in namespace 0. There's no need to worry about redirects and definitions because pages with an image are "good articles" according to every definition, and I'm not asking the total count of articles. Thank you!
------------------------------------------------------------------------------- From: Liangent <liangent@gmail.com> Date: Sat, 28 Jul 2012 22:58:59 ------------------------------------------------------------------------------- http://toolserver.org/~liangent/-/dbq/191 Main pages are often counted as articles.
------------------------------------------------------------------------------- From: Liangent <liangent@gmail.com> Date: Sat, 28 Jul 2012 22:59:52 ------------------------------------------------------------------------------- $ for wiki in `mysql -BN -h sql -e "SELECT dbname FROM toolserver.wiki WHERE domain IS NOT NULL AND family = 'wikiquote';"` > do > echo -n $wiki ' ' | tee -a 191 > mysql -BcN -h ${wiki/_/-}.rrdb $wiki -e 'SELECT COUNT(DISTINCT page_id) FROM page JOIN imagelinks ON il_from = page_id WHERE page_namespace = 0' | tee -a 191 > done
------------------------------------------------------------------------------- From: Federico Leva <federicoleva@tiscali.it> Date: Sun, 29 Jul 2012 11:10:26 ------------------------------------------------------------------------------- Thank you very much. However, there's an important thing we've forgotten, that is that only images embedded directly on the page must be considered, not images embedded by templates (icons and such): can those be filtered? An idea would be to exclude images which have imagelinks from the template page itself orr, if this is not reliable enough, all images which have more than e.g. 5 imagelinks in the wiki. (The reopen button is not working in Firefox for me.)
------------------------------------------------------------------------------- From: Hoo man <hoo@online.de> Date: Tue, 02 Oct 2012 00:05:01 ------------------------------------------------------------------------------- I've did it as suggested now, so that only images that are linked less than 5 times are taken into account. Script: #!/bin/bash wikilist=$(mysql -wBN -h sql -e "SELECT dbname FROM toolserver.wiki WHERE domain IS NOT NULL AND family = 'wikiquote';"); SQL="SELECT DATABASE(), COUNT(DISTINCT page_id) FROM page JOIN (SELECT il_from FROM imagelinks GROUP BY il_from HAVING COUNT(*) < 5) AS img_links ON il_from = page_id WHERE page_namespace = 0"; echo -e "wiki\tarticle_count" > dbq-191.txt; for runwiki in $wikilist do echo "querying $runwiki" sql -r $runwiki "$SQL" | tail -n +2 >> dbq-191.txt done Result: http://toolserver.org/~hoo/dbq/dbq-191.txt
This bug was imported as RESOLVED. The original assignee has therefore not been set, and the original reporters/responders have not been added as CC, to prevent bugspam. If you re-open this bug, please consider adding these people to the CC list: Original assignee: hoo@online.de CC list: federicoleva@tiscali.it, hoo@online.de, liangent@gmail.com