Last modified: 2014-01-03 15:50:00 UTC
This issue was converted from https://jira.toolserver.org/browse/DBQ-89. Summary: Finding unlinked articles / finding empty but linked articles Issue type: Task - A task that needs to be done. Priority: Minor Status: Done Assignee: merl <mewikipedia@to.mabomuja.de> ------------------------------------------------------------------------------- From: Läuterungsberg <Ligl@schulevonathen.de> Date: Tue, 16 Mar 2010 21:33:48 ------------------------------------------------------------------------------- a) Create a list of articles which do exist, but aren´t linked to. b) Create a list of articles which do not have any content but are linked to (via a redlink), sorted by the number of redlinks to the article. Both is meant for the english and the german wikipedia.
------------------------------------------------------------------------------- From: merl <mewikipedia@to.mabomuja.de> Date: Wed, 17 Mar 2010 15:41:59 ------------------------------------------------------------------------------- Some existing tools: a) * for dewiki http://de.wikipedia.org/wiki/Special:LonelyPages http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Verwaiste_Seiten/Arbeitsseite http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Verwaiste_Seiten/Kategorien http://toolserver.org/~merl/specialpages/dewiki/LonelyPages * for enwiki http://de.wikipedia.org/wiki/Special:LonelyPages b) http://toolserver.org/~magnus/missingtopics.php http://de.wikipedia.org/wiki/Benutzer:P.Copp/Spezialseiten/Gew%C3%BCnschte_Artikel (an old version) * * * Which output format? (simply list/CSV, wiki-syntax, ...) What do you mean with "(not) linked to"? I think you only want links included from article namespace (ignoring links from other namespaces)?
------------------------------------------------------------------------------- From: merl <mewikipedia@to.mabomuja.de> Date: Fri, 19 Mar 2010 23:31:19 ------------------------------------------------------------------------------- No response, so i'll run the query. a) All articles that are not linked, embedded or redirected. I removed the disambiguationspages. But for dewiki you should use my tool, because my bot daily add these pages to the project page and those aren't lonely anymore. b) All links from article namespace to not existing articles Here my SGE script #! /bin/bash #$ -N DBQ-89 #$ -hard #$ -l sqlprocs-s5=1 WIKI="dewiki" SQL=" CREATE TEMPORARY TABLE temp (t_pid INT PRIMARY KEY, t_pt VARCHAR(255)); INSERT INTO temp SELECT page_id, page_title FROM ${WIKI}_p.page p LEFT JOIN ${WIKI}_p.pagelinks ON p.page_namespace = pl_namespace AND p.page_title=pl_title LEFT JOIN ${WIKI}_p.templatelinks ON p.page_namespace = tl_namespace AND p.page_title=tl_title LEFT JOIN ${WIKI}_p.redirect ON p.page_namespace = rd_namespace AND p.page_title=rd_title WHERE p.page_namespace=0 AND page_is_redirect=0 AND pl_namespace IS NULL AND tl_namespace IS NULL AND rd_namespace IS NULL; CREATE TEMPORARY TABLE tempdab(dab_tltitle INT PRIMARY KEY); INSERT INTO tempdab SELECT pl_title FROM ${WIKI}_p.page INNER JOIN ${WIKI}_p.pagelinks ON page_id=pl_from WHERE page_namespace=8 AND page_title='Disambiguationspage' AND pl_namespace=10; CREATE TEMPORARY TABLE tempdel(td_pid INT PRIMARY KEY); INSERT IGNORE INTO tempdel SELECT t_pid FROM temp INNER JOIN ${WIKI}_p.templatelinks ON t_pid=tl_from INNER JOIN tempdab ON tl_title = dab_tltitle WHERE tl_namespace=10; DELETE a FROM temp a INNER JOIN tempdel b ON t_pid=td_pid; SELECT REPLACE(t_pt,'_',' ') FROM temp ORDER BY t_pt;" mysql -wBN -h$WIKI-p.db u_$USER -e "$SQL" > ${HOME}/public_html/sql/DBQ-89-${WIKI}-LonelyPages.txt SQL=" CREATE TEMPORARY TABLE temp (pid INT, pt VARCHAR(255), plt VARCHAR(255), INDEX(pt), INDEX(plt)); INSERT INTO temp SELECT page_id, page_title, pl_title FROM ${WIKI}_p.page INNER JOIN ${WIKI}_p.pagelinks ON pl_from = page_id WHERE page_namespace = 0 AND page_is_redirect = 0 AND pl_namespace = 0; SELECT CONCAT(REPLACE(plt,'_',' '),',',COUNT(*)) FROM temp LEFT JOIN ${WIKI}_p.page ON page_namespace = 0 AND plt = page_title WHERE page_namespace IS NULL GROUP BY plt HAVING COUNT(*) > 300 ORDER BY COUNT(*) DESC LIMIT 5000;" mysql -wBN -h$WIKI-p.db u_$USER -e "$SQL" > ${HOME}/public_html/sql/DBQ-89-${WIKI}-Redlinks.txt
------------------------------------------------------------------------------- From: merl <mewikipedia@to.mabomuja.de> Date: Tue, 30 Mar 2010 22:35:06 ------------------------------------------------------------------------------- * Redlinks * http://toolserver.org/~merl/sql/DBQ-89-enwiki-Redlinks.txt * http://toolserver.org/~merl/sql/DBQ-89-dewiki-Redlinks.txt * Because bots are linking loneypages you should use the project pages. Query was not successful * http://en.wikipedia.org/wiki/Category:All_orphaned_articles * http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Verwaiste_Seiten/Kategorien * http://toolserver.org/~merl/specialpages/dewiki/LonelyPages
This bug was imported as RESOLVED. The original assignee has therefore not been set, and the original reporters/responders have not been added as CC, to prevent bugspam. If you re-open this bug, please consider adding these people to the CC list: Original assignee: bugreporter@to.mabomuja.de CC list: bugreporter@to.mabomuja.de