Last modified: 2014-11-04 12:41:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T67464, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 65464 - bits.wikimedia.org discourages indexing, resulting in goofy archive.org snapshots


Summary:	bits.wikimedia.org discourages indexing, resulting in goofy archive.org snaps...

Status:	RESOLVED FIXED

Product:	Wikimedia
Classification:	Unclassified
Component:	General/Unknown (Other open bugs)
Version:	wmf-deployment
Hardware:	All All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:	shell

Depends on:
Blocks:

Reported:	2014-05-18 20:36 UTC by MZMcBride
Modified:	2014-11-04 12:41 UTC
CC List:	7 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments

Description MZMcBride 2014-05-18 20:36:23 UTC

bits.wikimedia.org was set up to serve JavaScript and CSS in January 2010 (cf. [[wikitech:Server admin log/Archive 15#January 10]]).

Its [[robots.txt]] file disallows everything:
 
$ curl bits.wikimedia.org/robots.txt
User-agent: *
Disallow: /

Comparing the French Wikipedia main page from 2009 (<https://web.archive.org/web/20090601000000*/http://fr.wikipedia.org/wiki/Accueil>) to 2010 (<https://web.archive.org/web/20110101000000*/http://fr.wikipedia.org/wiki/Accueil>) shows its effect: the generated snapshots look goofy.
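To make the effect of that two-line file concrete, the following sketch feeds it to Python's standard `urllib.robotparser` and asks whether a crawler may fetch a stylesheet from bits. The asset URL is a hypothetical placeholder, and `ia_archiver` is assumed here to be the user-agent token the Internet Archive's crawler honours. Because `User-agent: *` with `Disallow: /` covers every path for every agent, all agents get the same answer.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as served by bits.wikimedia.org (copied from the report above).
BITS_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical asset URL; "Disallow: /" matches every path on the host.
ASSET_URL = "https://bits.wikimedia.org/static/example.css"


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "Googlebot"):
        print(f"{agent}: {can_fetch(BITS_ROBOTS_TXT, agent, ASSET_URL)}")
    # Both lines print False: the wildcard record blocks every crawler.
```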

Perhaps we should specifically re-enable the IA bot?

This issue was reported by ytrezq in #wikimedia-operations on freenode.

Comment 1 Nemo 2014-07-06 15:13:33 UTC

This should certainly be done, cf. 0e30a230d8eb105ff3724d4aade4be57eadba2a1; but I can't tell whether the file is in operations/puppet, where I only find files/misc/robots-txt-disallow (used for gerrit) and modules/mediawiki_singlenode/files/robots.txt (used in labs).

IMHO the robots.txt "catchall" disallow rules should all be managed by files/misc/robots-txt-disallow, with ia_archiver allowed there.
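A minimal sketch of the kind of catch-all file proposed here, assuming a layout with a dedicated `ia_archiver` record ahead of the wildcard record. This is a hypothetical illustration: the actual contents of files/misc/robots-txt-disallow and of the eventual patch are not shown in this report. An empty `Disallow:` line means nothing is disallowed for that agent, so the archive crawler is let through while every other agent still hits `Disallow: /`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical catch-all robots.txt with an exception for the Internet
# Archive's crawler (assumed user-agent token: ia_archiver).
CATCHALL_WITH_IA = """\
User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /
"""

ASSET_URL = "https://bits.wikimedia.org/static/example.css"  # hypothetical


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(can_fetch(CATCHALL_WITH_IA, "ia_archiver", ASSET_URL))  # True
    print(can_fetch(CATCHALL_WITH_IA, "Googlebot", ASSET_URL))    # False
```

Python's parser checks the named records first and falls back to the `*` record only when no named record matches the agent.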

Comment 2 Marius Hoch 2014-07-06 15:27:56 UTC

The file is robots-private.txt in operations/mediawiki-config

Comment 3 Gerrit Notification Bot 2014-07-06 16:14:58 UTC

Change 144364 had a related patch set uploaded by Nemo bis:
Allow Internet Archive's Wayback machine to get stuff from bits etc.

https://gerrit.wikimedia.org/r/144364

Comment 4 Gerrit Notification Bot 2014-07-08 18:25:00 UTC

Change 144364 merged by jenkins-bot:
Allow Internet Archive's Wayback machine to get stuff from bits etc.

https://gerrit.wikimedia.org/r/144364

Comment 5 Nemo 2014-07-08 18:32:26 UTC

https://bits.wikimedia.org/robots.txt looks good, but the effect can only be verified after it gets recrawled; currently, e.g., http://web.archive.org/save/http://bits.wikimedia.org/geoiplookup still complains.

Comment 6 Nemo 2014-07-11 22:55:27 UTC

It does look better.
http://web.archive.org/web/20140711225404/http://it.wikipedia.org/wiki/Pagina_principale

Comment 7 Nemo 2014-10-02 09:27:53 UTC

(In reply to MZMcBride from comment #0)
> (<https://web.archive.org/web/20110101000000*/http://fr.wikipedia.org/wiki/Accueil>) shows its effect: the generated snapshots look goofy.

No longer?

(In reply to Nemo from comment #6)
> It does look better.
> http://web.archive.org/web/20140711225404/http://it.wikipedia.org/wiki/Pagina_principale

No longer. :(
https://bits.wikimedia.org/robots.txt is a 404 now, what happened? Is that the cause?

Comment 8 Glaisher 2014-10-03 16:35:57 UTC

(In reply to Nemo from comment #7)
> https://bits.wikimedia.org/robots.txt is a 404 now, what happened? Is that the cause?

I85dcec2a168b9b2d42678d1d9a0e314793c99e21 perhaps?

Comment 9 Nemo 2014-11-04 12:41:57 UTC

And now it looks (mostly) good again. Go figure.

http://web.archive.org/web/20141104122645/http://it.wikipedia.org/wiki/Pagina_principale
