Lucene search:
https://de.wikipedia.org/w/index.php?title=Spezial:Suche&search=moveInterwikisToTop&fulltext=Suche&profile=all&redirs=1

Cirrus:
https://de.wikipedia.org/w/index.php?title=Spezial:Suche&search=moveInterwikisToTop&fulltext=Suche&profile=all&redirs=1&srbackend=CirrusSearch

Where did all the JS pages go?
Wonder if something went wrong with Gerrit change #115214.
I believe this is caused by us not word breaking foo.bar into foo and bar. The solution to this, as I see it, is to use the word_break token filter _but_ to do that I have to rebuild each analyzer with that filter. That isn't easy, because right now when I want the German analyzer I can ask for:

{"analyzer": {"text": {"type": "german"}}}

but to rebuild it I have to do this:

{
  "analyzer": {
    "text": {
      "filter": [
        "standard",
        "lowercase",
        "german_stop",
        "german_normalization",
        "light_german_stemmer"
      ],
      "tokenizer": "standard",
      "type": "custom"
    }
  },
  "filter": {
    "german_stop": {
      "stopwords": [ "denn", ... "eures", "dies", "bist", "kein" ],
      "type": "stop"
    }
  }
}

Except even that doesn't work, because german_normalization isn't properly exposed! The pull request I've opened upstream exposes all the stuff I'd need and adds an endpoint to Elasticsearch designed to spit this configuration back out for easy customization.
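For illustration only, here's a rough sketch of what the rebuilt chain might look like once a word-splitting filter can be slotted in, written as settings for a throwaway test index. The index name is made up, word_delimiter is the stock Elasticsearch filter that splits foo.bar into foo and bar (whether that's the filter we actually end up using, and where it sits in the chain, is my guess), and the _german_ stopword shorthand stands in for the full list elided above. As above, the definitions of the other custom filters are left out, and none of this works until german_normalization is actually exposed, which is what the upstream pull request is for.

# Hypothetical throwaway index just to try the rebuilt analyzer chain.
# word_delimiter is the built-in filter that breaks foo.bar into foo and bar;
# its position in the chain here is a guess, not the final plan.
curl -XPUT 'localhost:9200/word_break_test' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "text": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "word_delimiter",
            "lowercase",
            "german_stop",
            "german_normalization",
            "light_german_stemmer"
          ]
        }
      },
      "filter": {
        "german_stop": {
          "type": "stop",
          "stopwords": "_german_"
        }
      }
    }
  }
}'

# Then check what the rebuilt analyzer does with a dotted token:
curl -XGET 'localhost:9200/word_break_test/_analyze?analyzer=text' -d 'foo.bar'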
Interesting. Wonder if we're running into bug 40612 in a different form, then.
I have little doubt.