Last modified: 2014-11-20 21:01:16 UTC
Extend the MediaWiki API Query module to support basic Wikidata data retrieval locally. This would allow Wikidata data to be included as part of other API queries and even be used with generators (https://www.mediawiki.org/wiki/API:Query#Generators). The minimum requirement would be to retrieve Wikidata descriptions using page titles or IDs. (This would facilitate their use in search suggestions.) Other possible capabilities would include retrieving the Wikidata labels, aliases, claims, and inter-language links.
So...basically implement https://www.mediawiki.org/wiki/Requests_for_comment/Wikidata_API ?
Legoktm: Basically, yes.
Yuri's RFC is for use on the Repo, though. The idea there is to use Wikibase stuff as generators. Ryan's request, if I understand it correctly, is to implement a property module that can be used to provide extra properties for pages listed by a generator on a client wiki.
If I understand correctly, the intended use case is this: you have a list of local page titles (e.g. from a prefix search) and want to list them; in the listing, you want to show some extra info from Wikidata, like the description. The suggestion is to allow API queries to include this extra information using an API prop module. This could be done, but I wonder whether it's worth the effort. You can get the same info easily from Wikidata directly, with a single API call. For example, to get the Wikidata labels and descriptions, in English, associated with the pages Birch, Beech, and Beetle on enwiki, you can use the following query: http://www.wikidata.org/w/api.php?action=wbgetentities&format=json&sites=enwiki&titles=Birch%7CBeech%7CBeetle&props=labels%7Cdescriptions&languages=en%7Cen-ca%7Cen-gb Isn't this sufficient?
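For illustration, the wbgetentities request above could be assembled programmatically roughly as follows. This is only a sketch: the helper name build_wbgetentities_url and its defaults are made up for this example, and it only builds the URL, without sending the request.

```python
from urllib.parse import urlencode

# Base endpoint of the Wikidata API, as used in the query above.
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_wbgetentities_url(titles, site="enwiki",
                            languages=("en", "en-ca", "en-gb")):
    """Build a wbgetentities URL for the given local page titles.

    Hypothetical helper for illustration; multi-value parameters are
    joined with "|" as the MediaWiki API expects.
    """
    params = {
        "action": "wbgetentities",
        "format": "json",
        "sites": site,
        "titles": "|".join(titles),
        "props": "labels|descriptions",
        "languages": "|".join(languages),
    }
    # urlencode percent-encodes "|" as %7C.
    return WIKIDATA_API + "?" + urlencode(params)


url = build_wbgetentities_url(["Birch", "Beech", "Beetle"])
print(url)
```

The result carries the same parameters as the hand-written query; fetching it is still a separate HTTP request to wikidata.org.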
Yes, that's basically what folks are currently doing, but it isn't ideal. Ideally, we would like to be able to get regular page props and wikidata data from a single API call. Also, we would like to avoid the extra DNS lookup of an external HTTP request in high-traffic contexts (like search suggestions) if possible.
I second what Kaldari has said. Sure, it's sufficient, but it shouldn't be necessary. :-)
Considering that with my approach, you would be hitting wbgetentities with a couple of hundred queries from the mobile search interface, I suppose you are right: that isn't going to work. wbgetentities needs to load the full entity structure from the blob store, and that's slow... We already have the data you want in the wb_terms table. I suppose adding a client-side module that works much like ApiQueryPageProps would be easy enough, and should make this a lot faster. I can't promise that it will be performant enough though; I hear the API servers are pretty loaded. An alternative solution would be to add this information directly to Elastic, so it can be returned directly by the search module. By the way, what do you use to generate the original list of local page titles? action=opensearch? action=wbsearchentities?
I have implemented a pageterms module, see I9b6b52f6b75e4d6a
Daniel, the apps currently use both prefixsearch and search generators. I can't speak for mobile web, but I guess it's similar. When the user clicks search we perform a title search first, then allow the user to switch to full text search from there. We currently have to collect the wikibase_items and then send off another request to wikidata.org to get the descriptions. Like Kaldari mentioned above, we would like to avoid that. Below are some examples we have currently implemented. (1) Title search: https://en.m.wikipedia.org/w/api.php?action=query&format=json&generator=prefixsearch&gpssearch=foo&gpsnamespace=0&gpslimit=12&prop=pageprops%7Cpageimages&ppprop=wikibase_item&piprop=thumbnail&pithumbsize=96&pilimit=12&list=prefixsearch&pssearch=formula&pslimit=12 (2) Full text search: https://en.m.wikipedia.org/w/api.php?action=query&format=json&prop=pageprops%7Cpageimages&ppprop=wikibase_item&generator=search&gsrsearch=foo&gsrnamespace=0&gsrwhat=text&gsrinfo=&gsrprop=redirecttitle&gsroffset=0&gsrlimit=12&list=search&srsearch=foo&srnamespace=0&srwhat=text&srinfo=suggestion&srprop=&sroffset=0&srlimit=12&piprop=thumbnail&pithumbsize=96&pilimit=12
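The two-step flow described above (collect the wikibase_item values, then ask wikidata.org for the descriptions) could look roughly like the sketch below. The function names are hypothetical, the sample responses are hand-written placeholders with made-up item IDs and descriptions, and no network calls are made; the response shapes assume the default format=json output of action=query and wbgetentities.

```python
from urllib.parse import urlencode


def collect_wikibase_items(query_response):
    """Map page title -> Wikidata item ID from an action=query response
    that was requested with prop=pageprops&ppprop=wikibase_item."""
    pages = query_response.get("query", {}).get("pages", {})
    items = {}
    for page in pages.values():
        item_id = page.get("pageprops", {}).get("wikibase_item")
        if item_id:  # pages without a linked item are skipped
            items[page["title"]] = item_id
    return items


def build_description_request(item_ids, language="en"):
    """Build the second request, sent to wikidata.org."""
    params = {
        "action": "wbgetentities",
        "format": "json",
        "ids": "|".join(item_ids),
        "props": "descriptions",
        "languages": language,
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


def attach_descriptions(items, entities_response, language="en"):
    """Map page title -> description (or None if there is none)."""
    entities = entities_response.get("entities", {})
    result = {}
    for title, item_id in items.items():
        desc = entities.get(item_id, {}).get("descriptions", {}).get(language, {})
        result[title] = desc.get("value")
    return result


# Placeholder data for illustration only (IDs and texts are made up).
search_response = {"query": {"pages": {
    "11": {"pageid": 11, "title": "Foo", "pageprops": {"wikibase_item": "Q1"}},
    "12": {"pageid": 12, "title": "Foobar", "pageprops": {"wikibase_item": "Q2"}},
    "13": {"pageid": 13, "title": "Foo (disambiguation)"},
}}}
entities_response = {"entities": {
    "Q1": {"descriptions": {"en": {"language": "en", "value": "first placeholder"}}},
    "Q2": {"descriptions": {}},
}}

items = collect_wikibase_items(search_response)
second_url = build_description_request(sorted(items.values()))
descriptions = attach_descriptions(items, entities_response)
print(second_url)
print(descriptions)
```

A local prop module would remove the second request and the client-side joining step entirely, which is the point of this request.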
As Kaldari noted, "PageTerms" is not self-explanatory. My first thought was it would contain per-page legal terms (e.g. license).