Last modified: 2014-03-17 16:28:09 UTC
This would greatly help reuse of the data. It is, however, not a trivial operation: for most tables the underlying data are at a much lower aggregation level (and sometimes multiple MB in size), so in many cases these higher-aggregated CSV files will have to be created anew. This could be done in two ways. One option is a pre-processing step which builds those CSV files separately, followed by the report generation phase which converts data to HTML, repeated for each of the 25 languages. The alternative is to weave extra lines into the existing code and write data to HTML and CSV files in close succession; however, the code is already pretty complicated and would become even harder to maintain. A pre-processing step would ease debugging and speed up report generation, although the few hours gained when generating 100k HTML files (800 wikis, many reports in 25 languages) is still nothing compared to the data collection phase. But a pre-processing stage would require major maintenance: new code is needed, and on top of that existing code would have to be rewritten, which is more work than adding extra lines between existing code.
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1428
I would like to avoid serious feature work on Wikistats right now. Erik is pretty busy with additional work on page views.
Over the years many people have asked for the raw data behind Wikistats tables, but this would be a pretty daunting update.
We're going to investigate using a jQuery plugin to convert data from some tables to CSVs. It won't work for everything but might be a useful workaround.
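No specific plugin is named above, but the core of any such client-side workaround is serializing an HTML table's cells to CSV text. A minimal sketch follows; the helper names (quoteCell, rowsToCsv) and the quoting rules are assumptions for illustration, not the API of an actual plugin:

```javascript
// Quote a cell roughly per RFC 4180: wrap in double quotes when it
// contains a comma, quote, or newline, doubling embedded quotes.
function quoteCell(value) {
  const s = String(value);
  return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Serialize an array of rows (each an array of cell values) to CSV.
function rowsToCsv(rows) {
  return rows.map(row => row.map(quoteCell).join(',')).join('\n');
}

// In the browser, a jQuery wrapper could collect the cells like:
//   const rows = $('table tr').map(function () {
//     return [$(this).find('th,td').map(function () {
//       return $(this).text();
//     }).get()];
//   }).get();
//   const csv = rowsToCsv(rows);

// Example with data shaped like a Wikistats table row:
const csv = rowsToCsv([
  ['Wiki', 'Articles'],
  ['en', '4,000,000'],  // embedded comma forces quoting
]);
console.log(csv);
// → Wiki,Articles
// → en,"4,000,000"
```

Because Wikistats cells often contain thousands separators, the quoting step is the part that actually matters; a naive join on commas would corrupt those values.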