Last modified: 2012-10-30 18:38:11 UTC
Created attachment 8891 [details] Screenshot of a rendered page

Let's assume:
- Page 1 contains only text (no paragraphs)
- Page 2 contains only text (no paragraphs)
- Page 3 contains only text (no paragraphs), with an image inserted in the middle of the text.

When these pages are transcluded using the <pages> tag, MediaWiki inserts a <p> tag between pages, which incorrectly divides the text. The URL field holds the address of a sample page on pl.wikisource that demonstrates the issue. I have also attached a screenshot.

I don't know what causes the parser to change the way text is rendered when there is an image in the text. I think \n should not be inserted between pages, which is related to bug #27637, closed as INVALID. As it stands, inserting any image lowers the overall quality of the document instead of raising it.
Similar testcase (http://pl.wikisource.org/wiki/Wikiskryba:Ankry/brudnopis0):
- Line 1 contains only text
- Line 2 contains text with images inserted in the middle
- Line 3 contains only text

There is no empty line between them. However, MediaWiki wraps EACH line in <p> and </p>. IMO this contradicts the wiki rules (where an empty line between text lines means a new paragraph).
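To make the expected rule concrete, here is a minimal Python sketch of "only an empty line starts a new paragraph". It is a toy model of the rule as stated in this comment, not the MediaWiki parser; the function name and the file name `Example.png` are made up for illustration.

```python
import re

def expected_paragraphs(wikitext):
    """Split text into paragraphs at empty lines only (toy model of the wiki rule)."""
    blocks = re.split(r"\n[ \t]*\n", wikitext.strip())
    # A single linefeed inside a block is treated as ordinary whitespace.
    return [" ".join(block.split("\n")) for block in blocks]

# Three lines with no empty line between them (hypothetical sample text).
sample = (
    "Line 1 with only text\n"
    "Line 2 with [[File:Example.png]] in the middle\n"
    "Line 3 with only text"
)
paragraphs = expected_paragraphs(sample)
print(len(paragraphs))
```

Under this rule the three lines form a single paragraph, which is what the testcase expects and not what is rendered.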
This is a parser bug, not a ProofreadPage extension bug, as the second example shows. I first thought it would be possible to work around it in the extension by inserting a space instead of a linefeed between pages, but that breaks pages whose final linefeed is protected from removal by an empty template. The generated code in that case is "\n<space>first line on the next page", which MediaWiki handles as <nowiki><pre>first line</pre></nowiki>
You can use a space equivalent: &#32;
Created attachment 8946 [details] Simple patch replacing \n with &#32;
Patch tested on my local wiki; it works with the {{nop}} template on en.ws, which was broken by using a simple space instead of the proposed &#32;. Besides that, can someone ping a parser maintainer, as comment 2 shows it's a parser bug?
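The difference between the three separators discussed so far can be sketched in Python. This is only a toy check for lines that begin with a literal space (which wikitext renders as preformatted text); it does not call the parser, and the helper names and sample page texts are assumptions made for illustration.

```python
def join_pages(pages, separator):
    """Concatenate page texts with the separator placed between pages."""
    return separator.join(pages)

def lines_with_leading_space(wikitext):
    """Return the lines that start with a literal space character."""
    return [line for line in wikitext.split("\n") if line.startswith(" ")]

# Hypothetical pages: the first one ends with a linefeed that is kept on purpose.
pages = ["Last line of page one\n", "first line on the next page"]

for name, separator in [("linefeed", "\n"), ("space", " "), ("entity", "&#32;")]:
    joined = join_pages(pages, separator)
    print(name, lines_with_leading_space(joined))
```

Only the plain space leaves a line that starts with a space after a preserved linefeed; the entity form starts with "&", so the toy check does not flag it.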
The image thumb is created using <div> (a block element), so it cannot be placed inside <p>, which may contain only inline content. The parser closes the open paragraph, inserts the <div>, and then reopens the paragraph.
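The close/insert/reopen behaviour can be illustrated with a small Python sketch. It is a toy model written for this comment, not MediaWiki's actual paragraph handling; the token format and the function name are assumptions.

```python
def wrap_paragraphs(tokens):
    """Wrap runs of inline tokens in <p>, never nesting a block element inside <p>."""
    out = []
    in_paragraph = False
    for kind, text in tokens:
        if kind == "block":
            # A block element cannot live inside <p>: close the open paragraph first.
            if in_paragraph:
                out.append("</p>")
                in_paragraph = False
            out.append(text)
        else:
            # Inline content (re)opens a paragraph when none is open.
            if not in_paragraph:
                out.append("<p>")
                in_paragraph = True
            out.append(text)
    if in_paragraph:
        out.append("</p>")
    return "".join(out)

tokens = [
    ("inline", "text before "),
    ("block", '<div class="thumb">image</div>'),
    ("inline", " text after"),
]
html = wrap_paragraphs(tokens)
print(html)
# prints: <p>text before </p><div class="thumb">image</div><p> text after</p>
```

A single run of text with a thumb in the middle therefore ends up as two paragraphs around the <div>.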
Beau, I applied your patch to ProofreadPage as a workaround; however, I think we're all in agreement that this is a parser issue. I'll update the bug to reflect that and mark your patches as obsolete.
I didn't think enough about the side effects of this patch. It should be reverted from trunk: first, it doesn't solve the problem described, and second, it was meant as a no-op but it is not one. If you start a Page: with a LF, you get the expected <p> from the parser, but when transcluding with the <pages> tag that LF only terminates the last line of the previous page, so we do not get a <p> at the page boundary. This means there is no clean way to get the same HTML when viewing a Page: and after transcluding two pages where the second page starts with a LF. It's a bit odd, but a LF between page transclusions is more neutral than any other character.
(In reply to comment #8) My bad: the first part of comment 8 is right, this patch doesn't fix anything, but the rationale I gave for reverting it is wrong. Before the parser is called, the code generated by the extension is <span>\n{{:MediaWiki:Proofreadpage_pagenum_template|page=Page:.......}}</span>{{:Page:....djvu/97}}. The span between page transclusions means that a LF at the start of a Page: can't combine with the LF inserted by the extension to produce a <p>.
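A rough Python sketch of why the two linefeeds never meet: it builds a string with the same shape as the generated code quoted above and expands the templates by plain substitution. The page title `Page:Example.djvu/2`, the helper names, and the placeholder for the page-number output are all hypothetical, and the substitution is a stand-in for the real template expansion.

```python
def generated_wikitext(page_title):
    """Build wikitext shaped like the extension output quoted in this comment."""
    pagenum_call = "{{:MediaWiki:Proofreadpage_pagenum_template|page=" + page_title + "}}"
    return "<span>\n" + pagenum_call + "</span>" + "{{:" + page_title + "}}"

def expand(wikitext, page_title, page_text, pagenum_html="[page number]"):
    """Toy template expansion: plain string replacement, not the real preprocessor."""
    pagenum_call = "{{:MediaWiki:Proofreadpage_pagenum_template|page=" + page_title + "}}"
    wikitext = wikitext.replace(pagenum_call, pagenum_html)
    return wikitext.replace("{{:" + page_title + "}}", page_text)

title = "Page:Example.djvu/2"  # hypothetical title, for illustration only
expanded = expand(generated_wikitext(title), title, "\nText that starts after a LF")
print(repr(expanded))
print("\n\n" in expanded)
```

In this model the inserted LF sits inside the span and the page's own LF sits after it, so no empty line (two consecutive linefeeds) is formed.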