Last modified: 2014-11-12 18:31:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T72153, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 70153 - MessageGroupStats co-operative deadlock with transactions and GET_LOCK


Summary:	MessageGroupStats co-operative deadlock with transactions and GET_LOCK

Status:	RESOLVED FIXED

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	Translate
Version:	master
Hardware:	All All

Importance:	High major
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-08-29 06:01 UTC by Sean Pringle
Modified:	2014-11-12 18:31 UTC (History)
CC List:	7 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Sean Pringle 2014-08-29 06:01:16 UTC

This is related to bug 51410, but looks like a new form of the old problem introduced by that fix.

On the mediawikiwiki master, queries are hitting lock wait timeouts in what looks like an effective deadlock between transactions and co-operative locks.

SELECT /* MessageGroupStats::forItemInternal */ GET_LOCK('MessageGroupStats:modify:page-MediaWiki-Vagrant', 1) AS lockstatus;

UPDATE /* LinksUpdate::updateLinksTimestamp */  `page` SET page_links_updated = '20140829051059' WHERE page_id = '226112';

The queries are unrelated. The LinksUpdate query is perfectly ok until MessageGroupStats appears.

From the database end, it looks like MessageGroupStats calls GET_LOCK() in a loop on a connection that may already have an open transaction holding row locks on the page and translate_groupstats tables.

When the co-op lock is not acquired quickly, MessageGroupStats transactions bottleneck and queue up, collectively holding many row locks and blocking other queries like LinksUpdate *and whichever MessageGroupStats connection already holds the co-op lock*.

We should not be combining transactions and co-operative locking in this manner.
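The lock cycle described above can be modelled outside the database. The following is only a minimal sketch in Python, not MediaWiki or Translate code: two `threading.Lock` objects stand in for an InnoDB row lock and for the named lock taken with GET_LOCK, and the function and key names are made up for illustration. Each side holds one lock and waits on the other with a timeout, so both attempts time out instead of completing. The assumption is that the database cannot resolve the cycle on its own, because the named lock is not part of the row-lock wait graph.

```python
import threading


def simulate(timeout=0.1):
    """Model two connections that each hold one lock and want the other."""
    row_lock = threading.Lock()    # stands in for an InnoDB row lock
    coop_lock = threading.Lock()   # stands in for GET_LOCK('MessageGroupStats:...')
    both_holding = threading.Barrier(2)  # both sides hold their first lock
    both_tried = threading.Barrier(2)    # both sides finished their attempt
    results = {}

    def holder_of_coop_lock():
        # This connection already won GET_LOCK and now needs to write a row.
        with coop_lock:
            both_holding.wait()
            got = row_lock.acquire(timeout=timeout)
            results["update_got_row_lock"] = got
            both_tried.wait()
            if got:
                row_lock.release()

    def open_transaction():
        # This connection holds row locks in an open transaction,
        # then calls GET_LOCK with a short timeout.
        with row_lock:
            both_holding.wait()
            got = coop_lock.acquire(timeout=timeout)
            results["get_lock_succeeded"] = got
            both_tried.wait()
            if got:
                coop_lock.release()

    threads = [
        threading.Thread(target=holder_of_coop_lock),
        threading.Thread(target=open_transaction),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(simulate())
```

In this model neither side makes progress until a timeout fires, which mirrors the lock wait timeouts seen on the master: the waits only end because of the timeouts, not because either side could finish its work.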

Comment 1 Aaron Schulz 2014-08-29 06:07:54 UTC

Some of this code is run in autocommit mode by runners, but at other times it still happens inside a big transaction. I've mentioned that all of this needs to be moved to the job queue.

A quick workaround would be to not use GET_LOCK for web requests, or to use some MW transaction hook to push all of this to after COMMIT for web requests.
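The second idea, pushing the locked work to after COMMIT, can be sketched as follows. This is a hedged illustration in Python rather than the merged patch or any real MediaWiki hook: `FakeConnection`, `on_commit` and `update_group_stats` are hypothetical names, and the "queries" are just strings appended to a log. The point it shows is that the named lock is only requested once the transaction, and with it any row locks, has ended.

```python
from typing import Callable, List


class FakeConnection:
    """Hypothetical stand-in for a database connection; it only records a log."""

    def __init__(self) -> None:
        self.in_transaction = False
        self._post_commit: List[Callable[[], None]] = []
        self.log: List[str] = []

    def begin(self) -> None:
        self.in_transaction = True
        self.log.append("BEGIN")

    def on_commit(self, callback: Callable[[], None]) -> None:
        # In autocommit mode there are no held row locks, so run right away.
        if self.in_transaction:
            self._post_commit.append(callback)
        else:
            callback()

    def commit(self) -> None:
        self.log.append("COMMIT")
        self.in_transaction = False
        pending, self._post_commit = self._post_commit, []
        for callback in pending:
            callback()


def update_group_stats(conn: FakeConnection, group: str) -> None:
    """Take the co-operative lock only when no transaction is open."""

    def locked_update() -> None:
        conn.log.append(f"GET_LOCK {group}")
        conn.log.append(f"update stats {group}")
        conn.log.append(f"RELEASE_LOCK {group}")

    conn.on_commit(locked_update)


if __name__ == "__main__":
    conn = FakeConnection()
    conn.begin()
    conn.log.append("UPDATE page")
    update_group_stats(conn, "page-MediaWiki-Vagrant")
    conn.commit()
    print("\n".join(conn.log))
```

With this ordering the connection never waits on GET_LOCK while holding row locks, so it cannot block the connection that currently owns the named lock.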

Comment 2 Gerrit Notification Bot 2014-08-29 06:24:31 UTC

Change 157040 had a related patch set uploaded by Aaron Schulz:
Avoid GET_LOCK in non-autocommit mode

https://gerrit.wikimedia.org/r/157040

Comment 3 Nemo 2014-08-29 06:58:26 UTC

A bit off topic, sorry...

(In reply to Aaron Schulz from comment #1)
> I've mentioned that all of this needs to
> be moved to the job queue.

Using the job queue however can be disruptive sometimes when it gets in the way of editing, as I think bug 69669 may show (panic for some minutes while a translation-admin-related action was being completed). It would be nice to have a write-up of what should be moved to the job queue, but also of what actions should be high priority in the job queue.

Comment 4 Aaron Schulz 2014-08-29 18:52:12 UTC

(In reply to Nemo from comment #3)
> A bit off topic, sorry...
> 
> (In reply to Aaron Schulz from comment #1)
> > I've mentioned that all of this needs to
> > be moved to the job queue.
> 
> Using the job queue however can be disruptive sometimes when it gets in the
> way of editing, as I think bug 69669 may show (panic for some minutes while
> a translation-admin-related action was being completed). It would be nice to
> have a write-up of what should be moved to the job queue, but also of what
> actions should be high priority in the job queue.

If the problem is responsiveness, then it can always go into a small dedicated job loop in jobrunner.conf.erb.

Comment 5 Gerrit Notification Bot 2014-08-29 19:34:59 UTC

Change 157040 merged by jenkins-bot:
Avoid GET_LOCK in non-autocommit mode

https://gerrit.wikimedia.org/r/157040

Comment 6 Andre Klapper 2014-11-12 14:53:31 UTC

All patches mentioned in this report were merged or abandoned - is there more work left to do here (if yes: please reset the bug report status to NEW or ASSIGNED), or can you close this ticket as RESOLVED FIXED?
