Trying to add 2 nodes simultaneously to closure_tree table results in a deadlock #240

amitsaxena · 2016-11-21T13:23:50Z

When two nodes are added simultaneously to the closure tree table, this results in a deadlock and the query to acquire lock runs infinitely:

 (0.7ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t46e4b4acfd42770bad34d3e2a13a458a
  (0.4ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t9eaf21f51a8c5a1f2e39f6e8c4971b27
  (0.4ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS tae827dc66e71e92f4c9f6edecae1cb54
  (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t7f7d47e378ffde7f311e56fde6613ab5
  (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS ta7af1b261b90cba9e4d7399c96f17e26
  (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t3ed658e2e0d3008375e7356cc8e1c4a2
  (0.4ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t080acfd6e5536c50c4917ce271671be6
  (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t850056c6b58f75d8171bdcb69def558a
  (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t1ece95b52215d7c5e420ea9c0057aa72

You can reproduce the problem easily by using the code below (where List is the closure_tree model):


Thread.new do
  parent = List.find(86)
  new_node = List.new(:name => "test deadlock 1")
  parent.add_child(new_node)
end
Thread.new do
  parent = List.find(82)
  new_node = List.new(:name => "test deadlock 1")
  parent.add_child(new_node)
end

We faced this problem on production, and the infinite locking loop results in mysql being unresponsive, which in turn results in unresponsive app.

closure_tree version: 6.2.0
ruby: 2.3.0
rails: 5.0.0.1

The text was updated successfully, but these errors were encountered:

tomaadland · 2016-11-22T10:13:54Z

I have just discovered a similar problem in production on Postgresql. Have not been able to track down what is causing it, but I discovered an infinite loop of

SELECT pg_try_advisory_lock(1597746110,0) AS t57b1486ecc3691a98e06a13406a11286

in my log.

Database is unresponsive until I restart the app.

closure_tree version: 6.0.0
ruby 2.2.0
rails 4.2.5

mceachen · 2016-11-22T19:31:36Z

Yikes. Thanks for reporting. Can you see if reverting #234 fixes the issue? That's the most recent update to locks.

amitsaxena · 2016-11-23T11:19:14Z

Reverting that commit doesn't seem to fix this.

tomaadland · 2016-12-05T14:36:17Z

Same here.

I had to revert back to version 6.0 in production, but the problem still appears though much less frequent with version 6.0 then 6.2.

I'm trying to find time to write tests to try and reproduce this problem locally.

DuaneCompton · 2016-12-06T14:40:49Z

Can confirm, this issue has been around for a while. 6.2 seems to have increased the frequency from a monthly to hourly occurrence.
Ruby 2.3.1
Rails 4.2.7.1
Closure tree 6.2
PostgreSQL.

mceachen · 2016-12-06T16:01:51Z

Are all you using postgres? What version?

mceachen · 2016-12-06T16:12:25Z

FWIW, I've been looking through the diffs between 6.0.0 and 6.0.2. Nothing immediately nefarious jumps out. v6.0.0...master

tomaadland · 2016-12-06T17:52:48Z

We're using: PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.9.1-16ubuntu6) 4.9.1, 64-bit

DuaneCompton · 2016-12-06T18:05:27Z

I am killing the lock by hand.
SELECT * FROM pg_locks;
SELECT pg_terminate_backend('')
the lock does not respond to a pg_kill_backend when in this state.
I am downgraded to 6.1. In ~10,000 inserts I had 1 lockup. In 6.2 it was occurring almost immediately. We are also on PostgreSQL 9.4.5 AWS hosted.

mceachen · 2016-12-06T20:24:07Z

Thanks for the additional info.

@DuaneCompton did you have 1 lockup/10k inserts with v6.1?

DuaneCompton · 2016-12-06T22:26:04Z

@mceachen There are 6 threads running that are simultaneously under load. under v6.1 I am averaging 1 lockup/10k inserts. In v6.2 it locks basically immediately.
Might be two separate issues at play. I know under v6.0.0 we experienced intermittent lock-ups as well.

amitsaxena · 2016-12-13T21:00:55Z

We use MySql, therefore I don't think this is a postgres specific issue.

DuaneCompton · 2016-12-20T20:25:19Z

@mceachen Anything I can do from a bug gathering standpoint to help troubleshoot this issue? We still run into a lock up on a weekly basis on 6.0. I am probably going to write a little script to free the lock if the table locks for longer than N minutes. But, before I do so is there anyway I can help?

mceachen · 2016-12-20T21:17:13Z

Can you capture the current running queries? Without a test to replicate the issue on demand, any change we make will be a random shot in the dark.

DuaneCompton · 2016-12-21T16:45:12Z

@mceachen will do.

tomaadland · 2017-01-03T11:41:51Z

Has anyone had any more luck in tracking down this issue ?

I've tried to provoke this issue by creating multiple records in parallell in separate threads as was mentioned initially by @amitsaxena. But no luck so far. I've forked off 6 threads each creating a 1000 records. Tried on both 6.2 and 6.0.

amitsaxena · 2017-01-12T10:27:56Z

@tomaadland are you unable to reproduce the issue? I was consistently getting the bug as soon as I execute the code above.

tomaadland · 2017-01-17T11:49:01Z

@amitsaxena as soon as I have some time on my hands i'll create a public project with an example model similar to one I have in production with the tests. I'll post it here to let you know.

DuaneCompton · 2017-02-16T21:53:08Z

@mceachen So, I've noticed when we hit this state there is really just one lock that is hanging around. Is there any reason you could see that we couldn't use the timeout in your with_advisory_lock gem to at least an infinite lock?
I was also curious about this line: https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L82
Any reason to not expire the previous lock before taking a new one?

mceachen · 2017-02-16T23:07:09Z

I was also curious about this line: https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L82

`with_advisory_lock` is smart enough to no-op if a currently-acquired lock is requested again: See https://github.com/mceachen/with_advisory_lock/blob/master/lib/with_advisory_lock/base.rb#L42 If we auto-expire the rebuild lock, the rebuild isn't guaranteed to be consistent. The deadlock is either due to a bug in the RDBMS, or, much more likely, a circular dependency due to transaction boundaries. For example, if one of your data mutation code paths gets lock A then lock B, and another gets B then A, you'll regularly see deadlocks. (edited to add link to nested lock no-op)

DuaneCompton · 2017-02-17T17:38:00Z

So, its not a dead lock per se. If I move around the locking obtain/release in just the critical sections of the write/delete/children call then it will be detected as a deadlock. What is actually happening is:
writer 1 obtains a lock in the context of a rebuild!.
writer 2 calls add_child (which I believe is going to trigger a rebuild). writer 2 spins for a while attempting to obtain the lock in which case writer 1 is paused.
writer 2 fails to obtain the lock and writer 1 continues writing.
repeat with writer 3...N and writer 1 basically never makes it through a rebuild.
What I am not knowledgable enough to know is why writer 1 does not continue writing when it has already obtained the lock.

mceachen · 2017-02-17T17:49:09Z

Can you maybe point to the code, or talk more about what's going on in writer 1 and 2?

Deadlocks I've seen in the past occur when there are multiple code paths that have transaction blocks that edit models in different order.

DuaneCompton · 2017-02-17T20:02:11Z

Sorry, its not a deadlock in the traditional sense, its actually worse since a deadlock will be detected and aborted. In this case, the first process has obtained the lock and is proceeding on its way when the second process comes in and attempts to obtain a lock. It fails since writer one already has the a lock (which is the correct behavior IMO). What I am having trouble understanding is why writer 1 hangs until writer 2 gives up attempting to obtain its lock.

If I run something like:

  parent = Tag.create(name: 'lock 1')
  (1..100).each do |i|
        p "adding #{i}"
        new_node = Tag.create(:name => "test deadlock #{i}")
        ActiveRecord::Base.transaction do
             parent = parent.add_child(new_node)
        end
end

on two separate machines upon starting the second both will shortly arrive in a semi-deadlocked situation. The script that I started the call on will then eventually expire and the first writer will continue. I've tried moving around the transaction/without it. Below is one potential outcome.

Writer 1 :

D, [2017-02-16T07:31:34.537506 #21545] DEBUG -- :   Tag Exists (0.4ms)  SELECT  1 AS one FROM "tags" INNER JOIN "tag_hierarchies" ON "tags"."id" = "tag_hierarchies"."ancestor_id" WHERE "tag_hierarchies"."descendant_id" = $12 AND "tags"."id" = $2 LIMIT 1  [["descendant_id", 7903], ["id", 7905]]
W, [2017-02-16T07:31:34.537919 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:27:in '_ct_validate'

D, [2017-02-16T07:31:34.541672 #21545] DEBUG -- :   SQL (1.6ms)  UPDATE "tags" SET "parent_id" = $1, "updated_at" = $2 WHERE "tags"."id" = $3  [["parent_id", 7903], ["updated_at", "2017-02-16 07:31:34.538876"], ["id", 7905]]
W, [2017-02-16T07:31:34.542309 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/model.rb:138:in 'add_child'

D, [2017-02-16T07:31:34.545266 #21545] DEBUG -- :    (0.5ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t2ea0208ee2526226ecc8d65da8f4c493
W, [2017-02-16T07:31:34.546018 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'
D, [2017-02-16T07:31:34.646474 #21545] DEBUG -- :    (0.7ms)  SELECT pg_try_advisory_lock(1814493058,0) AS ta2ef67039c80d539454f14205900baac
W, [2017-02-16T07:31:34.647598 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 
'execute_successful?'

D, [2017-02-16T07:31:34.785180 #21545] DEBUG -- :    (0.8ms)  SELECT pg_try_advisory_lock(1814493058,0) AS tafb0ba91177bd447d53f97cae6f97ae0
W, [2017-02-16T07:31:34.786015 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:34.910538 #21545] DEBUG -- :    (0.8ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t878276a11847c5bb14ee22e966d51222
W, [2017-02-16T07:31:34.911517 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.014669 #21545] DEBUG -- :    (1.2ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t3996341ab230c95a3222cf08044a7593
W, [2017-02-16T07:31:35.015517 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.098349 #21545] DEBUG -- :    (0.8ms)  SELECT pg_try_advisory_lock(1814493058,0) AS td67fe931e1e97081594bf1d3b7b577ac
W, [2017-02-16T07:31:35.098631 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.210146 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS td2b2a4adf0f2fb2e61565b1a8d58da8b
W, [2017-02-16T07:31:35.210534 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.351044 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t6ba2e7b92cdaf2d1905f1a6214e7efa2
W, [2017-02-16T07:31:35.351374 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 
'execute_successful?'

D, [2017-02-16T07:31:35.429454 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t8388cb7c01e7341bd1f63e3404fe8e9b
W, [2017-02-16T07:31:35.429904 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.520427 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS tb6c70aa0fe9699434e66857ba96e1c7d
W, [2017-02-16T07:31:35.520766 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.616061 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS taef48c46023f84d4e4c6454fce8dfed7
W, [2017-02-16T07:31:35.616458 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.756555 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t14f7ff2ec5dae5c9961188d75d2f0ea7
W, [2017-02-16T07:31:35.756914 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.829528 #21545] DEBUG -- :    (0.7ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t363a59f8e61d4ce5c0fcfc5abf6e8404
W, [2017-02-16T07:31:35.829871 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.978112 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t57ace053678ffafac925ebec1a163abe
W, [2017-02-16T07:31:35.978513 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:36.053677 #21545] DEBUG -- :    (0.7ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t809dd7878f94941becb585a2ee937e3e
W, [2017-02-16T07:31:36.053976 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

Writer 2

D, [2017-02-16T07:31:34.457059 #21548] DEBUG -- : DEBUG: Chewy strategies stack: [2] <- bypass @ deadlock.rb:1
D, [2017-02-16T07:31:34.465650 #21548] DEBUG -- :    (0.5ms)  BEGIN
W, [2017-02-16T07:31:34.465916 #21548]  WARN -- :   ↳ deadlock.rb:3:in `block in <top (required)>'

D, [2017-02-16T07:31:34.488118 #21548] DEBUG -- :   SQL (0.8ms)  INSERT INTO "tags" ("name", "created_at", "updated_at") VALUES ($1, $2, $3) RETURNING "id"  [["name", "lock 1"], ["created_at", "2017-02-16 07:31:34.485771"], ["updated_at", "2017-02-16 07:31:34.485771"]]
W, [2017-02-16T07:31:34.488464 #21548]  WARN -- :   ↳ deadlock.rb:3:in `block in <top (required)>'

D, [2017-02-16T07:31:34.539682 #21548] DEBUG -- :    (0.5ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t9f2a3d3d5ce591e0cbcad87a06e46c26
W, [2017-02-16T07:31:34.539915 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in `execute_successful?'

D, [2017-02-16T07:31:34.601380 #21548] DEBUG -- :   SQL (0.7ms)  INSERT INTO "tag_hierarchies" ("ancestor_id", "descendant_id", "generations") VALUES ($1, $2, $3)  [["ancestor_id", 7904], ["descendant_id", 7904], ["generations", 0]]
W, [2017-02-16T07:31:34.601783 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:66:in `block in rebuild!'

D, [2017-02-16T07:32:34.608471 #21548] DEBUG -- :    (60004.6ms)  UPDATE "tags"
SET "position" = t.seq + -1
FROM (
  SELECT "id" AS id, row_number() OVER(ORDER BY position) AS seq
  FROM "tags"
  WHERE "parent_id" IS NULL 
) AS t
WHERE "tags"."id" = t.id

W, [2017-02-16T07:32:34.609206 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/numeric_order_support.rb:39:in `reorder_with_parent_id'

D, [2017-02-16T07:32:34.613072 #21548] DEBUG -- :    (1.4ms)  SELECT pg_advisory_unlock(1814493058,0) AS t5a0d17d7a9f5b664b7d1f87c2252d30c
W, [2017-02-16T07:32:34.613436 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in `execute_successful?'

D, [2017-02-16T07:32:34.614966 #21548] DEBUG -- :    (0.4ms)  ROLLBACK
W, [2017-02-16T07:32:34.615220 #21548]  WARN -- :   ↳ deadlock.rb:3:in `block in <top (required)>'

D, [2017-02-16T07:32:34.616004 #21548] DEBUG -- : DEBUG: Chewy strategies stack: [2] -> bypass @ deadlock.rb:1
/usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `async_exec': PG::InFailedSqlTransaction: ERROR:  current transaction is aborted, commands ignored until end of transaction block (ActiveRecord::StatementInvalid)
: SELECT pg_advisory_unlock(1814493058,0) AS t5a0d17d7a9f5b664b7d1f87c2252d30c
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `block in exec_no_cache'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract_adapter.rb:484:in `block in log'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract_adapter.rb:478:in `log'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `exec_no_cache'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:584:in `execute_and_clear'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:50:in `select_value'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in `execute_successful?'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:9:in `release_lock'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:81:in `ensure in yield_with_lock'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:81:in `yield_with_lock'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:65:in `yield_with_lock_and_timeout'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:48:in `with_advisory_lock_if_needed'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/concern.rb:16:in `with_advisory_lock_result'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/concern.rb:10:in `with_advisory_lock'
	from /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/support.rb:103:in `with_advisory_lock'
	from /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:63:in `rebuild!'
	from /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:39:in `_ct_after_save'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:432:in `block in make_lambda'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:228:in `block in halting_and_conditional'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:506:in `block in call'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:506:in `each'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:506:in `call'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:92:in `__run_callbacks__'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:778:in `_run_save_callbacks'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/callbacks.rb:302:in `create_or_update'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/persistence.rb:120:in `save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/validations.rb:37:in `save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/attribute_methods/dirty.rb:21:in `save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:286:in `block (2 levels) in save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:351:in `block in with_transaction_returning_status'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `block in transaction'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/transaction.rb:184:in `within_new_transaction'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `transaction'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:220:in 'transaction'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:348:in 'with_transaction_returning_status'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:286:in 'block in save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:301:in 'rollback_active_record_state!'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:285:in 'save'
	from /usr/local/bundle/gems/newrelic_rpm-3.17.2.327/lib/new_relic/agent/instrumentation/active_record_helper.rb:26:in 'block in save'
	from /usr/local/bundle/gems/newrelic_rpm-3.17.2.327/lib/new_relic/agent.rb:586:in 'with_database_metric_name'
	from /usr/local/bundle/gems/newrelic_rpm-3.17.2.327/lib/new_relic/agent/instrumentation/active_record_helper.rb:25:in 'save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/persistence.rb:34:in 'create'
	from deadlock.rb:3:in `block in <top (required)>'
	from /usr/local/bundle/gems/chewy-0.8.4/lib/chewy/strategy.rb:60:in 'wrap'
	from /usr/local/bundle/gems/chewy-0.8.4/lib/chewy.rb:179:in 'strategy'
	from deadlock.rb:1:in '<top (required)>'
	from /usr/local/bundle/gems/zeus-0.15.3/lib/zeus/load_tracking.rb:50:in 'load'
	from /usr/local/bundle/gems/zeus-0.15.3/lib/zeus/load_tracking.rb:50:in 'load'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/runner.rb:60:in '<top (required)>'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:123:in 'require'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:123:in 'require_command!'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:90:in 'runner'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:39:in 'run_command!'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands.rb:17:in `<top (required)>'
	from bin/rails:9:in `require'
	from bin/rails:9:in `<main>'

mceachen · 2017-02-17T20:29:27Z

And by 👍 I mean 😭. I'll see if I can wire up a test to do this.

DuaneCompton · 2017-02-17T21:00:27Z

Thank you. I will see what I can do as well. I do not believe at this time that its an a/b b/a situation, because I think the lock acquisition time is much higher than the deadlock check timer on the db. I'm wondering if maybe calling a complete table lock might be better in the rebuild! function rather than calling an additional lock for each delete/child insert (maybe I am misunderstanding how those locks work though).

mceachen · 2017-02-17T22:12:08Z

I think a table lock (which I had dismissed as being too coarse a mutex) is an interesting alternative. I'll code up that as an option.

dsounded · 2017-03-04T09:10:19Z

Have almost the same problem(after upgrading to Rails 5), with Rails 4.2.5 it worked well

I have accepts_nested_attributes_for my hierarchy model,
and in that model I also have accepts_nested_attributes_for for its children

And after saving the parent item I got deadlock

Tried versions 6.2 and 6.4 and 6.0

dsounded · 2017-03-04T10:14:47Z

I thinks it's more about Rails update than closure_tree update

mysql Ver 14.14 Distrib 5.5.44
gem mysql2 version 0.3.19

I am playing with it, with_advisory_lock: false maybe solves the problem, but not sure it this is not temporary

DuaneCompton · 2017-03-07T21:16:53Z

@mceachen It appears the issue I am running into is specifically (and I am guessing amitsaxena as well) around the numeric reordering (https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L76)
when calling add_child. Writer 1 will hang on grabbing
https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L63

while writer 2 will attempt to write into: https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/numeric_order_support.rb#L39

If you call directly children.create() it will avoid this issue, though I would guess any concurrent movement of leafs with numeric ordering enabled could result in this.

If I replace the with_advisory_lock from support.rb with just a model_class.lock(true) the deadlock appears to go away.

I did also note that if you attempt a destroy_all on a large tag set > 10000 (with our without numeric ordering)
the destroy will fail with fatal: exception reentered.

Hope this is helpful.

mceachen · 2017-03-07T21:40:23Z

@DuaneCompton that is super helpful, and shouldn't be hard to write a breaking test. Do you want to have a go?

DuaneCompton · 2017-03-08T17:00:33Z

I will give it a shot. Thanks.

swrobel · 2017-05-26T02:00:20Z

@DuaneCompton any progress here? I just ran into this issue.

mceachen · 2017-06-13T05:50:29Z

Thanks to @serenaf, she highlighted the MySQL docs that reveal v5.7.5 fundamentally changed how get_lock() works. Fixing that issue should resolve this issue for MySQL users, at least.

mceachen · 2017-07-15T16:35:33Z

@uhlenbrock this issue (which seems to be with_advisory_lock not playing nicely with newer versions of postgres, mysql, and rails) is currently the most pressing for closure_tree. More eyes on it are welcome.

uhlenbrock · 2017-07-19T12:41:38Z

@mceachen I have started to take a look. I have two PRs open on with_advisory_lock to get the testing infrastructure ready to replicate this issue.

tomaadland · 2017-07-25T06:34:24Z

@uhlenbrock Any ideas on how to replicate this in a test ?

uhlenbrock · 2017-07-25T12:00:51Z

@tomaadland I have started work over here: ClosureTree/with_advisory_lock#17

uhlenbrock · 2017-08-03T18:01:06Z

I am working to build a failing test based on the suggestion above:

describe Tag do
  context 'in a race condition' do
    it 'should not break the database' do
      parent1 = Tag.create!(name: 'parent tag 1')
      parent2 = Tag.create!(name: 'parent tag 2')
        
      Thread.new do
        new_node = Tag.new(name: 'test deadlock 1')
        parent1.add_child(new_node)
      end
      Thread.new do
        new_node = Tag.new(name: 'test deadlock 1')
        parent2.add_child(new_node)
      end
    end
  end
end

Via DB=postgresql appraisal activerecord-5.1 rake spec, the following error is raised:

SystemStackError:
       stack level too deep

I believe this is because of Thread.abort_on_exception = true. The test does not raise an error with this line commented out, but also takes between 10 and 20 seconds. Definitely enough to upset your production database. I also found that it is true that calling create instead of new avoids any issue.

Does this appear to adequately reproduce the issue?

tomaadland · 2017-08-10T11:04:53Z

I tried running this test in the gem and I get the SystemStackError

When I created a similar unit test in the app that is using this gem and when calling create! instead of new for creating new children instances to be added I get postgresql to err on deadlock.

ERROR: deadlock detected at character 8

So in my case the

new_node = Tag.create!(name: "test deadlock 1")

causes a deadlock.

tomalok · 2018-03-09T23:32:23Z

I may have found a workaround that works for me -- though at present I'm not sure of what other consequences may arise from my solution.

I'd long since upgraded the database to MariaDB 10.x from some older MySQL without apparent incident. Until one day when I needed to reparent something, and then...

Mysql2::Error: Lock wait timeout exceeded; try restarting transaction

At the time, i wrote an awful script that monkey-patched the part of closure_tree that dealt with locking (IIRC), and managed to get my reparenting done.

Now, however, I'm going to need to reparent more often, so I took some time today to do a little more research with a simple test script...

#!/usr/bin/ruby

$LOAD_PATH.unshift '/idw/lib'
Dir.chdir('/idw') do
  require 'rubygems'
  require 'bundler'
  Bundler.require(:default)   # TODO: split this out to just what's needed
  require '/idw/config/environment'
end

require 'pp'

# connect to the right database for the curent environment
IDW::Base.establish_connection(YAML.load(File.read('/idw/config/database.yml'))[ENV['IDW_ENV']])
IDW::Base.connection_pool.with_connection do

  hc = IDW::Hostclass.find_by(name: 'NS_OPMS')
  pp hc.self_and_ancestor_ids

  phc = IDW::Hostclass.find_by(name: 'NS_ORCAWAVE')
  pp phc.self_and_ancestor_ids

  phc.children << hc
  pp hc.self_and_ancestor_ids

end

Which, when run... Failed:

$ ./reparent
D, [2018-03-09T21:57:01.685550 #148] DEBUG -- :    (0.3ms)  SET NAMES utf8,  @@SESSION.sql_mode = CONCAT(CONCAT(@@sql_mode, ',STRICT_ALL_TABLES'), ',NO_AUTO_VALUE_ON_ZERO'),  @@SESSION.sql_auto_is_null = 0, @@SESSION.wait_timeout = 2147483
D, [2018-03-09T21:57:01.739543 #148] DEBUG -- :    (0.2ms)  SET NAMES utf8,  @@SESSION.sql_mode = CONCAT(CONCAT(@@sql_mode, ',STRICT_ALL_TABLES'), ',NO_AUTO_VALUE_ON_ZERO'),  @@SESSION.sql_auto_is_null = 0, @@SESSION.wait_timeout = 2147483
D, [2018-03-09T21:57:01.750891 #148] DEBUG -- :   IDW::Hostclass Load (0.4ms)  SELECT  `hostclasses`.* FROM `hostclasses` WHERE `hostclasses`.`name` = 'NS_OPMS' LIMIT 1
D, [2018-03-09T21:57:01.773695 #148] DEBUG -- :    (0.7ms)  SELECT `hostclasses`.`id` FROM `hostclasses` INNER JOIN `hostclass_hierarchies` ON `hostclasses`.`id` = `hostclass_hierarchies`.`ancestor_id` WHERE `hostclass_hierarchies`.`descendant_id` = 214 ORDER BY `hostclass_hierarchies`.generations asc
[214, 199, 2]
D, [2018-03-09T21:57:01.774403 #148] DEBUG -- :   IDW::Hostclass Load (0.4ms)  SELECT  `hostclasses`.* FROM `hostclasses` WHERE `hostclasses`.`name` = 'NS_ORCAWAVE' LIMIT 1
D, [2018-03-09T21:57:01.776208 #148] DEBUG -- :    (0.6ms)  SELECT `hostclasses`.`id` FROM `hostclasses` INNER JOIN `hostclass_hierarchies` ON `hostclasses`.`id` = `hostclass_hierarchies`.`ancestor_id` WHERE `hostclass_hierarchies`.`descendant_id` = 215 ORDER BY `hostclass_hierarchies`.generations asc
[215, 199, 2]
D, [2018-03-09T21:57:01.776909 #148] DEBUG -- :    (0.3ms)  BEGIN
D, [2018-03-09T21:57:01.781243 #148] DEBUG -- :   IDW::Hostclass Exists (0.5ms)  SELECT  1 AS one FROM `hostclasses` INNER JOIN `hostclass_hierarchies` ON `hostclasses`.`id` = `hostclass_hierarchies`.`ancestor_id` WHERE `hostclass_hierarchies`.`descendant_id` = 215 AND `hostclasses`.`id` = 214 LIMIT 1
D, [2018-03-09T21:57:01.782238 #148] DEBUG -- :   IDW::Hostclass Exists (0.3ms)  SELECT  1 AS one FROM `hostclasses` WHERE `hostclasses`.`name` = 'NS_OPMS' AND (`hostclasses`.`id` != 214) LIMIT 1
D, [2018-03-09T21:57:01.783606 #148] DEBUG -- :   SQL (0.4ms)  UPDATE `hostclasses` SET `parent_id` = 215, `updated_at` = '2018-03-09 21:57:01' WHERE `hostclasses`.`id` = 214
D, [2018-03-09T21:57:01.786316 #148] DEBUG -- :    (0.3ms)  SELECT GET_LOCK('3bd772cb25a662070242f7efd5ca79ef2', 0) AS t5e93ac4ad69f0ed6f2c5c9584d8b188b
D, [2018-03-09T21:57:01.788037 #148] DEBUG -- :    (1.4ms)  DELETE FROM `hostclass_hierarchies`
WHERE descendant_id IN (
  SELECT DISTINCT descendant_id
  FROM (SELECT descendant_id
    FROM `hostclass_hierarchies`
    WHERE ancestor_id = 214
       OR descendant_id = 214
  ) AS x )

D, [2018-03-09T21:57:01.793990 #148] DEBUG -- :    (0.2ms)  BEGIN
D, [2018-03-09T21:57:52.175943 #148] DEBUG -- :   SQL (50381.2ms)  INSERT INTO `hostclass_hierarchies` (`ancestor_id`, `descendant_id`, `generations`) VALUES (214, 214, 0)
D, [2018-03-09T21:57:52.176687 #148] DEBUG -- :    (0.5ms)  ROLLBACK
D, [2018-03-09T21:57:52.177324 #148] DEBUG -- :    (0.4ms)  SELECT RELEASE_LOCK('3bd772cb25a662070242f7efd5ca79ef2') AS t72718680058c1a95838500647177a561
D, [2018-03-09T21:57:52.177853 #148] DEBUG -- :    (0.4ms)  ROLLBACK
/idw/vendor/bundle/ruby/2.4.0/gems/mysql2-0.4.10/lib/mysql2/client.rb:120:in `_query': Mysql2::Error: Lock wait timeout exceeded; try restarting transaction: INSERT INTO `hostclass_hierarchies` (`ancestor_id`, `descendant_id`, `generations`) VALUES (214, 214, 0) (ActiveRecord::StatementInvalid)

...but provided some clues as to what was going on.

BEGIN transaction 1
- UPDATE the node with its new parent
- SELECT GET_LOCK stuff...
- DELETE the old hierarchy information
- BEGIN transaction 2
  - try to INSERT the new hierarchy information, but times out
- ROLLBACK
- SELECT RELEASE_LOCK
ROLLBACK

The (currently) third solution to https://stackoverflow.com/questions/6000336/how-to-debug-lock-wait-timeout-exceeded-on-mysql mentions that InnoDB's default isolation level is REPEATABLE_READ, whereas other DBs use READ_COMMITTED.

Setting my database's default isolation level to READ_COMMITTED...

SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;

...did the trick and allowed the reparenting script (and another more complicated script) to work as anticipated. Not sure I'd want to leave this on globally, might be best to set it as tightly around the code as possible (before that first transaction). Maybe something like...

FooModel.transaction(isolation: :read_committed) do
  # the reparenting stuff
end

I'm not confident this solves the other two-thread deadlock issue, as that appears to be happening with non-MySQL/MariaDB databases, also.

mceachen · 2018-03-10T16:46:48Z

I hadn't thought about this before @tomalok 's post, but I believe with_advisory_lock could no-op if it could reliably determine that the current database is configured as READ COMMITTED or supports MVCC. It'd be good to get a simple breaking test in place to verify this hypothesis.

kawadhiya21 · 2023-02-24T10:40:20Z

we are facing the same issue. Any update on this?

jkraemer · 2023-11-23T01:48:25Z

I can reliably reproduce this deadlock with the following test case, which concurrently removes children from a nested set with numeric sort order. The ordering seems to be the culprit, because it tries to update siblings' sort indices which leads to an implicit row lock, which, in conjunction with the implicit row locks taken out when rails updates the parent_id on records (which happens before the advisory_lock is attempted). So a possible scenario would be:

A updates child1.parent_id (and now holds an implicit lock on child1)
B updates child2.parent_id (implicit lock on child2)
A wins the race and gets the advisory lock
B now waits for the advisory lock
A tries to update siblings' sort column, among them child2.
A waits to get the lock on child2 for the update.

This kind of deadlock where one writer waits for the advisory lock while the other waits for a row level lock apparently isn't detected by either MySQL or Postgres, leading to an endless wait.

The only way I found to fix this is to get the advisory lock before even updating the parent_id, i.e. by calling record._ct.with_advisory_lock { record.save } in your application code.

test case:

  it "fails to deadlock from removing siblings with sort_order" do
    skip("unsupported") unless run_parallel_tests?

    @root = Label.create!(name: 'root')
    60.times{ |i| Label.create! name: i, mother_id: @root.id }
    @root.reload

    # remove 40 random children in parallel threads
    descendant_ids = @root.descendant_ids
    ids_to_remove = descendant_ids.sample(40).shuffle
    ids_to_keep = descendant_ids - ids_to_remove
    Parallel.map(ids_to_remove, in_threads: max_threads) do |id|
      ActiveRecord::Base.connection_pool.with_connection do
        Label.find(id).update(mother_id: nil)
      end
    end

    Label.where(id: ids_to_remove).each do |l|
      assert_nil l.mother_id
      assert l.root?
      assert l.leaf?
    end
    @root.reload
    assert_equal 20, @root.children.count
    assert order = @root.children.pluck(:column_whereby_ordering_is_inferred).uniq
    assert_equal (0..19).to_a, order
  end

The test will get stuck in an endless loop, unless the call to ct_reorder_prior_siblings_if_parent_changed in the rebuild! method is commented out. It might worth thinking about the necessity of that reordering - maybe having non-consecutive sort indices isn't even an issue?

mceachen added the Bug label Nov 22, 2016

mceachen added the Community help requested label Apr 14, 2017

mceachen mentioned this issue Jul 24, 2017

Your Opinion Please: What would you like to happen to ClosureTree? #277

Closed

patrickdavey mentioned this issue Sep 6, 2017

Issue when reordering within scopes brendon/acts_as_list#131

Closed

Trying to add 2 nodes simultaneously to closure_tree table results in a deadlock #240

Trying to add 2 nodes simultaneously to closure_tree table results in a deadlock #240

Comments

amitsaxena commented Nov 21, 2016 • edited Loading

tomaadland commented Nov 22, 2016

mceachen commented Nov 22, 2016

amitsaxena commented Nov 23, 2016

tomaadland commented Dec 5, 2016

DuaneCompton commented Dec 6, 2016

mceachen commented Dec 6, 2016 • edited Loading

mceachen commented Dec 6, 2016

tomaadland commented Dec 6, 2016

DuaneCompton commented Dec 6, 2016

mceachen commented Dec 6, 2016

DuaneCompton commented Dec 6, 2016

amitsaxena commented Dec 13, 2016

DuaneCompton commented Dec 20, 2016

mceachen commented Dec 20, 2016 via email

DuaneCompton commented Dec 21, 2016

tomaadland commented Jan 3, 2017

amitsaxena commented Jan 12, 2017

tomaadland commented Jan 17, 2017

DuaneCompton commented Feb 16, 2017

mceachen commented Feb 16, 2017 via email • edited Loading

DuaneCompton commented Feb 17, 2017

mceachen commented Feb 17, 2017

DuaneCompton commented Feb 17, 2017 • edited by mceachen Loading

mceachen commented Feb 17, 2017

DuaneCompton commented Feb 17, 2017

mceachen commented Feb 17, 2017

dsounded commented Mar 4, 2017 • edited Loading

dsounded commented Mar 4, 2017 • edited Loading

DuaneCompton commented Mar 7, 2017

mceachen commented Mar 7, 2017

DuaneCompton commented Mar 8, 2017

swrobel commented May 26, 2017

mceachen commented Jun 13, 2017

mceachen commented Jul 15, 2017

uhlenbrock commented Jul 19, 2017

tomaadland commented Jul 25, 2017

uhlenbrock commented Jul 25, 2017

uhlenbrock commented Aug 3, 2017

tomaadland commented Aug 10, 2017

tomalok commented Mar 9, 2018 • edited Loading

mceachen commented Mar 10, 2018

kawadhiya21 commented Feb 24, 2023

jkraemer commented Nov 23, 2023

amitsaxena commented Nov 21, 2016 •

edited

Loading

mceachen commented Dec 6, 2016 •

edited

Loading

mceachen commented Feb 16, 2017 via email •

edited

Loading

DuaneCompton commented Feb 17, 2017 •

edited by mceachen

Loading

dsounded commented Mar 4, 2017 •

edited

Loading

dsounded commented Mar 4, 2017 •

edited

Loading

tomalok commented Mar 9, 2018 •

edited

Loading