Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to add 2 nodes simultaneously to closure_tree table results in a deadlock #240

Open
amitsaxena opened this issue Nov 21, 2016 · 43 comments

Comments

@amitsaxena
Copy link

amitsaxena commented Nov 21, 2016

When two nodes are added simultaneously to the closure tree table, this results in a deadlock and the query to acquire lock runs infinitely:

 (0.7ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t46e4b4acfd42770bad34d3e2a13a458a
  (0.4ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t9eaf21f51a8c5a1f2e39f6e8c4971b27
  (0.4ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS tae827dc66e71e92f4c9f6edecae1cb54
  (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t7f7d47e378ffde7f311e56fde6613ab5
  (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS ta7af1b261b90cba9e4d7399c96f17e26
  (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t3ed658e2e0d3008375e7356cc8e1c4a2
  (0.4ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t080acfd6e5536c50c4917ce271671be6
  (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t850056c6b58f75d8171bdcb69def558a
  (0.3ms)  SELECT GET_LOCK('b218f48596dc37d16bbcd16d1e18a116c', 0) AS t1ece95b52215d7c5e420ea9c0057aa72

You can reproduce the problem easily by using the code below (where List is the closure_tree model):


Thread.new do
  parent = List.find(86)
  new_node = List.new(:name => "test deadlock 1")
  parent.add_child(new_node)
end
Thread.new do
  parent = List.find(82)
  new_node = List.new(:name => "test deadlock 1")
  parent.add_child(new_node)
end

We faced this problem on production, and the infinite locking loop results in mysql being unresponsive, which in turn results in unresponsive app.

closure_tree version: 6.2.0
ruby: 2.3.0
rails: 5.0.0.1

@tomaadland
Copy link

I have just discovered a similar problem in production on Postgresql. Have not been able to track down what is causing it, but I discovered an infinite loop of

SELECT pg_try_advisory_lock(1597746110,0) AS t57b1486ecc3691a98e06a13406a11286

in my log.

Database is unresponsive until I restart the app.

closure_tree version: 6.0.0
ruby 2.2.0
rails 4.2.5

@mceachen
Copy link
Collaborator

Yikes. Thanks for reporting. Can you see if reverting #234 fixes the issue? That's the most recent update to locks.

@mceachen mceachen added the Bug label Nov 22, 2016
@amitsaxena
Copy link
Author

Reverting that commit doesn't seem to fix this.

@tomaadland
Copy link

Same here.

I had to revert back to version 6.0 in production, but the problem still appears though much less frequent with version 6.0 then 6.2.

I'm trying to find time to write tests to try and reproduce this problem locally.

@DuaneCompton
Copy link

Can confirm, this issue has been around for a while. 6.2 seems to have increased the frequency from a monthly to hourly occurrence.
Ruby 2.3.1
Rails 4.2.7.1
Closure tree 6.2
PostgreSQL.

@mceachen
Copy link
Collaborator

mceachen commented Dec 6, 2016

Are all you using postgres? What version?

@mceachen
Copy link
Collaborator

mceachen commented Dec 6, 2016

FWIW, I've been looking through the diffs between 6.0.0 and 6.0.2. Nothing immediately nefarious jumps out. v6.0.0...master

@tomaadland
Copy link

We're using: PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.9.1-16ubuntu6) 4.9.1, 64-bit

@DuaneCompton
Copy link

I am killing the lock by hand.
SELECT * FROM pg_locks;
SELECT pg_terminate_backend('')
the lock does not respond to a pg_kill_backend when in this state.
I am downgraded to 6.1. In ~10,000 inserts I had 1 lockup. In 6.2 it was occurring almost immediately. We are also on PostgreSQL 9.4.5 AWS hosted.

@mceachen
Copy link
Collaborator

mceachen commented Dec 6, 2016

Thanks for the additional info.

@DuaneCompton did you have 1 lockup/10k inserts with v6.1?

@DuaneCompton
Copy link

@mceachen There are 6 threads running that are simultaneously under load. under v6.1 I am averaging 1 lockup/10k inserts. In v6.2 it locks basically immediately.
Might be two separate issues at play. I know under v6.0.0 we experienced intermittent lock-ups as well.

@amitsaxena
Copy link
Author

We use MySql, therefore I don't think this is a postgres specific issue.

@DuaneCompton
Copy link

@mceachen Anything I can do from a bug gathering standpoint to help troubleshoot this issue? We still run into a lock up on a weekly basis on 6.0. I am probably going to write a little script to free the lock if the table locks for longer than N minutes. But, before I do so is there anyway I can help?

@mceachen
Copy link
Collaborator

mceachen commented Dec 20, 2016 via email

@DuaneCompton
Copy link

@mceachen will do.

@tomaadland
Copy link

Has anyone had any more luck in tracking down this issue ?

I've tried to provoke this issue by creating multiple records in parallell in separate threads as was mentioned initially by @amitsaxena. But no luck so far. I've forked off 6 threads each creating a 1000 records. Tried on both 6.2 and 6.0.

@amitsaxena
Copy link
Author

@tomaadland are you unable to reproduce the issue? I was consistently getting the bug as soon as I execute the code above.

@tomaadland
Copy link

@amitsaxena as soon as I have some time on my hands i'll create a public project with an example model similar to one I have in production with the tests. I'll post it here to let you know.

@DuaneCompton
Copy link

@mceachen So, I've noticed when we hit this state there is really just one lock that is hanging around. Is there any reason you could see that we couldn't use the timeout in your with_advisory_lock gem to at least an infinite lock?
I was also curious about this line: https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L82
Any reason to not expire the previous lock before taking a new one?

@mceachen
Copy link
Collaborator

mceachen commented Feb 16, 2017 via email

@DuaneCompton
Copy link

So, its not a dead lock per se. If I move around the locking obtain/release in just the critical sections of the write/delete/children call then it will be detected as a deadlock. What is actually happening is:
writer 1 obtains a lock in the context of a rebuild!.
writer 2 calls add_child (which I believe is going to trigger a rebuild). writer 2 spins for a while attempting to obtain the lock in which case writer 1 is paused.
writer 2 fails to obtain the lock and writer 1 continues writing.
repeat with writer 3...N and writer 1 basically never makes it through a rebuild.
What I am not knowledgable enough to know is why writer 1 does not continue writing when it has already obtained the lock.

@mceachen
Copy link
Collaborator

Can you maybe point to the code, or talk more about what's going on in writer 1 and 2?

Deadlocks I've seen in the past occur when there are multiple code paths that have transaction blocks that edit models in different order.

@DuaneCompton
Copy link

DuaneCompton commented Feb 17, 2017

Sorry, its not a deadlock in the traditional sense, its actually worse since a deadlock will be detected and aborted. In this case, the first process has obtained the lock and is proceeding on its way when the second process comes in and attempts to obtain a lock. It fails since writer one already has the a lock (which is the correct behavior IMO). What I am having trouble understanding is why writer 1 hangs until writer 2 gives up attempting to obtain its lock.

If I run something like:

  parent = Tag.create(name: 'lock 1')
  (1..100).each do |i|
        p "adding #{i}"
        new_node = Tag.create(:name => "test deadlock #{i}")
        ActiveRecord::Base.transaction do
             parent = parent.add_child(new_node)
        end
end

on two separate machines upon starting the second both will shortly arrive in a semi-deadlocked situation. The script that I started the call on will then eventually expire and the first writer will continue. I've tried moving around the transaction/without it. Below is one potential outcome.

Writer 1 :

D, [2017-02-16T07:31:34.537506 #21545] DEBUG -- :   Tag Exists (0.4ms)  SELECT  1 AS one FROM "tags" INNER JOIN "tag_hierarchies" ON "tags"."id" = "tag_hierarchies"."ancestor_id" WHERE "tag_hierarchies"."descendant_id" = $12 AND "tags"."id" = $2 LIMIT 1  [["descendant_id", 7903], ["id", 7905]]
W, [2017-02-16T07:31:34.537919 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:27:in '_ct_validate'

D, [2017-02-16T07:31:34.541672 #21545] DEBUG -- :   SQL (1.6ms)  UPDATE "tags" SET "parent_id" = $1, "updated_at" = $2 WHERE "tags"."id" = $3  [["parent_id", 7903], ["updated_at", "2017-02-16 07:31:34.538876"], ["id", 7905]]
W, [2017-02-16T07:31:34.542309 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/model.rb:138:in 'add_child'

D, [2017-02-16T07:31:34.545266 #21545] DEBUG -- :    (0.5ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t2ea0208ee2526226ecc8d65da8f4c493
W, [2017-02-16T07:31:34.546018 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'
D, [2017-02-16T07:31:34.646474 #21545] DEBUG -- :    (0.7ms)  SELECT pg_try_advisory_lock(1814493058,0) AS ta2ef67039c80d539454f14205900baac
W, [2017-02-16T07:31:34.647598 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 
'execute_successful?'

D, [2017-02-16T07:31:34.785180 #21545] DEBUG -- :    (0.8ms)  SELECT pg_try_advisory_lock(1814493058,0) AS tafb0ba91177bd447d53f97cae6f97ae0
W, [2017-02-16T07:31:34.786015 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:34.910538 #21545] DEBUG -- :    (0.8ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t878276a11847c5bb14ee22e966d51222
W, [2017-02-16T07:31:34.911517 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.014669 #21545] DEBUG -- :    (1.2ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t3996341ab230c95a3222cf08044a7593
W, [2017-02-16T07:31:35.015517 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.098349 #21545] DEBUG -- :    (0.8ms)  SELECT pg_try_advisory_lock(1814493058,0) AS td67fe931e1e97081594bf1d3b7b577ac
W, [2017-02-16T07:31:35.098631 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.210146 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS td2b2a4adf0f2fb2e61565b1a8d58da8b
W, [2017-02-16T07:31:35.210534 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.351044 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t6ba2e7b92cdaf2d1905f1a6214e7efa2
W, [2017-02-16T07:31:35.351374 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 
'execute_successful?'

D, [2017-02-16T07:31:35.429454 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t8388cb7c01e7341bd1f63e3404fe8e9b
W, [2017-02-16T07:31:35.429904 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.520427 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS tb6c70aa0fe9699434e66857ba96e1c7d
W, [2017-02-16T07:31:35.520766 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.616061 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS taef48c46023f84d4e4c6454fce8dfed7
W, [2017-02-16T07:31:35.616458 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.756555 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t14f7ff2ec5dae5c9961188d75d2f0ea7
W, [2017-02-16T07:31:35.756914 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.829528 #21545] DEBUG -- :    (0.7ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t363a59f8e61d4ce5c0fcfc5abf6e8404
W, [2017-02-16T07:31:35.829871 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:35.978112 #21545] DEBUG -- :    (0.9ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t57ace053678ffafac925ebec1a163abe
W, [2017-02-16T07:31:35.978513 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

D, [2017-02-16T07:31:36.053677 #21545] DEBUG -- :    (0.7ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t809dd7878f94941becb585a2ee937e3e
W, [2017-02-16T07:31:36.053976 #21545]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in 'execute_successful?'

Writer 2

D, [2017-02-16T07:31:34.457059 #21548] DEBUG -- : DEBUG: Chewy strategies stack: [2] <- bypass @ deadlock.rb:1
D, [2017-02-16T07:31:34.465650 #21548] DEBUG -- :    (0.5ms)  BEGIN
W, [2017-02-16T07:31:34.465916 #21548]  WARN -- :   ↳ deadlock.rb:3:in `block in <top (required)>'

D, [2017-02-16T07:31:34.488118 #21548] DEBUG -- :   SQL (0.8ms)  INSERT INTO "tags" ("name", "created_at", "updated_at") VALUES ($1, $2, $3) RETURNING "id"  [["name", "lock 1"], ["created_at", "2017-02-16 07:31:34.485771"], ["updated_at", "2017-02-16 07:31:34.485771"]]
W, [2017-02-16T07:31:34.488464 #21548]  WARN -- :   ↳ deadlock.rb:3:in `block in <top (required)>'

D, [2017-02-16T07:31:34.539682 #21548] DEBUG -- :    (0.5ms)  SELECT pg_try_advisory_lock(1814493058,0) AS t9f2a3d3d5ce591e0cbcad87a06e46c26
W, [2017-02-16T07:31:34.539915 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in `execute_successful?'

D, [2017-02-16T07:31:34.601380 #21548] DEBUG -- :   SQL (0.7ms)  INSERT INTO "tag_hierarchies" ("ancestor_id", "descendant_id", "generations") VALUES ($1, $2, $3)  [["ancestor_id", 7904], ["descendant_id", 7904], ["generations", 0]]
W, [2017-02-16T07:31:34.601783 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:66:in `block in rebuild!'

D, [2017-02-16T07:32:34.608471 #21548] DEBUG -- :    (60004.6ms)  UPDATE "tags"
SET "position" = t.seq + -1
FROM (
  SELECT "id" AS id, row_number() OVER(ORDER BY position) AS seq
  FROM "tags"
  WHERE "parent_id" IS NULL 
) AS t
WHERE "tags"."id" = t.id

W, [2017-02-16T07:32:34.609206 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/numeric_order_support.rb:39:in `reorder_with_parent_id'

D, [2017-02-16T07:32:34.613072 #21548] DEBUG -- :    (1.4ms)  SELECT pg_advisory_unlock(1814493058,0) AS t5a0d17d7a9f5b664b7d1f87c2252d30c
W, [2017-02-16T07:32:34.613436 #21548]  WARN -- :   ↳ /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in `execute_successful?'

D, [2017-02-16T07:32:34.614966 #21548] DEBUG -- :    (0.4ms)  ROLLBACK
W, [2017-02-16T07:32:34.615220 #21548]  WARN -- :   ↳ deadlock.rb:3:in `block in <top (required)>'

D, [2017-02-16T07:32:34.616004 #21548] DEBUG -- : DEBUG: Chewy strategies stack: [2] -> bypass @ deadlock.rb:1
/usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `async_exec': PG::InFailedSqlTransaction: ERROR:  current transaction is aborted, commands ignored until end of transaction block (ActiveRecord::StatementInvalid)
: SELECT pg_advisory_unlock(1814493058,0) AS t5a0d17d7a9f5b664b7d1f87c2252d30c
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `block in exec_no_cache'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract_adapter.rb:484:in `block in log'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract_adapter.rb:478:in `log'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `exec_no_cache'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql_adapter.rb:584:in `execute_and_clear'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:50:in `select_value'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:14:in `execute_successful?'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/postgresql.rb:9:in `release_lock'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:81:in `ensure in yield_with_lock'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:81:in `yield_with_lock'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:65:in `yield_with_lock_and_timeout'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/base.rb:48:in `with_advisory_lock_if_needed'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/concern.rb:16:in `with_advisory_lock_result'
	from /usr/local/bundle/gems/with_advisory_lock-3.0.0/lib/with_advisory_lock/concern.rb:10:in `with_advisory_lock'
	from /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/support.rb:103:in `with_advisory_lock'
	from /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:63:in `rebuild!'
	from /usr/local/bundle/gems/closure_tree-6.0.0/lib/closure_tree/hierarchy_maintenance.rb:39:in `_ct_after_save'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:432:in `block in make_lambda'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:228:in `block in halting_and_conditional'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:506:in `block in call'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:506:in `each'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:506:in `call'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:92:in `__run_callbacks__'
	from /usr/local/bundle/gems/activesupport-4.2.7.1/lib/active_support/callbacks.rb:778:in `_run_save_callbacks'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/callbacks.rb:302:in `create_or_update'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/persistence.rb:120:in `save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/validations.rb:37:in `save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/attribute_methods/dirty.rb:21:in `save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:286:in `block (2 levels) in save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:351:in `block in with_transaction_returning_status'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `block in transaction'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/transaction.rb:184:in `within_new_transaction'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `transaction'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:220:in 'transaction'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:348:in 'with_transaction_returning_status'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:286:in 'block in save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:301:in 'rollback_active_record_state!'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/transactions.rb:285:in 'save'
	from /usr/local/bundle/gems/newrelic_rpm-3.17.2.327/lib/new_relic/agent/instrumentation/active_record_helper.rb:26:in 'block in save'
	from /usr/local/bundle/gems/newrelic_rpm-3.17.2.327/lib/new_relic/agent.rb:586:in 'with_database_metric_name'
	from /usr/local/bundle/gems/newrelic_rpm-3.17.2.327/lib/new_relic/agent/instrumentation/active_record_helper.rb:25:in 'save'
	from /usr/local/bundle/gems/activerecord-4.2.7.1/lib/active_record/persistence.rb:34:in 'create'
	from deadlock.rb:3:in `block in <top (required)>'
	from /usr/local/bundle/gems/chewy-0.8.4/lib/chewy/strategy.rb:60:in 'wrap'
	from /usr/local/bundle/gems/chewy-0.8.4/lib/chewy.rb:179:in 'strategy'
	from deadlock.rb:1:in '<top (required)>'
	from /usr/local/bundle/gems/zeus-0.15.3/lib/zeus/load_tracking.rb:50:in 'load'
	from /usr/local/bundle/gems/zeus-0.15.3/lib/zeus/load_tracking.rb:50:in 'load'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/runner.rb:60:in '<top (required)>'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:123:in 'require'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:123:in 'require_command!'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:90:in 'runner'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:39:in 'run_command!'
	from /usr/local/bundle/gems/railties-4.2.7.1/lib/rails/commands.rb:17:in `<top (required)>'
	from bin/rails:9:in `require'
	from bin/rails:9:in `<main>'

@mceachen
Copy link
Collaborator

And by 👍 I mean 😭. I'll see if I can wire up a test to do this.

@DuaneCompton
Copy link

Thank you. I will see what I can do as well. I do not believe at this time that its an a/b b/a situation, because I think the lock acquisition time is much higher than the deadlock check timer on the db. I'm wondering if maybe calling a complete table lock might be better in the rebuild! function rather than calling an additional lock for each delete/child insert (maybe I am misunderstanding how those locks work though).

@mceachen
Copy link
Collaborator

I think a table lock (which I had dismissed as being too coarse a mutex) is an interesting alternative. I'll code up that as an option.

@dsounded
Copy link

dsounded commented Mar 4, 2017

Have almost the same problem(after upgrading to Rails 5), with Rails 4.2.5 it worked well

I have accepts_nested_attributes_for my hierarchy model,
and in that model I also have accepts_nested_attributes_for for its children

And after saving the parent item I got deadlock

Tried versions 6.2 and 6.4 and 6.0

@dsounded
Copy link

dsounded commented Mar 4, 2017

I thinks it's more about Rails update than closure_tree update

mysql Ver 14.14 Distrib 5.5.44
gem mysql2 version 0.3.19

I am playing with it, with_advisory_lock: false maybe solves the problem, but not sure it this is not temporary

@DuaneCompton
Copy link

@mceachen It appears the issue I am running into is specifically (and I am guessing amitsaxena as well) around the numeric reordering (https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L76)
when calling add_child. Writer 1 will hang on grabbing
https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/hierarchy_maintenance.rb#L63

while writer 2 will attempt to write into: https://github.com/mceachen/closure_tree/blob/master/lib/closure_tree/numeric_order_support.rb#L39

If you call directly children.create() it will avoid this issue, though I would guess any concurrent movement of leafs with numeric ordering enabled could result in this.

If I replace the with_advisory_lock from support.rb with just a model_class.lock(true) the deadlock appears to go away.

I did also note that if you attempt a destroy_all on a large tag set > 10000 (with our without numeric ordering)
the destroy will fail with fatal: exception reentered.

Hope this is helpful.

@mceachen
Copy link
Collaborator

mceachen commented Mar 7, 2017

@DuaneCompton that is super helpful, and shouldn't be hard to write a breaking test. Do you want to have a go?

@DuaneCompton
Copy link

I will give it a shot. Thanks.

@swrobel
Copy link
Contributor

swrobel commented May 26, 2017

@DuaneCompton any progress here? I just ran into this issue.

@mceachen
Copy link
Collaborator

Thanks to @serenaf, she highlighted the MySQL docs that reveal v5.7.5 fundamentally changed how get_lock() works. Fixing that issue should resolve this issue for MySQL users, at least.

@mceachen
Copy link
Collaborator

@uhlenbrock this issue (which seems to be with_advisory_lock not playing nicely with newer versions of postgres, mysql, and rails) is currently the most pressing for closure_tree. More eyes on it are welcome.

@uhlenbrock
Copy link
Collaborator

@mceachen I have started to take a look. I have two PRs open on with_advisory_lock to get the testing infrastructure ready to replicate this issue.

@tomaadland
Copy link

@uhlenbrock Any ideas on how to replicate this in a test ?

@uhlenbrock
Copy link
Collaborator

@tomaadland I have started work over here: ClosureTree/with_advisory_lock#17

@uhlenbrock
Copy link
Collaborator

I am working to build a failing test based on the suggestion above:

describe Tag do
  context 'in a race condition' do
    it 'should not break the database' do
      parent1 = Tag.create!(name: 'parent tag 1')
      parent2 = Tag.create!(name: 'parent tag 2')
        
      Thread.new do
        new_node = Tag.new(name: 'test deadlock 1')
        parent1.add_child(new_node)
      end
      Thread.new do
        new_node = Tag.new(name: 'test deadlock 1')
        parent2.add_child(new_node)
      end
    end
  end
end

Via DB=postgresql appraisal activerecord-5.1 rake spec, the following error is raised:

SystemStackError:
       stack level too deep

I believe this is because of Thread.abort_on_exception = true. The test does not raise an error with this line commented out, but also takes between 10 and 20 seconds. Definitely enough to upset your production database. I also found that it is true that calling create instead of new avoids any issue.

Does this appear to adequately reproduce the issue?

@tomaadland
Copy link

I tried running this test in the gem and I get the SystemStackError

When I created a similar unit test in the app that is using this gem and when calling create! instead of new for creating new children instances to be added I get postgresql to err on deadlock.

ERROR: deadlock detected at character 8

So in my case the

new_node = Tag.create!(name: "test deadlock 1")

causes a deadlock.

@tomalok
Copy link

tomalok commented Mar 9, 2018

I may have found a workaround that works for me -- though at present I'm not sure of what other consequences may arise from my solution.

I'd long since upgraded the database to MariaDB 10.x from some older MySQL without apparent incident. Until one day when I needed to reparent something, and then...

Mysql2::Error: Lock wait timeout exceeded; try restarting transaction

At the time, i wrote an awful script that monkey-patched the part of closure_tree that dealt with locking (IIRC), and managed to get my reparenting done.

Now, however, I'm going to need to reparent more often, so I took some time today to do a little more research with a simple test script...

#!/usr/bin/ruby

$LOAD_PATH.unshift '/idw/lib'
Dir.chdir('/idw') do
  require 'rubygems'
  require 'bundler'
  Bundler.require(:default)   # TODO: split this out to just what's needed
  require '/idw/config/environment'
end

require 'pp'

# connect to the right database for the curent environment
IDW::Base.establish_connection(YAML.load(File.read('/idw/config/database.yml'))[ENV['IDW_ENV']])
IDW::Base.connection_pool.with_connection do

  hc = IDW::Hostclass.find_by(name: 'NS_OPMS')
  pp hc.self_and_ancestor_ids

  phc = IDW::Hostclass.find_by(name: 'NS_ORCAWAVE')
  pp phc.self_and_ancestor_ids

  phc.children << hc
  pp hc.self_and_ancestor_ids

end

Which, when run... Failed:

$ ./reparent
D, [2018-03-09T21:57:01.685550 #148] DEBUG -- :    (0.3ms)  SET NAMES utf8,  @@SESSION.sql_mode = CONCAT(CONCAT(@@sql_mode, ',STRICT_ALL_TABLES'), ',NO_AUTO_VALUE_ON_ZERO'),  @@SESSION.sql_auto_is_null = 0, @@SESSION.wait_timeout = 2147483
D, [2018-03-09T21:57:01.739543 #148] DEBUG -- :    (0.2ms)  SET NAMES utf8,  @@SESSION.sql_mode = CONCAT(CONCAT(@@sql_mode, ',STRICT_ALL_TABLES'), ',NO_AUTO_VALUE_ON_ZERO'),  @@SESSION.sql_auto_is_null = 0, @@SESSION.wait_timeout = 2147483
D, [2018-03-09T21:57:01.750891 #148] DEBUG -- :   IDW::Hostclass Load (0.4ms)  SELECT  `hostclasses`.* FROM `hostclasses` WHERE `hostclasses`.`name` = 'NS_OPMS' LIMIT 1
D, [2018-03-09T21:57:01.773695 #148] DEBUG -- :    (0.7ms)  SELECT `hostclasses`.`id` FROM `hostclasses` INNER JOIN `hostclass_hierarchies` ON `hostclasses`.`id` = `hostclass_hierarchies`.`ancestor_id` WHERE `hostclass_hierarchies`.`descendant_id` = 214 ORDER BY `hostclass_hierarchies`.generations asc
[214, 199, 2]
D, [2018-03-09T21:57:01.774403 #148] DEBUG -- :   IDW::Hostclass Load (0.4ms)  SELECT  `hostclasses`.* FROM `hostclasses` WHERE `hostclasses`.`name` = 'NS_ORCAWAVE' LIMIT 1
D, [2018-03-09T21:57:01.776208 #148] DEBUG -- :    (0.6ms)  SELECT `hostclasses`.`id` FROM `hostclasses` INNER JOIN `hostclass_hierarchies` ON `hostclasses`.`id` = `hostclass_hierarchies`.`ancestor_id` WHERE `hostclass_hierarchies`.`descendant_id` = 215 ORDER BY `hostclass_hierarchies`.generations asc
[215, 199, 2]
D, [2018-03-09T21:57:01.776909 #148] DEBUG -- :    (0.3ms)  BEGIN
D, [2018-03-09T21:57:01.781243 #148] DEBUG -- :   IDW::Hostclass Exists (0.5ms)  SELECT  1 AS one FROM `hostclasses` INNER JOIN `hostclass_hierarchies` ON `hostclasses`.`id` = `hostclass_hierarchies`.`ancestor_id` WHERE `hostclass_hierarchies`.`descendant_id` = 215 AND `hostclasses`.`id` = 214 LIMIT 1
D, [2018-03-09T21:57:01.782238 #148] DEBUG -- :   IDW::Hostclass Exists (0.3ms)  SELECT  1 AS one FROM `hostclasses` WHERE `hostclasses`.`name` = 'NS_OPMS' AND (`hostclasses`.`id` != 214) LIMIT 1
D, [2018-03-09T21:57:01.783606 #148] DEBUG -- :   SQL (0.4ms)  UPDATE `hostclasses` SET `parent_id` = 215, `updated_at` = '2018-03-09 21:57:01' WHERE `hostclasses`.`id` = 214
D, [2018-03-09T21:57:01.786316 #148] DEBUG -- :    (0.3ms)  SELECT GET_LOCK('3bd772cb25a662070242f7efd5ca79ef2', 0) AS t5e93ac4ad69f0ed6f2c5c9584d8b188b
D, [2018-03-09T21:57:01.788037 #148] DEBUG -- :    (1.4ms)  DELETE FROM `hostclass_hierarchies`
WHERE descendant_id IN (
  SELECT DISTINCT descendant_id
  FROM (SELECT descendant_id
    FROM `hostclass_hierarchies`
    WHERE ancestor_id = 214
       OR descendant_id = 214
  ) AS x )

D, [2018-03-09T21:57:01.793990 #148] DEBUG -- :    (0.2ms)  BEGIN
D, [2018-03-09T21:57:52.175943 #148] DEBUG -- :   SQL (50381.2ms)  INSERT INTO `hostclass_hierarchies` (`ancestor_id`, `descendant_id`, `generations`) VALUES (214, 214, 0)
D, [2018-03-09T21:57:52.176687 #148] DEBUG -- :    (0.5ms)  ROLLBACK
D, [2018-03-09T21:57:52.177324 #148] DEBUG -- :    (0.4ms)  SELECT RELEASE_LOCK('3bd772cb25a662070242f7efd5ca79ef2') AS t72718680058c1a95838500647177a561
D, [2018-03-09T21:57:52.177853 #148] DEBUG -- :    (0.4ms)  ROLLBACK
/idw/vendor/bundle/ruby/2.4.0/gems/mysql2-0.4.10/lib/mysql2/client.rb:120:in `_query': Mysql2::Error: Lock wait timeout exceeded; try restarting transaction: INSERT INTO `hostclass_hierarchies` (`ancestor_id`, `descendant_id`, `generations`) VALUES (214, 214, 0) (ActiveRecord::StatementInvalid)

...but provided some clues as to what was going on.

  • BEGIN transaction 1
    • UPDATE the node with its new parent
    • SELECT GET_LOCK stuff...
    • DELETE the old hierarchy information
    • BEGIN transaction 2
      • try to INSERT the new hierarchy information, but times out
    • ROLLBACK
    • SELECT RELEASE_LOCK
  • ROLLBACK

The (currently) third solution to https://stackoverflow.com/questions/6000336/how-to-debug-lock-wait-timeout-exceeded-on-mysql mentions that InnoDB's default isolation level is REPEATABLE_READ, whereas other DBs use READ_COMMITTED.

Setting my database's default isolation level to READ_COMMITTED...

SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;

...did the trick and allowed the reparenting script (and another more complicated script) to work as anticipated. Not sure I'd want to leave this on globally, might be best to set it as tightly around the code as possible (before that first transaction). Maybe something like...

FooModel.transaction(isolation: :read_committed) do
  # the reparenting stuff
end

I'm not confident this solves the other two-thread deadlock issue, as that appears to be happening with non-MySQL/MariaDB databases, also.

@mceachen
Copy link
Collaborator

I hadn't thought about this before @tomalok 's post, but I believe with_advisory_lock could no-op if it could reliably determine that the current database is configured as READ COMMITTED or supports MVCC. It'd be good to get a simple breaking test in place to verify this hypothesis.

@kawadhiya21
Copy link

we are facing the same issue. Any update on this?

@jkraemer
Copy link
Contributor

I can reliably reproduce this deadlock with the following test case, which concurrently removes children from a nested set with numeric sort order. The ordering seems to be the culprit, because it tries to update siblings' sort indices which leads to an implicit row lock, which, in conjunction with the implicit row locks taken out when rails updates the parent_id on records (which happens before the advisory_lock is attempted). So a possible scenario would be:

  • A updates child1.parent_id (and now holds an implicit lock on child1)
  • B updates child2.parent_id (implicit lock on child2)
  • A wins the race and gets the advisory lock
  • B now waits for the advisory lock
  • A tries to update siblings' sort column, among them child2.
  • A waits to get the lock on child2 for the update.

This kind of deadlock where one writer waits for the advisory lock while the other waits for a row level lock apparently isn't detected by either MySQL or Postgres, leading to an endless wait.

The only way I found to fix this is to get the advisory lock before even updating the parent_id, i.e. by calling record._ct.with_advisory_lock { record.save } in your application code.

test case:

  it "fails to deadlock from removing siblings with sort_order" do
    skip("unsupported") unless run_parallel_tests?

    @root = Label.create!(name: 'root')
    60.times{ |i| Label.create! name: i, mother_id: @root.id }
    @root.reload

    # remove 40 random children in parallel threads
    descendant_ids = @root.descendant_ids
    ids_to_remove = descendant_ids.sample(40).shuffle
    ids_to_keep = descendant_ids - ids_to_remove
    Parallel.map(ids_to_remove, in_threads: max_threads) do |id|
      ActiveRecord::Base.connection_pool.with_connection do
        Label.find(id).update(mother_id: nil)
      end
    end

    Label.where(id: ids_to_remove).each do |l|
      assert_nil l.mother_id
      assert l.root?
      assert l.leaf?
    end
    @root.reload
    assert_equal 20, @root.children.count
    assert order = @root.children.pluck(:column_whereby_ordering_is_inferred).uniq
    assert_equal (0..19).to_a, order
  end

The test will get stuck in an endless loop, unless the call to ct_reorder_prior_siblings_if_parent_changed in the rebuild! method is commented out. It might worth thinking about the necessity of that reordering - maybe having non-consecutive sort indices isn't even an issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants