Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle SystemExit gracefully in two phase commit mode #21

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sjustas
Copy link

@sjustas sjustas commented Dec 23, 2020

If SystemExit was raised while PJDataManager was joining a transaction in two-phase-commit mode, PJDataManager failed to clean up after itself.

Traceback (most recent call last):
  File "/usr/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib/python3.8/unittest/case.py", line 679, in run
    self._callTearDown()
  File "/usr/lib/python3.8/unittest/case.py", line 636, in _callTearDown
    self.tearDown()
  File "/home/justas/src/shoobx/pjpersist/src/pjpersist/tests/test_datamanager.py", line 1649, in tearDown
    super(TwoPhaseCommitTestCase, self).tearDown()
  File "/home/justas/src/shoobx/pjpersist/src/pjpersist/testing.py", line 220, in tearDown
    transaction.abort()
  File "/home/justas/src/shoobx/pjpersist/ve/lib/python3.8/site-packages/transaction/_manager.py", line 260, in abort
    return self.manager.abort()
  File "/home/justas/src/shoobx/pjpersist/ve/lib/python3.8/site-packages/transaction/_manager.py", line 139, in abort
    return self.get().abort()
  File "/home/justas/src/shoobx/pjpersist/ve/lib/python3.8/site-packages/transaction/_transaction.py", line 570, in abort
    reraise(t, v, tb)
  File "/home/justas/src/shoobx/pjpersist/ve/lib/python3.8/site-packages/transaction/_compat.py", line 49, in reraise
    raise value
  File "/home/justas/src/shoobx/pjpersist/ve/lib/python3.8/site-packages/transaction/_transaction.py", line 551, in abort
    rm.abort(self)
  File "/home/justas/src/shoobx/pjpersist/src/pjpersist/datamanager.py", line 790, in abort
    self._conn.rollback()
psycopg2.ProgrammingError: rollback cannot be used during a two-phase transaction

@coveralls
Copy link

coveralls commented Dec 23, 2020

Coverage Status

Coverage increased (+0.09%) to 86.161% when pulling 4e6b59d on handle_system_exit into 2eb7aef on master.

self._tpc_activated = True
finally:
if self._conn.status == psycopg2.extensions.STATUS_BEGIN:
self._tpc_activated = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this have to do with SystemExit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If SystemExit is raised in just the right time, _tpc_activated flag was not being set (because of exception) even though tpc_begin had just completed. So when PJDataManager tried to clean up, it called _conn.rollback() - this is invalid call for a connection in status STATUS_BEGIN. And so it failed with ProgrammingError

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SystemExit is just an exception we encountered in production, when a celery worker is being shut down while still trying to process some database task.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean you think this happens when SystemExit is raised right between tpc_begin() and _tpc_activated = True?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this makes sense. but the code needs in addition:

  1. Comment, explaining why this code is there
  2. Logging message on warning level when we don't set _tpc_activated to True.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kedder re-arranged the code a bit and added logging. Hope it reads better now.

self._tpc_activated = True
finally:
if self._conn.status == psycopg2.extensions.STATUS_BEGIN:
self._tpc_activated = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this makes sense. but the code needs in addition:

  1. Comment, explaining why this code is there
  2. Logging message on warning level when we don't set _tpc_activated to True.

Copy link
Member

@kedder kedder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

"PJDataManager._begin: tried to call connection.tpc_begin,"
" but it did not change connection status to STATUS_BEGIN,"
" failing with a generic exception %s."
" Not setting TPC mode on PJDataManager." % exc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking: , exc please instead of %

…ining a transaction in two-phase-commit mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants