
Commit 74aa478

Add log message when worker lock timeouts get large (#18124)
This is to help track down a possible, but very rare, worker deadlock that was seen on matrix.org. In theory, you could work back from an instance of these new logs to the approximate time when the lock was obtained and focus the diagnostic efforts there.

### Pull Request Checklist

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should:
  - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
  - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
* [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
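The "work back from the logs" idea can be sketched as follows. Because the retry interval doubles from a floor of 5 seconds, the interval reported in the warning roughly determines how long the worker has already been waiting. This helper is illustrative only (it is not part of Synapse) and ignores the ±10% jitter the real code applies:

```python
def approx_elapsed_since_lock(logged_interval: int, floor: int = 5) -> int:
    """Approximate total seconds waited before the retry interval
    reached `logged_interval`, under a doubling-from-a-floor scheme.

    Illustrative helper, not part of Synapse; ignores jitter.
    """
    total = 0
    interval = floor
    # Sum all the earlier (shorter) intervals: floor, 2*floor, 4*floor, ...
    while interval < logged_interval:
        total += interval
        interval *= 2
    return total


# A warning logged at a 640s interval implies roughly
# 5 + 10 + 20 + ... + 320 = 635 seconds of prior waiting.
print(approx_elapsed_since_lock(640))
```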
1 parent 816054b commit 74aa478

File tree

2 files changed (+10 −0 lines)


changelog.d/18124.misc

Lines changed: 1 addition & 0 deletions

```diff
@@ -0,0 +1 @@
+Add log message when worker lock timeouts get large.
```

synapse/handlers/worker_lock.py

Lines changed: 9 additions & 0 deletions

```diff
@@ -19,6 +19,7 @@
 #
 #
 
+import logging
 import random
 from types import TracebackType
 from typing import (
@@ -269,6 +270,10 @@ async def __aexit__(
     def _get_next_retry_interval(self) -> float:
         next = self._retry_interval
         self._retry_interval = max(5, next * 2)
+        if self._retry_interval > 5 * 2**7:  # ~10 minutes
+            logging.warning(
+                f"Lock timeout is getting excessive: {self._retry_interval}s. There may be a deadlock."
+            )
         return next * random.uniform(0.9, 1.1)
@@ -344,4 +349,8 @@ async def __aexit__(
     def _get_next_retry_interval(self) -> float:
         next = self._retry_interval
         self._retry_interval = max(5, next * 2)
+        if self._retry_interval > 5 * 2**7:  # ~10 minutes
+            logging.warning(
+                f"Lock timeout is getting excessive: {self._retry_interval}s. There may be a deadlock."
+            )
         return next * random.uniform(0.9, 1.1)
```

Note: the threshold is written here as `5 * 2**7` (640 seconds, roughly 10 minutes, matching the comment); `5 * 2 ^ 7` would be parsed by Python as `(5 * 2) ^ 7`, a bitwise XOR yielding 13.
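A minimal sketch of the backoff arithmetic above, assuming a hypothetical 1-second starting interval (the starting value is an assumption for illustration, not taken from Synapse). It shows on which retry the doubled interval first exceeds the `5 * 2**7` (≈10 minutes) warning threshold:

```python
RETRY_FLOOR_SECONDS = 5
WARN_THRESHOLD = 5 * 2**7  # 640 seconds, roughly 10 minutes


def first_warning_attempt(start: float = 1.0, max_attempts: int = 20) -> int:
    """Simulate the doubling-from-a-floor retry interval and return
    the attempt number on which the warning threshold is first crossed.

    Illustrative sketch only; the real code also multiplies the
    returned interval by a random factor in [0.9, 1.1].
    """
    interval = start
    for attempt in range(1, max_attempts + 1):
        interval = max(RETRY_FLOOR_SECONDS, interval * 2)
        if interval > WARN_THRESHOLD:
            return attempt
    raise RuntimeError("threshold never crossed")


# Intervals run 5, 10, 20, ..., 640, 1280: the strict `>` comparison
# means the warning first fires once the interval reaches 1280s.
print(first_warning_attempt())
```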
