I'm noticing the following behavior in some cases after a Redis failover has happened in the cluster.
The /metrics endpoint fails:
[2025-08-18 04:20:38,475] ERROR in app: Exception on /metrics [GET]
Traceback (most recent call last):
File "/app/.venv/lib/python3.13/site-packages/redis/connection.py", line 644, in read_response
response = self._parser.read_response(disable_decoding=disable_decoding)
File "/app/.venv/lib/python3.13/site-packages/redis/_parsers/resp2.py", line 15, in read_response
result = self._read_response(disable_decoding=disable_decoding)
File "/app/.venv/lib/python3.13/site-packages/redis/_parsers/resp2.py", line 25, in _read_response
raw = self._buffer.readline()
File "/app/.venv/lib/python3.13/site-packages/redis/_parsers/socket.py", line 115, in readline
self._read_from_socket()
~~~~~~~~~~~~~~~~~~~~~~^^
File "/app/.venv/lib/python3.13/site-packages/redis/_parsers/socket.py", line 65, in _read_from_socket
data = self._sock.recv(socket_read_size)
OSError: [Errno 113] No route to host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.venv/lib/python3.13/site-packages/flask/app.py", line 1511, in wsgi_app
response = self.full_dispatch_request()
File "/app/.venv/lib/python3.13/site-packages/flask/app.py", line 919, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/app/.venv/lib/python3.13/site-packages/flask/app.py", line 917, in full_dispatch_request
rv = self.dispatch_request()
File "/app/.venv/lib/python3.13/site-packages/flask/app.py", line 902, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "/app/src/http_server.py", line 32, in metrics
current_app.config["metrics_puller"]()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/app/src/exporter.py", line 156, in scrape
self.track_queue_metrics()
~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/app/src/exporter.py", line 238, in track_queue_metrics
for worker, stats in (self.app.control.inspect().stats() or {}).items()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/app/.venv/lib/python3.13/site-packages/celery/app/control.py", line 243, in stats
return self._request('stats')
~~~~~~~~~~~~~^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/celery/app/control.py", line 106, in _request
return self._prepare(self.app.control.broadcast(
~~~~~~~~~~~~~~~~~~~~~~~~~~^
command,
^^^^^^^^
...<6 lines>...
pattern=self.pattern, matcher=self.matcher,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
))
^
File "/app/.venv/lib/python3.13/site-packages/celery/app/control.py", line 777, in broadcast
return self.mailbox(conn)._broadcast(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
command, arguments, destination, reply, timeout,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
limit, callback, channel=channel,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/app/.venv/lib/python3.13/site-packages/kombu/pidbox.py", line 337, in _broadcast
self._publish(command, arguments, destination=destination,
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
reply_ticket=reply_ticket,
^^^^^^^^^^^^^^^^^^^^^^^^^^
...<3 lines>...
pattern=pattern,
^^^^^^^^^^^^^^^^
matcher=matcher)
^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/pidbox.py", line 299, in _publish
maybe_declare(self.reply_queue(chan))
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/common.py", line 113, in maybe_declare
return _maybe_declare(entity, channel)
File "/app/.venv/lib/python3.13/site-packages/kombu/common.py", line 155, in _maybe_declare
entity.declare(channel=channel)
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/entity.py", line 617, in declare
self._create_queue(nowait=nowait, channel=channel)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/entity.py", line 626, in _create_queue
self.queue_declare(nowait=nowait, passive=False, channel=channel)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/entity.py", line 655, in queue_declare
ret = channel.queue_declare(
queue=self.name,
...<5 lines>...
nowait=nowait,
)
File "/app/.venv/lib/python3.13/site-packages/kombu/transport/virtual/base.py", line 538, in queue_declare
return queue_declare_ok_t(queue, self._size(queue), 0)
~~~~~~~~~~^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/kombu/transport/redis.py", line 1012, in _size
sizes = pipe.execute()
File "/app/.venv/lib/python3.13/site-packages/redis/client.py", line 1613, in execute
return conn.retry.call_with_retry(
~~~~~~~~~~~~~~~~~~~~~~~~~~^
lambda: execute(conn, stack, raise_on_error),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
lambda error: self._disconnect_raise_on_watching(conn, error),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/app/.venv/lib/python3.13/site-packages/redis/retry.py", line 92, in call_with_retry
raise error
File "/app/.venv/lib/python3.13/site-packages/redis/retry.py", line 87, in call_with_retry
return do()
File "/app/.venv/lib/python3.13/site-packages/redis/client.py", line 1614, in <lambda>
lambda: execute(conn, stack, raise_on_error),
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/redis/client.py", line 1455, in _execute_transaction
connection.send_packed_command(all_cmds)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/redis/connection.py", line 581, in send_packed_command
self.check_health()
~~~~~~~~~~~~~~~~~^^
File "/app/.venv/lib/python3.13/site-packages/redis/connection.py", line 573, in check_health
self.retry.call_with_retry(self._send_ping, self._ping_failed)
~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.13/site-packages/redis/retry.py", line 92, in call_with_retry
raise error
File "/app/.venv/lib/python3.13/site-packages/redis/retry.py", line 87, in call_with_retry
return do()
File "/app/.venv/lib/python3.13/site-packages/redis/connection.py", line 563, in _send_ping
if str_if_bytes(self.read_response()) != "PONG":
~~~~~~~~~~~~~~~~~~^^
File "/app/.venv/lib/python3.13/site-packages/redis/connection.py", line 652, in read_response
raise ConnectionError(f"Error while reading from {host_error} : {e.args}")
redis.exceptions.ConnectionError: Error while reading from <my-redis-service>:6379 : (113, 'No route to host')
At the same time, the liveness probe on /health still returns a 200:
$ curl -s 127.0.0.1:9808/health
Connected to the broker redis://<my-redis-service>:6379//
Restarting the pod manually fixes the issue.
Could we change this behavior? Maybe the process should exit instead of only logging an ERROR?