WebsocketService enters infinite reconnection loop when server accepts handshake but immediately closes connection #3711

@OmerCohenAviv

pipecat version

0.0.100

Python version

3.12

Operating System

macOS 26.2

Issue description

Bug

WebsocketService._try_reconnect enters an infinite loop when the server accepts the WebSocket handshake but immediately closes the connection with an error (e.g. 1008 policy violation - Invalid API key).

What happens

The reconnection succeeds (handshake + ping verification pass), so _try_reconnect returns True on attempt 1. The _receive_task_handler loop resumes, _receive_messages runs, the server immediately kicks the connection with a ConnectionClosedError, and the cycle repeats — forever.

reconnected successfully on attempt 1
connection closed, but with an error: received 1008 (policy violation) Invalid API key
reconnecting, attempt 1
reconnected successfully on attempt 1
connection closed, but with an error: received 1008 (policy violation) Invalid API key
reconnecting, attempt 1
... (infinite)

Why it happens

  1. _try_reconnect only counts failed handshakes toward its 3-attempt limit. If the handshake succeeds, the counter resets.
  2. _verify_connection only checks websocket.ping() — which succeeds in the brief window before the server sends its close frame.
  3. No ErrorFrame is ever emitted — neither non-fatal (only sent on failed reconnect attempts) nor fatal (only sent after 3 exhausted attempts). The on_pipeline_error handler is never called.
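The three points above can be reproduced without pipecat. The following is a minimal simulation, not pipecat's actual code: `FakeServer`, `Result`, `try_reconnect`, and `receive_loop` are hypothetical names that only mirror the behavior described in this issue, and `demo_cycle_cap` exists purely so the demo terminates.

```python
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3  # per-call handshake retry limit described in this issue


@dataclass
class FakeServer:
    """Hypothetical server: accepts every handshake, then closes with 1008."""
    handshakes: int = 0

    def connect(self) -> bool:
        self.handshakes += 1
        return True  # handshake and ping verification both pass

    def receive(self) -> None:
        raise ConnectionError("received 1008 (policy violation) Invalid API key")


@dataclass
class Result:
    cycles: int = 0
    max_attempt_seen: int = 0
    errors_emitted: list = field(default_factory=list)


def try_reconnect(server: FakeServer, result: Result) -> bool:
    # The attempt counter is local to each call, so it restarts at 1 every time.
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result.max_attempt_seen = max(result.max_attempt_seen, attempt)
        if server.connect():
            return True
    # Only reached when every handshake fails, which never happens here.
    result.errors_emitted.append("fatal: reconnect attempts exhausted")
    return False


def receive_loop(server: FakeServer, demo_cycle_cap: int) -> Result:
    result = Result()
    while result.cycles < demo_cycle_cap:  # cap is for the demo only
        try:
            server.receive()
        except ConnectionError:
            result.cycles += 1
            if not try_reconnect(server, result):
                break
    return result
```

Calling `receive_loop(FakeServer(), demo_cycle_cap=50)` runs all 50 cycles: the attempt counter never goes past 1 and `errors_emitted` stays empty, which matches the log output shown earlier.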

Impact

  • The service silently burns CPU reconnecting ~1x/second indefinitely
  • No ErrorFrame (fatal or non-fatal) is ever propagated to the pipeline
  • Downstream consumers (e.g. on_pipeline_error handlers) have no way to detect or react to this state
  • Affects all services inheriting from WebsocketService (ElevenLabs TTS, etc.)

Suggested fixes

Any of the following would address this:

  1. Track connection stability: If a connection lasts less than N seconds before being closed, count it as a failed attempt toward the retry limit.
  2. Respect non-recoverable close codes: Close codes like 1008 (policy violation), 1003 (unsupported data), 4xxx (application-level) should not trigger reconnection — they indicate permanent errors.
  3. Global cycle detection: Track the number of reconnection cycles within a time window (e.g. 5 reconnects in 30 seconds = fatal), regardless of whether individual attempts "succeed."
Affected services: ElevenLabs TTS and any other WebsocketService subclass.
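As a rough illustration of how the three suggestions could be combined, here is a sketch of a small guard object. Everything in it is an assumption for discussion: `ReconnectGuard`, `is_non_recoverable`, and all thresholds are hypothetical and not part of pipecat, and the set of "permanent" close codes is a policy choice taken from the suggestion above rather than something a standard mandates. Timestamps are passed in explicitly so the logic can be exercised without a real clock.

```python
from collections import deque

# Assumption: codes treated as permanent, as suggested in this issue.
NON_RECOVERABLE_CODES = {1003, 1008}


def is_non_recoverable(code: int) -> bool:
    """Fix 2: listed codes and the 4000-4999 application range are permanent."""
    return code in NON_RECOVERABLE_CODES or 4000 <= code <= 4999


class ReconnectGuard:
    """Decides whether another reconnect is worth attempting (hypothetical)."""

    def __init__(self, min_stable_secs=5.0, max_short_lived=3,
                 max_cycles=5, window_secs=30.0):
        self.min_stable_secs = min_stable_secs
        self.max_short_lived = max_short_lived
        self.max_cycles = max_cycles
        self.window_secs = window_secs
        self._short_lived = 0   # consecutive connections that died quickly
        self._cycles = deque()  # close timestamps inside the sliding window

    def should_give_up(self, close_code, connected_at, closed_at):
        """Return a reason string if reconnecting should stop, else None."""
        # Fix 2: permanent close codes stop reconnection immediately.
        if close_code is not None and is_non_recoverable(close_code):
            return f"non-recoverable close code {close_code}"

        # Fix 1: a connection that closes too soon counts as a failed attempt.
        if closed_at - connected_at < self.min_stable_secs:
            self._short_lived += 1
        else:
            self._short_lived = 0
        if self._short_lived >= self.max_short_lived:
            return f"{self._short_lived} consecutive short-lived connections"

        # Fix 3: too many reconnect cycles inside the time window is fatal.
        self._cycles.append(closed_at)
        while self._cycles and closed_at - self._cycles[0] > self.window_secs:
            self._cycles.popleft()
        if len(self._cycles) >= self.max_cycles:
            return f"{len(self._cycles)} reconnect cycles within {self.window_secs}s"
        return None
```

In the scenario from this issue, the first check alone would be enough: `ReconnectGuard().should_give_up(1008, 0.0, 0.1)` returns a reason on the very first close. The other two checks cover servers that drop the connection without a telling close code.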

Reproduction steps

  1. Provide an invalid token or API key to one of the WebsocketService-based services (TTS...)

Expected behavior

After a small number of failed reconnection cycles, WebsocketService should emit a fatal ErrorFrame and stop retrying — allowing the pipeline's on_pipeline_error handler to detect the failure and take action (e.g. redirect to a human agent).
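The expected behavior could look roughly like the sketch below. It is a simplified stand-in, not pipecat code: `run_with_reconnect_budget`, `FatalConnectionError`, and the `connect`, `receive`, and `report_error` callables are all hypothetical, with `report_error` playing the role of pushing a fatal ErrorFrame. For brevity the failure counter never resets; a real implementation would presumably reset it once a connection has proven stable.

```python
import asyncio


class FatalConnectionError(Exception):
    """Hypothetical stand-in for a fatal ErrorFrame; not a pipecat type."""


async def run_with_reconnect_budget(connect, receive, report_error, max_cycles=3):
    """Reconnect after each dropped connection, but only max_cycles times."""
    failed_cycles = 0
    while True:
        await connect()
        try:
            await receive()
            return  # receive() finished cleanly, nothing to retry
        except ConnectionError as exc:
            # Count the whole cycle as failed, even though the handshake worked.
            failed_cycles += 1
            if failed_cycles >= max_cycles:
                await report_error(
                    f"giving up after {failed_cycles} reconnect cycles: {exc}"
                )
                raise FatalConnectionError(str(exc)) from exc
```

With a `receive` that always raises `ConnectionError`, this connects exactly `max_cycles` times, calls `report_error` once, and then stops, so an error handler gets a chance to react.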

Actual behavior

The service enters an infinite reconnection loop (~1 reconnect/second), never emits any ErrorFrame, and the pipeline has no way to detect or react to the failure. The loop continues until the process is manually killed.
