Description
pipecat version
0.0.100
Python version
3.12
Operating System
macOS 26.2
Issue description
Bug
WebsocketService._try_reconnect enters an infinite loop when the server accepts the WebSocket handshake but immediately closes the connection with an error (e.g. 1008 policy violation - Invalid API key).
What happens
The reconnection succeeds (handshake + ping verification pass), so _try_reconnect returns True on attempt 1. The _receive_task_handler loop resumes, _receive_messages runs, the server immediately kicks the connection with a ConnectionClosedError, and the cycle repeats — forever.
reconnected successfully on attempt 1
connection closed, but with an error: received 1008 (policy violation) Invalid API key
reconnecting, attempt 1
reconnected successfully on attempt 1
connection closed, but with an error: received 1008 (policy violation) Invalid API key
reconnecting, attempt 1
... (infinite)
Why it happens
- `_try_reconnect` only counts failed handshakes toward its 3-attempt limit. If the handshake succeeds, the counter resets.
- `_verify_connection` only checks `websocket.ping()`, which succeeds in the brief window before the server sends its close frame.
- No `ErrorFrame` is ever emitted: neither non-fatal (only sent on failed reconnect attempts) nor fatal (only sent after 3 exhausted attempts). The `on_pipeline_error` handler is never called.
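To make the mechanism concrete, here is a minimal, self-contained model of the loop described above. It is not pipecat's code: `FakeServer`, `try_reconnect`, and `run` are hypothetical names, and the cycle cap exists only so the demo terminates. It shows that the attempt number never gets past 1, so the 3-attempt limit is never reached.

```python
MAX_ATTEMPTS = 3  # retry limit mentioned in the issue


class FakeServer:
    """Hypothetical server: accepts every handshake, then closes with 1008."""

    def handshake(self) -> bool:
        return True  # handshake and ping verification both pass

    def receive(self) -> int:
        return 1008  # close code delivered right after connecting


def try_reconnect(server: FakeServer) -> tuple[bool, int]:
    """Simplified retry loop: only failed handshakes consume attempts."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if server.handshake():
            return True, attempt  # success: the next call starts again at 1
    return False, MAX_ATTEMPTS


def run(server: FakeServer, max_cycles: int) -> list[int]:
    """Model of the receive loop, capped at max_cycles so the demo ends."""
    attempts_seen = []
    for _ in range(max_cycles):
        server.receive()  # connection is closed with an error
        ok, attempt = try_reconnect(server)
        attempts_seen.append(attempt)
        if not ok:
            break  # the only exit path; never taken with this server
    return attempts_seen


print(run(FakeServer(), max_cycles=5))  # [1, 1, 1, 1, 1]
```

Without the `max_cycles` cap, this model would spin forever, which matches the log output above.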
Impact
- The service silently burns CPU reconnecting ~1x/second indefinitely
- No `ErrorFrame` (fatal or non-fatal) is ever propagated to the pipeline
- Downstream consumers (e.g. `on_pipeline_error` handlers) have no way to detect or react to this state
- Affects all services inheriting from `WebsocketService` (ElevenLabs TTS, etc.)
Suggested fixes
Any of the following would address this:
- Track connection stability: If a connection lasts less than N seconds before being closed, count it as a failed attempt toward the retry limit.
- Respect non-recoverable close codes: Close codes like `1008` (policy violation), `1003` (unsupported data), `4xxx` (application-level) should not trigger reconnection, since they indicate permanent errors.
- Global cycle detection: Track the number of reconnection cycles within a time window (e.g. 5 reconnects in 30 seconds = fatal), regardless of whether individual attempts "succeed."

Affected services: ElevenLabs TTS (any `WebsocketService` subclass)
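As a rough illustration of how the suggestions above could fit together, here is a standalone sketch. Everything in it is an assumption rather than existing pipecat API: `ReconnectGuard`, `is_non_recoverable`, and the default thresholds are hypothetical. It treats `1008`, `1003`, and the `4000-4999` range as permanent, and counts short-lived connections inside a sliding time window.

```python
import time
from collections import deque
from typing import Callable, Optional

# Assumption: close codes treated as permanent (1003 and 1008 per the issue).
NON_RECOVERABLE_CODES = {1003, 1008}


def is_non_recoverable(code: int) -> bool:
    """True for codes that should not trigger a reconnect (incl. 4xxx)."""
    return code in NON_RECOVERABLE_CODES or 4000 <= code <= 4999


class ReconnectGuard:
    """Hypothetical helper that decides when to stop reconnecting."""

    def __init__(
        self,
        max_cycles: int = 5,
        window_secs: float = 30.0,
        min_stable_secs: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_cycles = max_cycles
        self._window_secs = window_secs
        self._min_stable_secs = min_stable_secs
        self._clock = clock
        self._short_lived: deque = deque()  # timestamps of unstable closes

    def should_give_up(self, close_code: Optional[int], connected_for: float) -> bool:
        """Call after each close; True means emit a fatal error and stop."""
        if close_code is not None and is_non_recoverable(close_code):
            return True
        now = self._clock()
        # Forget closes that have fallen out of the sliding window.
        while self._short_lived and now - self._short_lived[0] > self._window_secs:
            self._short_lived.popleft()
        # A connection that dropped almost immediately counts as a failure.
        if connected_for < self._min_stable_secs:
            self._short_lived.append(now)
        return len(self._short_lived) >= self._max_cycles


guard = ReconnectGuard(max_cycles=3)
print(guard.should_give_up(1008, connected_for=0.1))  # True: permanent close code
```

In this sketch the caller would measure `connected_for` itself (for example, the time between a successful handshake and the close) and emit a fatal `ErrorFrame` when `should_give_up` returns `True`. The injectable `clock` is there only to make the window logic easy to test.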
Reproduction steps
- Provide a wrong token to one of the services (TTS...)
Expected behavior
After a small number of failed reconnection cycles, WebsocketService should emit a fatal ErrorFrame and stop retrying — allowing the pipeline's on_pipeline_error handler to detect the failure and take action (e.g. redirect to a human agent).
Actual behavior
The service enters an infinite reconnection loop (~1 reconnect/second), never emits any ErrorFrame, and the pipeline has no way to detect or react to the failure. The loop continues until the process is manually killed.