fix: improve ACP connection reliability with spawn retry and auto-reconnect #2804
tanzhenxin merged 1 commit into QwenLM:main
📋 Review Summary
This PR improves ACP connection reliability by implementing a two-layer retry mechanism that includes spawn-level retry with exponential backoff.
TLDR
Improve ACP connection reliability by adding spawn retry logic and automatic reconnection when the ACP process dies unexpectedly.
Screenshots / Video Demo
N/A — no user-facing UI change. This is a resilience improvement for the ACP connection layer. When the ACP process crashes, users will now see an automatic reconnection attempt instead of a silent failure, and a clear error message if reconnection fails.
Dive Deeper
This PR addresses connection reliability issues where the ACP process could fail to start (e.g., SIGTERM during the 1-second startup grace period) or die unexpectedly during a session.
Three key changes:
1. Spawn retry with exponential backoff (`qwenConnectionHandler.ts`): `connection.connect()` now retries up to 3 times with exponential backoff (1s, 2s, 4s) on transient spawn failures, instead of failing immediately.
2. Disconnect event propagation (`qwenAgentManager.ts` + `chatTypes.ts`): Added `onDisconnected` callback that fires when the ACP child process exits unexpectedly. This event propagates from the connection layer through `QwenAgentManager` to `WebViewProvider`.
3. Auto-reconnect on unexpected disconnect (`WebViewProvider.ts`): When an initialized agent disconnects unexpectedly, `WebViewProvider` automatically attempts to re-establish the connection (up to 3 attempts with exponential backoff). If all attempts fail, a user-facing error message is shown prompting them to use the refresh button.
Reviewer Test Plan
1. Run `npm run test` in `packages/vscode-ide-companion`; new tests cover the retry logic.
2. Kill the ACP process (`kill <pid>`), and verify:
   - `agentConnectionError` message appears in the webview
Testing Matrix
Linked issues / bugs
Reverts the revert in #2792 (which reverted #2666), reimplementing the reconnect logic with a more robust approach that includes spawn-level retry.
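For readers who want a concrete picture of the spawn-level retry described under Dive Deeper, here is a minimal sketch of retry with exponential backoff. It is not the code from this PR: `retryWithBackoff`, `RetryOptions`, `isTransient`, and the injectable `sleep` are hypothetical names chosen for illustration. The sketch also assumes that "up to 3 times" means three retries after the initial attempt, which is what yields the 1s, 2s, 4s delays.

```typescript
// Hypothetical helper for illustration only; not the PR's actual implementation.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

interface RetryOptions {
  maxRetries: number; // retries after the first attempt (assumed: 3)
  baseDelayMs: number; // delay before the first retry (assumed: 1000)
  sleep?: Sleep; // injectable so tests do not have to wait
  isTransient?: (err: unknown) => boolean; // assumed hook: which errors are worth retrying
}

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  const isTransient = options.isTransient ?? (() => true);
  let lastError: unknown;

  for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Give up when retries are exhausted or the error is not transient.
      if (attempt === options.maxRetries || !isTransient(err)) {
        break;
      }
      // The delay doubles on every retry: base, 2 * base, 4 * base, ...
      await sleep(options.baseDelayMs * Math.pow(2, attempt));
    }
  }
  throw lastError;
}
```

A caller could wrap the connect step as `retryWithBackoff(() => connection.connect(), { maxRetries: 3, baseDelayMs: 1000 })`, assuming `connection.connect()` returns a promise that rejects on a failed spawn. The same shape would also fit the auto-reconnect path, with the final rejection mapped to the user-facing error message.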