[Bug]: timing out causes the agent to stop #5601
Comments
For the record: I cannot reproduce this behavior on a local installation (with local Docker), where it works fine. I only see it on the hosted version. We use many timeouts in the code, and in this case I've looked at 3 of them:
All these have by default the value of |
I have a similar issue: https://www.all-hands.dev/share?share_id=fcba85aaadd94a7557d6b6d0283ef50fd27c33ec0293d1df4e4f41f84bf86588 |
It would still be great to get this fixed. |
I'm pretty sure that the issue isn't the timeout itself. Rather, the
agent runs into an error it can't recover from, so it gives up after
120 seconds, which is actually the ideal behavior, since it wastes no
unnecessary computational power.
…On Tue, 31 Dec 2024, 0:01 Graham Neubig, ***@***.***> wrote:
It would still be great to get this fixed.
|
The problem here is that the underlying runtime dies (e.g. due to running out of memory) which leaves the HTTP client in the lurch. The HTTP request times out, and we get this error. It's not an easy fix unfortunately. We could probably add an API to check how many times the runtime has rebooted, and send the user a message like "Runtime rebooted, potentially due to memory usage. Please try again." |
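As a rough illustration of the restart-count idea above, here is a minimal sketch. It assumes the runtime could report a restart counter; `RuntimeStatus`, `restart_count`, and `restart_notice` are hypothetical names for illustration, not part of the OpenHands API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuntimeStatus:
    # Hypothetical field: how many times the runtime has rebooted so far.
    restart_count: int


def restart_notice(previous: RuntimeStatus, current: RuntimeStatus) -> Optional[str]:
    """Return a user-facing message if the runtime rebooted between two checks."""
    if current.restart_count > previous.restart_count:
        return (
            "Runtime rebooted, potentially due to memory usage. "
            "Please try again."
        )
    # No reboot detected, so there is nothing to tell the user.
    return None
```

A caller could compare the status captured before a request with the status fetched after the request times out, and surface the notice instead of a generic timeout error.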
OK, sounds good. I confirmed that if I just run a command that times out (sleep 120), I get the expected message. Separately, this is happening when I run the OpenHands unit tests according to our standard unit testing GitHub workflow. A combination of:
Should make this significantly better. |
An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named 'openhands-fix-issue-5601' has been created with the attempted changes. You can view the branch here. Manual intervention may be required. Additional details about the failure:
The AI agent's last message seems to be describing an ideal solution but doesn't reflect the actual current state of the fix. The thread indicates this is still an ongoing issue that requires additional work, particularly around handling runtime failures and providing better error messages to users. |
This ensures that all requests go through the proper error handling path, including the 502 error handling that converts the error to a more helpful AgentRuntimeDisconnectedError message. Fixes #5601
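The idea of routing every request through one error handling path might look roughly like the sketch below. It is not the actual change: the `transport` callable and `send_request` helper are hypothetical, and `AgentRuntimeDisconnectedError` is defined locally as a stand-in for the exception named above.

```python
from typing import Callable, Tuple


class AgentRuntimeDisconnectedError(Exception):
    """Local stand-in for the exception named in the description above."""


# Hypothetical transport: takes (method, url) and returns (status_code, body).
Transport = Callable[[str, str], Tuple[int, str]]


def send_request(transport: Transport, method: str, url: str) -> str:
    """Single entry point so every request gets the same error handling."""
    status, body = transport(method, url)
    if status == 502:
        # A 502 from the proxy is treated here as a sign the runtime went away.
        raise AgentRuntimeDisconnectedError(
            f"Runtime disconnected while handling {method} {url}"
        )
    if status >= 400:
        raise RuntimeError(f"{method} {url} failed with status {status}")
    return body
```

With a single helper like this, no call site can bypass the 502 conversion by talking to the HTTP client directly.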
Is there an existing issue for the same bug?
Describe the bug and reproduction steps
Currently, when the agent times out after 120 seconds of a program running, the state changes to "agent has encountered an error" and you need to send a message to the agent to ask it to keep going.
Better behavior would be for the agent to receive a message that the command timed out without stopping (this was the behavior in previous versions of OpenHands).
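To make the requested behavior concrete, here is a minimal sketch using only the Python standard library. It is not how OpenHands executes commands; `run_with_timeout` is a hypothetical helper that returns a timeout message as ordinary output instead of raising, so a caller can hand that text to the agent and keep going.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(cmd: List[str], timeout_s: float) -> str:
    """Run a command; on timeout, report it as text rather than as an error."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left running.
        return f"Command timed out after {timeout_s} seconds."
    return result.stdout


if __name__ == "__main__":
    # A 5-second sleep with a 0.5-second limit triggers the timeout branch.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

The point of the design is that a timeout becomes one more observation for the agent to reason about, rather than a state change that requires the user to step in.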
OpenHands Installation
app.all-hands.dev
OpenHands Version
No response
Operating System
None
Logs, Errors, Screenshots, and Additional Context
No response