[Bug]: timing out causes the agent to stop #5601

neubig · 2024-12-14T19:29:30Z

Is there an existing issue for the same bug?

I have checked the existing issues.

Describe the bug and reproduction steps

Currently, when a program runs for more than 120 seconds and the command times out, the state changes to "agent has encountered an error", and you need to send the agent a message asking it to keep going.

Better behavior would be for the agent to receive a message that the command timed out without stopping (this was the behavior in previous versions of OpenHands).

OpenHands Installation

app.all-hands.dev

OpenHands Version

No response

Operating System

None

Logs, Errors, Screenshots, and Additional Context

No response

openhands-agent · 2024-12-14T19:29:49Z

OpenHands started fixing the issue! You can monitor the progress here.

enyst · 2024-12-18T11:51:46Z

> Better behavior would be that the agent gets a message that the command timed out but does not stop (this was the behavior in previous versions of Open Hands).

I'll add here for the record: I am unable to reproduce this behavior in any normal way on a local installation (with local Docker); it works just fine. I see it only on the hosted version.

We use many timeouts in the code, and in this case, I've looked at 3 of them:

The runtime timeout here doesn't get hit on a local install; the 5 extra seconds help. I suspect this is the one that makes things messy on the hosted version.
The bash command timeout around here gets hit, the agent gets the information that the command timed out, and it continues normally.
The same goes for the ipython timeout here.

All of these default to the value of the sandbox.timeout setting.
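To make the layering concrete, here is a minimal sketch of the idea, not the actual OpenHands code. It assumes the inner command timeout is reported back as a normal result, and that the outer request timeout is the same value plus a 5-second buffer so the inner one fires first. `CommandResult`, `run_command`, and `client_timeout` are hypothetical names; `SANDBOX_TIMEOUT` only stands in for the sandbox.timeout setting.

```python
import subprocess
import sys
from dataclasses import dataclass

SANDBOX_TIMEOUT = 120  # seconds; stands in for the sandbox.timeout setting
CLIENT_BUFFER = 5      # the "5 extra seconds" given to the outer request


@dataclass
class CommandResult:
    """Hypothetical observation handed back to the agent."""
    output: str
    timed_out: bool


def run_command(argv: list[str], timeout: float = SANDBOX_TIMEOUT) -> CommandResult:
    """Run a command; on timeout, report it as a result instead of raising."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        return CommandResult(output=proc.stdout + proc.stderr, timed_out=False)
    except subprocess.TimeoutExpired:
        # The agent sees a message and can decide what to do next.
        return CommandResult(
            output=f"Command timed out after {timeout} seconds.",
            timed_out=True,
        )


def client_timeout(command_timeout: float = SANDBOX_TIMEOUT) -> float:
    """Outer (request-level) timeout: slightly longer than the inner one,
    so the inner timeout fires first and its message reaches the agent."""
    return command_timeout + CLIENT_BUFFER
```

With this shape, a slow command produces a `CommandResult` with `timed_out=True` and the agent keeps going; only a failure outside the command itself would hit the outer timeout.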

avi12 · 2024-12-30T18:27:28Z

I have a similar issue: https://www.all-hands.dev/share?share_id=fcba85aaadd94a7557d6b6d0283ef50fd27c33ec0293d1df4e4f41f84bf86588
LLM: GPT-4o

neubig · 2024-12-30T22:00:39Z

It would still be great to get this fixed.

avi12 · 2024-12-31T14:46:34Z

I'm pretty sure that the issue isn't the timeout itself. Rather, the agent runs into an error it can't recover from, so it gives up after 120 seconds, which is actually ideal behavior because it avoids wasting computational power.


rbren · 2024-12-31T15:36:32Z

The problem here is that the underlying runtime dies (e.g. due to running out of memory) which leaves the HTTP client in the lurch. The HTTP request times out, and we get this error.

It's not an easy fix unfortunately. We could probably add an API to check how many times the runtime has rebooted, and send the user a message like "Runtime rebooted, potentially due to memory usage. Please try again."
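A rough sketch of what that check could look like, assuming some API exposes a reboot counter. No such endpoint is confirmed in this thread, so `get_reboot_count` and `RebootMonitor` are hypothetical; only the message text comes from the comment above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

REBOOT_MESSAGE = "Runtime rebooted, potentially due to memory usage. Please try again."


@dataclass
class RebootMonitor:
    """Tracks a runtime's reboot counter between checks.

    `get_reboot_count` is a hypothetical callable wrapping whatever API
    would expose the counter.
    """
    get_reboot_count: Callable[[], int]
    last_seen: int = 0

    def check(self) -> Optional[str]:
        """Return a user-facing message if the runtime rebooted since the last check."""
        current = self.get_reboot_count()
        rebooted = current > self.last_seen
        self.last_seen = current
        return REBOOT_MESSAGE if rebooted else None
```

The idea would be to call `check()` when a request to the runtime times out, so the user sees the reboot message rather than a generic error.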

neubig · 2024-12-31T20:59:14Z

OK, sounds good. I confirmed that if I just run a command that times out (sleep 120) I get the expected message.

Separately, this is happening when I run the OpenHands unit tests according to our standard unit testing GitHub workflow. A combination of:

A better error message (this issue) and
Configurable runtime size (Add runtime size configuration feature #5805)

should make this significantly better.

openhands-agent · 2024-12-31T20:59:46Z

OpenHands started fixing the issue! You can monitor the progress here.

openhands-agent · 2024-12-31T21:09:23Z

An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named 'openhands-fix-issue-5601' has been created with the attempted changes. You can view the branch here. Manual intervention may be required.

Additional details about the failure:
While some progress has been made in understanding the root cause, the issue hasn't been fully resolved yet. From the thread discussion, it became clear that:

The original issue is more complex than just a timeout problem: it's related to the underlying runtime dying (e.g., due to memory issues), which causes the HTTP client to time out.
The simple timeout case (like running sleep 120) works as expected, but the more complex cases involving runtime failures still need to be addressed.
A proposed solution was mentioned to add an API to check runtime reboot counts and provide better error messages like "Runtime rebooted, potentially due to memory usage. Please try again."
The issue is being addressed alongside another PR (Add runtime size configuration feature #5805) for configurable runtime size.

The AI agent's last message seems to be describing an ideal solution but doesn't reflect the actual current state of the fix. The thread indicates this is still an ongoing issue that requires additional work, particularly around handling runtime failures and providing better error messages to users.

This ensures that all requests go through the proper error handling path, including the 502 error handling that converts the error to a more helpful AgentRuntimeDisconnectedError message. Fixes #5601
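For illustration only, the "single error handling path" idea can be sketched as below. This is not the code from the linked PR: `RuntimeDisconnectedError` and `HTTPStatusError` are stand-ins defined here (the former for the AgentRuntimeDisconnectedError named above), and `send_request` is a hypothetical helper.

```python
class RuntimeDisconnectedError(Exception):
    """Stand-in for the AgentRuntimeDisconnectedError named above."""


class HTTPStatusError(Exception):
    """Minimal stand-in for an HTTP client's status error."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def send_request(do_request):
    """Single entry point: every request goes through here, so a 502
    is always translated into the same, more helpful error."""
    try:
        return do_request()
    except HTTPStatusError as e:
        if e.status_code == 502:
            raise RuntimeDisconnectedError(
                "Runtime disconnected (HTTP 502). It may have crashed or restarted."
            ) from e
        raise  # other status errors propagate unchanged
```

The point of the design is that a call site which bypasses the shared helper would surface the raw 502 instead of the friendlier message.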

neubig added the bug (Something isn't working) and fix-me (Attempt to fix this issue with OpenHands) labels Dec 14, 2024

enyst mentioned this issue Dec 18, 2024

Reset a failed tool call #5666

Merged

1 task

enyst mentioned this issue Dec 30, 2024

[Bug]: "Runtime failed to return execute_action" on command timeout #5927

Closed

1 task

neubig added and removed the fix-me (Attempt to fix this issue with OpenHands) label Dec 31, 2024

neubig linked a pull request Jan 1, 2025 that will close this issue

fix: Use _send_action_server_request in send_action_for_execution #5951

Open

neubig self-assigned this Jan 1, 2025
