504 Gateway Timeout - The server didn't respond in time #314
Comments
I have the same issue.
Hey @devilteo911 - have you tried not setting a timeout to see whether the issue occurs on the server side regardless? I'm trying to narrow down whether some information isn't passing all the way through to the server, or whether there's an error on the server side. Thanks!
Hey @ParthSareen, the issue seems to occur only on the first call, which consistently results in a 504 error. Subsequent calls with the same input complete the generation without any problems. I believe the problem is related to the time it takes to generate the first token, particularly during a cold start of my service: during a cold start, the model has to be downloaded from Hugging Face, since my serverless GPU provider has no permanent storage to keep the model locally. I hope this clarifies the issue.
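For the cold-start case described above, one common workaround is to give the client a generous timeout. A minimal sketch, assuming the ollama Python client (which forwards extra keyword arguments such as `timeout` to its underlying httpx client); the host URL is a placeholder:

```python
import ollama

# Placeholder URL for the serverless endpoint; substitute your own.
client = ollama.Client(
    host="https://my-serverless-gpu.example.com",
    timeout=600,  # passed through to the underlying httpx client (seconds)
)

# The first call may be slow while the provider pulls and loads the model.
response = client.chat(
    model="qwen2.5:32b-instruct-q4_K_M",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["message"]["content"])
```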
I experienced the issue intermittently when I called the client in a loop. Is there a way to increase the timeout?
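For intermittent 504s in a loop, one option besides raising the client timeout is to retry with backoff when the gateway gives up. A rough sketch, assuming the ollama Python client, which raises `ollama.ResponseError` (carrying the HTTP status code) for non-2xx responses; the URL is a placeholder:

```python
import time
import ollama

client = ollama.Client(host="https://my-serverless-gpu.example.com")  # placeholder URL

def chat_with_retry(prompt: str, attempts: int = 3) -> str:
    for attempt in range(attempts):
        try:
            response = client.chat(
                model="qwen2.5:32b-instruct-q4_K_M",
                messages=[{"role": "user", "content": prompt}],
            )
            return response["message"]["content"]
        except ollama.ResponseError as err:
            # A 504 comes from a gateway in front of the server, not from
            # the client; retrying usually succeeds once the model is warm.
            if err.status_code != 504 or attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("unreachable")
```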
devilteo911 (original issue description):
I don't know why, but I'm encountering this problem with the library. Here is my simple script:
where `llm_config["base_url"]` is the Ollama server URL (a serverless GPU) that I can reach successfully from open-webui, and through which I can even query the model without issues. The model I'm using is `qwen2.5:32b-instruct-q4_K_M` and the GPU is an RTX A6000. The client-side traceback is the following:
and this is what I see on the server side:
It happens every time after 50 seconds, even though the timeout is set to 600 seconds. Am I missing something?
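The consistent 50-second cutoff despite a 600-second client timeout suggests an intermediate gateway is closing the connection, not the client giving up. One mitigation is to load the model with a cheap warm-up request before the real call. A rough sketch, assuming the ollama Python client and Ollama's documented behavior that a generate request with an empty prompt only loads the model into memory; the URL is a placeholder:

```python
import ollama

client = ollama.Client(host="https://my-serverless-gpu.example.com", timeout=600)  # placeholder URL

# Empty-prompt generate: per the Ollama API docs this only loads the model
# into memory; keep_alive keeps it resident afterwards.
client.generate(model="qwen2.5:32b-instruct-q4_K_M", keep_alive="10m")

# With the model warm, first-token latency should stay well under a
# typical gateway limit (~50 s here).
response = client.chat(
    model="qwen2.5:32b-instruct-q4_K_M",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

Streaming (`stream=True` on `chat`/`generate`) can also help with gateway timeouts, since the connection starts carrying bytes before an idle-connection limit fires.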