Is your feature request related to a problem? Please describe.
When there is a problem in the backend, the connection to the server is lost, but the service (WebUI) keeps running.
The pod therefore stays in the Running state, and there is no way to detect that something is wrong and that it needs at least a restart.
Example:
Due to an internal timeout error:
raise WorkflowTimeoutError(msg)
llama_index.core.workflow.errors.WorkflowTimeoutError: Operation timed out after 240 seconds
an exception was thrown in the ASGI application, which in turn raised KeyError: 'Session is disconnected'.
Because of that, the server was disconnected, the application could no longer be used, and a manual restart was needed to get it operating again.
Describe the solution you'd like
It would be good to have an internal process, or some other way, of checking the health of the server and whether it is still connected.
I have tried the already implemented health check via an API call, but it returns 200 because the WebUI is working, even though the server is not.
The idea would be either to implement an additional endpoint that returns the status of the server, or some way for the app to check the server itself, e.g. an async process running in parallel to the app; a rough sketch of that idea follows below.
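What follows is a minimal, hypothetical sketch of the async-watchdog variant, not an existing Chainlit feature. Everything here is an assumption: `backend_is_healthy` stands in for whatever proves the backend is actually usable (e.g. a cheap llama_index call or a database ping), and the marker-file path is arbitrary.

```python
import asyncio
import logging
import os

logger = logging.getLogger("backend-watchdog")

HEALTH_FILE = "/tmp/backend-healthy"  # hypothetical marker a liveness probe could test

async def backend_is_healthy() -> bool:
    """Placeholder: replace with a real probe of the backend,
    e.g. a trivial workflow/LLM call or a database ping."""
    return True

async def watchdog(interval: float = 30.0) -> None:
    """Runs in parallel to the app: touches a marker file while the
    backend responds and removes it once it stops responding, so an
    external liveness check (e.g. a Kubernetes exec probe testing the
    file's freshness) can restart the pod."""
    while True:
        try:
            healthy = await asyncio.wait_for(backend_is_healthy(), timeout=10)
        except Exception:
            logger.exception("Health probe failed")
            healthy = False
        if healthy:
            with open(HEALTH_FILE, "w"):
                pass  # touch the marker
        else:
            try:
                os.remove(HEALTH_FILE)
            except FileNotFoundError:
                pass
        await asyncio.sleep(interval)

# How the task gets started depends on the deployment; once the app's
# event loop is running, asyncio.create_task(watchdog()) is enough.
```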
Additional context
This would be necessary for any deployment using Chainlit to avoid downtime.
Currently, there isn't specific information available about a built-in mechanism in Chainlit for checking the backend server's health independently from the WebUI. However, implementing a custom health check endpoint or an asynchronous process to monitor the server's status could be a viable solution.
You could consider creating an additional endpoint that performs specific checks on the backend server's functionality, such as verifying database connections or other critical services, and returns a status code based on these checks. Alternatively, implementing an asynchronous task within the application that periodically checks the server's health and logs or alerts when issues are detected could help in identifying problems early.
These approaches would help detect backend issues even while the WebUI is still operational, allowing more proactive management of server health and reducing downtime; a sketch of the endpoint variant follows below.
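For the endpoint variant, a minimal sketch could look like the following. It assumes Chainlit's underlying FastAPI app is importable as `chainlit.server.app` (verify this against your Chainlit version), and `check_backend` is a hypothetical placeholder for the actual dependency check:

```python
from fastapi import Response

from chainlit.server import app  # assumption: the FastAPI app Chainlit serves

async def check_backend() -> bool:
    """Hypothetical deep check: replace with whatever proves the
    backend is usable (a cheap LLM call, a DB ping, ...)."""
    return True

@app.get("/healthz/deep")
async def deep_health() -> Response:
    # Return 200 only when the backend itself responds,
    # not merely when the WebUI process is alive.
    return Response(status_code=200 if await check_backend() else 503)
```

A liveness probe pointed at such an endpoint, instead of the default WebUI route, would then restart the pod when the backend is gone while the WebUI still answers.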
I am guessing you are managing multiple instances of your Chainlit app? In the latest release, I fixed some issues with sticky sessions and auto-scaling. This documentation might also help: https://docs.chainlit.io/deploy/overview#tips-and-tricks