Deployment worker does not reset status to error on failure — services stuck permanently in running state #4083
Description
In apps/dokploy/server/queues/deployments-queue.ts, the BullMQ worker wraps all deployment logic in a try/catch, but the catch block only logs the error and does nothing to recover the service state:
```ts
async (job: Job<DeploymentJob>) => {
	try {
		if (job.data.applicationType === "application") {
			await updateApplicationStatus(job.data.applicationId, "running");
			// ... deploy or rebuild
		} else if (job.data.applicationType === "compose") {
			await updateCompose(job.data.composeId, { composeStatus: "running" });
			// ... deploy or rebuild
		}
	} catch (error) {
		console.log("Error", error); // No status rollback
	}
}
```

The status is set to "running" at the start. If the deployment throws (network failure, Docker daemon crash, SSH timeout), the status is never reset.
The queue is also configured with removeOnComplete: true and removeOnFail: true, so there is no evidence left for debugging.
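As a side note, BullMQ's `removeOnComplete`/`removeOnFail` options also accept `KeepJobs` objects, which would retain failed jobs for debugging. A minimal sketch of safer retention settings (the specific counts and ages here are assumptions, not values from the Dokploy codebase):

```typescript
// Sketch: bound completed-job history and retain failed jobs for post-mortems,
// instead of removeOnComplete: true / removeOnFail: true (which delete everything).
const deploymentJobOptions = {
	removeOnComplete: { count: 100 },        // keep only the last 100 successful jobs
	removeOnFail: { age: 7 * 24 * 60 * 60 }, // keep failed jobs for 7 days (age is in seconds)
};

console.log(JSON.stringify(deploymentJobOptions));
```

These options can be passed per-job or as `defaultJobOptions` on the queue.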
Impact
Any deployment that fails inside the worker leaves the service in a ghost "running" state. Users see a perpetual "deploying" indicator with no way to recover short of manually updating the database. Especially dangerous for automated webhook-triggered deployments where nobody is watching the UI.
Fix
Add status rollback in the catch block:
```ts
catch (error) {
	console.log("Error", error);
	if (job.data.applicationType === "application") {
		await updateApplicationStatus(job.data.applicationId, "error");
	} else if (job.data.applicationType === "compose") {
		await updateCompose(job.data.composeId, { composeStatus: "error" });
	} else if (job.data.applicationType === "application-preview") {
		await updatePreviewDeployment(job.data.previewDeploymentId, { previewStatus: "error" });
	}
}
```
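One further consideration: because the current catch block swallows the error, BullMQ records the job as successful, so `attempts`/backoff retries and `failed` events never fire. A generic sketch of the pattern (the helper name and shape are hypothetical, not Dokploy code): roll the status back, then rethrow so the queue still sees the failure.

```typescript
// Hypothetical wrapper: run the deployment, roll back status on failure,
// and rethrow so BullMQ marks the job as failed (enabling retries/events).
async function runWithRollback<T>(
	work: () => Promise<T>,
	markError: () => Promise<void>, // e.g. set the service status to "error"
): Promise<T> {
	try {
		return await work();
	} catch (error) {
		await markError();
		throw error; // swallowing it would make BullMQ treat the job as successful
	}
}
```

Inside the worker this would wrap the existing deploy/rebuild branches, with `markError` dispatching on `job.data.applicationType` as in the fix above.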