Make populate status truthful#85
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
Disabled knowledge base sources:
📝 WalkthroughWalkthroughThis PR refactors the dataset populate workflow from synchronous to asynchronous initialization. The backend now claims populate runs via Convex, returns HTTP 202 with a runId, and starts populate execution in a background worker that sets dataset status to live or failed (persisting a truncated error). The populate-agent step now propagates agent errors. Frontend changes add a "failed" dataset status, surface lastStatusError, block Update/Populate while building, derive button labels/states from status, and emit a DATASET_POPULATE_STARTED event including runId. Sequence DiagramsequenceDiagram
participant Client
participant PopulateRoute as /populate route
participant BeginPopulate as beginDatasetPopulate
participant ConvexDB as Convex DB
participant Background as runPopulateWorkflowInBackground
participant PopulateWF as Populate Workflow
Client->>PopulateRoute: POST /populate { datasetId }
PopulateRoute->>BeginPopulate: claim populate for dataset
BeginPopulate->>ConvexDB: validate ownership & check if building
alt not found / forbidden / already building
ConvexDB-->>BeginPopulate: error outcome
BeginPopulate-->>PopulateRoute: claim failed
PopulateRoute-->>Client: HTTP 404/403/409
else started
BeginPopulate->>ConvexDB: patch dataset to building (clear lastStatusError)
BeginPopulate-->>PopulateRoute: started (runId)
PopulateRoute->>Background: start async populate (fire & forget) with runId
PopulateRoute-->>Client: HTTP 202 { success: true, runId }
Background->>PopulateWF: execute populate workflow (server auth/run context)
PopulateWF-->>Background: completes or error
Background->>ConvexDB: patch dataset to live (on success) or failed (on error, with truncated lastStatusError)
Background->>ConvexDB: send dataset-ready email + analytics (best-effort)
end
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
d08398e to
3201d21
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
backend/src/index.ts (1)
286-319: 💤 Low valueConsider moving
createRun()after the claim succeeds.The workflow run is created at line 292 before calling
beginDatasetPopulate. If the claim fails (not_found, forbidden, already_building), the run object is discarded unused. MovingcreateRun()after line 309 would avoid allocating resources for requests that will be rejected.♻️ Suggested reordering
try { const auth = req.auth; if (!auth) { return reply.code(401).send({ error: "Authentication required" }); } - const run = await populateWorkflow.createRun(); const populateOutcome = await beginDatasetPopulate( parsed.data.datasetId, auth.userId, ); if (populateOutcome === "not_found") { return reply.code(404).send({ error: "Dataset not found" }); } if (populateOutcome === "forbidden") { return reply.code(403).send({ error: "Not authorized to populate this dataset" }); } if (populateOutcome === "already_building") { return reply.code(409).send({ error: "Dataset is already being populated" }); } if (populateOutcome !== "started") { throw new Error(`Unexpected populate claim outcome: ${populateOutcome}`); } + const run = await populateWorkflow.createRun(); + void runPopulateWorkflowInBackground({🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/src/index.ts` around lines 286 - 319, The code creates a workflow run (populateWorkflow.createRun()) before attempting to claim the dataset (beginDatasetPopulate), causing unused runs when the claim fails; move the createRun() call to after you verify populateOutcome === "started" (i.e., after the not_found/forbidden/already_building checks) so you only allocate a run when the claim succeeds, then pass that run into runPopulateWorkflowInBackground along with parsed.data, authorizedUserId, logger and clerk as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@backend/src/index.ts`:
- Around line 286-319: The code creates a workflow run
(populateWorkflow.createRun()) before attempting to claim the dataset
(beginDatasetPopulate), causing unused runs when the claim fails; move the
createRun() call to after you verify populateOutcome === "started" (i.e., after
the not_found/forbidden/already_building checks) so you only allocate a run when
the claim succeeds, then pass that run into runPopulateWorkflowInBackground
along with parsed.data, authorizedUserId, logger and clerk as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f303bfec-d14d-4837-9bba-9d29025f3554
📒 Files selected for processing (10)
backend/src/index.tsbackend/src/mastra/workflows/populate.tsfrontend/app/dataset/[id]/page.tsxfrontend/app/dataset/new/page.tsxfrontend/components/dataset/StatusBadge.tsxfrontend/components/table/types.tsfrontend/convex/datasets.tsfrontend/convex/schema.tsfrontend/lib/analytics.tsfrontend/lib/backend.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/src/index.ts`:
- Around line 310-311: Wrap the populateWorkflow.createRun() call in a try/catch
so that if createRun() throws you perform the cleanup that
beginDatasetPopulate() established: release the dataset claim and mark the
dataset as failed (or otherwise transition it out of "building") before
returning the 502; specifically, after calling beginDatasetPopulate() (which
claimed the dataset) ensure you either invoke the cleanup/rollback function
returned by beginDatasetPopulate() or call the appropriate dataset
release/markFailed method on your dataset repository/service so the claim is
cleared and the dataset status is updated, then propagate the 502 response.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 98ef45b3-03a0-4ee6-b46a-8db86ba0a063
📒 Files selected for processing (1)
backend/src/index.ts
… status (#93) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
PR #85 ("Make populate status truthful") refactored the /populate route from synchronous (awaiting the workflow inline) to async: it now claims the dataset, creates a run, fires runPopulateWorkflowInBackground() as a background void promise, and returns 202 immediately. Our branch still had the old synchronous pattern with the post-workflow notification and status-update logic inline in the route handler — including our deleteIncomplete pruning addition. The conflict was the entire old inline block vs main's single 202 return. Resolution: - Take main's 202 return — the async pattern is strictly better. - Move the deleteIncomplete pruning into runPopulateWorkflowInBackground() (already introduced by main), placed before the countByDataset call so we count only complete rows when deciding whether to set status "live". All other files (populate.ts workflow, frontend status/schema changes) merged cleanly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Rebased onto main after #84 merged.
Fixes #79.
Fixes #69.
Verification