Make populate status truthful #85

Merged
simantak-dabhade merged 5 commits into main from codex/status-truth-populate-pr1
May 25, 2026

@giaphutran12
Collaborator

@giaphutran12 giaphutran12 commented May 25, 2026

Summary

  • Make /populate return 202 with a runId and continue the Mastra workflow in the background
  • Move dataset lifecycle status to backend-owned transitions: building before run, live only after rows exist, failed with lastStatusError on failure
  • Stop frontend optimistic completion: status badges/buttons now reflect Convex status, and failed state surfaces the backend error
  • Prevent duplicate in-flight /populate requests for the same dataset with an atomic Convex claim; duplicates now return 409 instead of launching another row-clearing workflow
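
The claim described in the last bullet can be sketched as a single check-and-set. This is a minimal illustration, not the PR's implementation: the in-memory `datasets` map stands in for the Convex table, and `claimPopulate` and `statusCodeFor` are hypothetical names. In Convex the same logic would presumably run inside one mutation, so that the check and the write cannot interleave with another request.

```typescript
// Hypothetical in-memory stand-in for the Convex datasets table.
type DatasetStatus = "building" | "live" | "failed";

interface Dataset {
  ownerId: string;
  status: DatasetStatus;
  lastStatusError?: string;
}

type ClaimOutcome = "started" | "not_found" | "forbidden" | "already_building";

const datasets = new Map<string, Dataset>();

// Check ownership and current status, then flip to "building" in one step.
// Assumption: the real claim runs inside a single Convex mutation.
function claimPopulate(datasetId: string, userId: string): ClaimOutcome {
  const dataset = datasets.get(datasetId);
  if (!dataset) return "not_found";
  if (dataset.ownerId !== userId) return "forbidden";
  if (dataset.status === "building") return "already_building";
  dataset.status = "building";
  delete dataset.lastStatusError; // a fresh run starts with no stale error
  return "started";
}

// Map each claim outcome to the HTTP status the route returns.
function statusCodeFor(outcome: ClaimOutcome): number {
  switch (outcome) {
    case "started":
      return 202;
    case "not_found":
      return 404;
    case "forbidden":
      return 403;
    case "already_building":
      return 409;
  }
}
```

With this shape, a second request for the same dataset sees `building` and gets a 409 without touching any rows.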

Rebased onto main after #84 merged.

Fixes #79.
Fixes #69.

Verification

  • npm run build (backend)
  • npm run lint (frontend; passed with existing warnings)
  • npm run build (frontend, with non-secret placeholder public env)
  • npm test --if-present (backend)
  • npm test --if-present (frontend)
  • git diff --check
  • make convex-push: blocked in isolated worktree because no local .env/admin key is available; no secret file was inspected or copied
  • Earlier PR runtime QA on isolated stack: seeded public datasets, forced one to failed and one to building via backend Convex mutation, verified frontend showed FAILED + backend error and Building... with disabled workflow buttons
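
The frontend behaviour checked in the runtime QA can be sketched as a pure function from backend status to UI state. Everything here is illustrative: `deriveControls` and the `PopulateControls` shape are hypothetical, and only the `FAILED` badge and the `Building...` label come from the QA notes above; the other strings are placeholders.

```typescript
type DatasetStatus = "building" | "live" | "failed";

interface DatasetView {
  status: DatasetStatus;
  lastStatusError?: string;
}

interface PopulateControls {
  badge: string;
  buttonLabel: string;
  disabled: boolean;
  error?: string;
}

// Derive every visible state from the backend status; nothing is set
// optimistically on the client.
function deriveControls(dataset: DatasetView): PopulateControls {
  switch (dataset.status) {
    case "building":
      return { badge: "BUILDING", buttonLabel: "Building...", disabled: true };
    case "failed":
      return {
        badge: "FAILED",
        buttonLabel: "Retry populate", // placeholder label
        disabled: false,
        error: dataset.lastStatusError ?? "Unknown error",
      };
    case "live":
      return { badge: "LIVE", buttonLabel: "Update", disabled: false };
  }
}
```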

@coderabbitai
Contributor

coderabbitai Bot commented May 25, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3b9175e-fddd-40d0-b5ee-e4a0b67f5d94

📥 Commits

Reviewing files that changed from the base of the PR and between 9cc0895 and ea41ab0.

📒 Files selected for processing (1)
  • backend/src/index.ts


Walkthrough

This PR refactors the dataset populate workflow from synchronous to asynchronous initialization. The backend now claims populate runs via Convex, returns HTTP 202 with a runId, and starts populate execution in a background worker that sets dataset status to live or failed (persisting a truncated error). The populate-agent step now propagates agent errors. Frontend changes add a "failed" dataset status, surface lastStatusError, block Update/Populate while building, derive button labels/states from status, and emit a DATASET_POPULATE_STARTED event including runId.

Sequence Diagram

sequenceDiagram
  participant Client
  participant PopulateRoute as /populate route
  participant BeginPopulate as beginDatasetPopulate
  participant ConvexDB as Convex DB
  participant Background as runPopulateWorkflowInBackground
  participant PopulateWF as Populate Workflow

  Client->>PopulateRoute: POST /populate { datasetId }
  PopulateRoute->>BeginPopulate: claim populate for dataset
  BeginPopulate->>ConvexDB: validate ownership & check if building
  alt not found / forbidden / already building
    ConvexDB-->>BeginPopulate: error outcome
    BeginPopulate-->>PopulateRoute: claim failed
    PopulateRoute-->>Client: HTTP 404/403/409
  else started
    BeginPopulate->>ConvexDB: patch dataset to building (clear lastStatusError)
    BeginPopulate-->>PopulateRoute: started (runId)
    PopulateRoute->>Background: start async populate (fire & forget) with runId
    PopulateRoute-->>Client: HTTP 202 { success: true, runId }
    Background->>PopulateWF: execute populate workflow (server auth/run context)
    PopulateWF-->>Background: completes or error
    Background->>ConvexDB: patch dataset to live (on success) or failed (on error, with truncated lastStatusError)
    Background->>ConvexDB: send dataset-ready email + analytics (best-effort)
  end
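
The final status patch in the diagram can be sketched as a pure function of the workflow result. This is a sketch under stated assumptions: `statusPatchFor`, `PopulateResult`, and the 500-character limit are hypothetical, and treating a successful run with zero rows as `failed` is one reading of "live only after rows exist", not something the PR text confirms.

```typescript
type PopulateResult =
  | { ok: true; rowCount: number }
  | { ok: false; error: unknown };

interface StatusPatch {
  status: "live" | "failed";
  lastStatusError?: string;
}

// Hypothetical limit; the PR only says the stored error is truncated.
const MAX_STATUS_ERROR_LENGTH = 500;

function truncate(message: string, max: number = MAX_STATUS_ERROR_LENGTH): string {
  return message.length <= max ? message : message.slice(0, max - 3) + "...";
}

// Decide the terminal status once the background workflow settles.
function statusPatchFor(result: PopulateResult): StatusPatch {
  if (result.ok && result.rowCount > 0) {
    return { status: "live" };
  }
  // Assumption: a run that succeeds but produces no rows is reported as failed.
  const message = result.ok
    ? "Populate finished without producing any rows"
    : result.error instanceof Error
      ? result.error.message
      : String(result.error);
  return { status: "failed", lastStatusError: truncate(message) };
}
```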

Suggested reviewers

  • simantak-dabhade
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title "Make populate status truthful" clearly summarizes the main objective of moving dataset status transitions to backend-owned logic and making status reflect actual workflow state. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing a clear summary of the changes (async populate, backend-owned status transitions, failed state support, concurrency prevention) with verification steps performed. |
| Linked Issues check | ✅ Passed | The PR addresses both linked issues: #79 (wiring backend logic to update status chips during populate) and #69 (making populate async, driving status from workflow, adding failed/lastStatusError states, preventing duplicates with concurrency gates). |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the PR objectives: backend async populate, Convex status transitions, frontend UI updates to reflect backend status, and new analytics event for populate started. |

@giaphutran12 giaphutran12 self-assigned this May 25, 2026
@giaphutran12 giaphutran12 marked this pull request as draft May 25, 2026 05:54
Base automatically changed from codex/root-env-unification-pr0 to main May 25, 2026 09:37
@giaphutran12 giaphutran12 force-pushed the codex/status-truth-populate-pr1 branch from d08398e to 3201d21 Compare May 25, 2026 09:50
@giaphutran12 giaphutran12 marked this pull request as ready for review May 25, 2026 09:53
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
backend/src/index.ts (1)

286-319: 💤 Low value

Consider moving createRun() after the claim succeeds.

The workflow run is created at line 292 before calling beginDatasetPopulate. If the claim fails (not_found, forbidden, already_building), the run object is discarded unused. Moving createRun() after line 309 would avoid allocating resources for requests that will be rejected.

♻️ Suggested reordering
     try {
       const auth = req.auth;
       if (!auth) {
         return reply.code(401).send({ error: "Authentication required" });
       }

-      const run = await populateWorkflow.createRun();
       const populateOutcome = await beginDatasetPopulate(
         parsed.data.datasetId,
         auth.userId,
       );

       if (populateOutcome === "not_found") {
         return reply.code(404).send({ error: "Dataset not found" });
       }
       if (populateOutcome === "forbidden") {
         return reply.code(403).send({ error: "Not authorized to populate this dataset" });
       }
       if (populateOutcome === "already_building") {
         return reply.code(409).send({ error: "Dataset is already being populated" });
       }
       if (populateOutcome !== "started") {
         throw new Error(`Unexpected populate claim outcome: ${populateOutcome}`);
       }

+      const run = await populateWorkflow.createRun();
+
       void runPopulateWorkflowInBackground({
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/index.ts` around lines 286 - 319, The code creates a workflow run
(populateWorkflow.createRun()) before attempting to claim the dataset
(beginDatasetPopulate), causing unused runs when the claim fails; move the
createRun() call to after you verify populateOutcome === "started" (i.e., after
the not_found/forbidden/already_building checks) so you only allocate a run when
the claim succeeds, then pass that run into runPopulateWorkflowInBackground
along with parsed.data, authorizedUserId, logger and clerk as before.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f303bfec-d14d-4837-9bba-9d29025f3554

📥 Commits

Reviewing files that changed from the base of the PR and between 1e9cc8b and 3201d21.

📒 Files selected for processing (10)
  • backend/src/index.ts
  • backend/src/mastra/workflows/populate.ts
  • frontend/app/dataset/[id]/page.tsx
  • frontend/app/dataset/new/page.tsx
  • frontend/components/dataset/StatusBadge.tsx
  • frontend/components/table/types.ts
  • frontend/convex/datasets.ts
  • frontend/convex/schema.ts
  • frontend/lib/analytics.ts
  • frontend/lib/backend.ts

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/src/index.ts`:
- Around line 310-311: Wrap the populateWorkflow.createRun() call in a try/catch
so that if createRun() throws you perform the cleanup that
beginDatasetPopulate() established: release the dataset claim and mark the
dataset as failed (or otherwise transition it out of "building") before
returning the 502; specifically, after calling beginDatasetPopulate() (which
claimed the dataset) ensure you either invoke the cleanup/rollback function
returned by beginDatasetPopulate() or call the appropriate dataset
release/markFailed method on your dataset repository/service so the claim is
cleared and the dataset status is updated, then propagate the 502 response.
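
The cleanup this comment asks for can be sketched as follows. It is a minimal illustration rather than the committed fix: `createRunOrRollback`, `PopulateDeps`, and `markFailed` are hypothetical names, and the dependencies are injected so the sketch runs without Mastra or Convex.

```typescript
interface RunHandle {
  runId: string;
}

// Hypothetical dependencies; the real route would call the Mastra workflow
// and a Convex mutation here.
interface PopulateDeps {
  createRun: () => Promise<RunHandle>;
  markFailed: (datasetId: string, error: string) => Promise<void>;
}

type StartResult =
  | { code: 202; runId: string }
  | { code: 502; error: string };

// Called only after the dataset has been claimed (status is "building").
// If the run cannot be created, move the dataset out of "building" before
// answering, so the claim does not block later requests.
async function createRunOrRollback(
  datasetId: string,
  deps: PopulateDeps,
): Promise<StartResult> {
  try {
    const run = await deps.createRun();
    return { code: 202, runId: run.runId };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    await deps.markFailed(datasetId, message);
    return { code: 502, error: "Failed to start populate workflow" };
  }
}
```

Marking the dataset `failed` rather than silently restoring the previous status keeps the error visible through `lastStatusError`, which matches the PR's goal of truthful status.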
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 98ef45b3-03a0-4ee6-b46a-8db86ba0a063

📥 Commits

Reviewing files that changed from the base of the PR and between 3201d21 and 9cc0895.

📒 Files selected for processing (1)
  • backend/src/index.ts

Comment thread backend/src/index.ts Outdated
Contributor

@simantak-dabhade simantak-dabhade left a comment

LGTM

@simantak-dabhade simantak-dabhade merged commit 0a86186 into main May 25, 2026
2 checks passed
@simantak-dabhade simantak-dabhade deleted the codex/status-truth-populate-pr1 branch May 25, 2026 20:16
MMeteorL added a commit that referenced this pull request May 26, 2026
PR #85 ("Make populate status truthful") refactored the /populate route
from synchronous (awaiting the workflow inline) to async: it now claims
the dataset, creates a run, fires runPopulateWorkflowInBackground() as
a background void promise, and returns 202 immediately.

Our branch still had the old synchronous pattern with the post-workflow
notification and status-update logic inline in the route handler —
including our deleteIncomplete pruning addition. The conflict was the
entire old inline block vs main's single 202 return.

Resolution:
- Take main's 202 return — the async pattern is strictly better.
- Move the deleteIncomplete pruning into runPopulateWorkflowInBackground()
  (already introduced by main), placed before the countByDataset call so
  we count only complete rows when deciding whether to set status "live".

All other files (populate.ts workflow, frontend status/schema changes)
merged cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

  • The Live/building status doesn't actually update.
  • Async populate workflow with real status tracking

3 participants