Skip to content

fix(etl): bulk-update Jan Aushadhi prices via atomic RPC#2001

Merged
dipexplorer merged 2 commits into
RatLoopz:mainfrom
shashank03-dev:fix/issue-1966-ja-price-bulk-rpc
Jun 17, 2026
Merged

fix(etl): bulk-update Jan Aushadhi prices via atomic RPC#2001
dipexplorer merged 2 commits into
RatLoopz:mainfrom
shashank03-dev:fix/issue-1966-ja-price-bulk-rpc

Conversation

@shashank03-dev

@shashank03-dev shashank03-dev commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

🛑 STOP: Assignment & File Scope Check

  • I am assigned to this issue.
  • I verified that this PR ONLY touches the required files.

📋 PR Summary & Link

The price back-fill (merge_jan_aushadhi_price in apps/etl/src/loaders/supabase_loader.py) was calling .upsert(). PostgREST turns an upsert into INSERT ... ON CONFLICT, which means Postgres validates the full row before it ever gets to the conflict clause. Our batch payload only carries id and jan_aushadhi_price, so every row tripped the NOT NULL constraint on generic_name (and brand_name, manufacturer). The batch threw, and the loader dropped down to one PATCH per row, which is roughly 100x slower.

I added a Postgres function, bulk_update_jan_aushadhi_price(p_updates jsonb), that runs a single UPDATE of jan_aushadhi_price matched on id. No INSERT happens, so the not-null columns are never looked at and nothing else on the row can change. The loader calls it once per batch through the retry helper that was already there, and keeps the row-by-row update as a fallback if the RPC ever fails.

Two things worth flagging for review:

  • I used jan_aushadhi_price as the JSON key instead of price like the issue example. It matches the payload the loader already builds and makes it obvious which column gets written.
  • The function returns how many rows it actually updated. If that number comes back lower than the batch (say a row got deleted between the page scan and the update), the loader now adds the difference to failed instead of pretending everything went through. That keeps checked == updated + skipped + failed honest.

📸 Proof of Work (Screenshots / Logs)

No UI here, it's an ETL/DB change. Two kinds of proof.

Tests:

$ python3 -m pytest apps/etl/tests/test_loader.py -q
18 passed
$ python3 -m pytest -q        # full apps/etl suite
25 passed
$ ruff check apps/etl/src/loaders/supabase_loader.py
All checks passed!

New tests cover the back-fill going through the RPC with an {id, jan_aushadhi_price}-only payload (never an upsert), the retry-then-fallback path, a short row count getting counted as failed, and an unrecognized RPC response being treated as a full batch with a loud log.

I also ran it against a real Postgres 16 to be sure the migration and function behave, not just the test fake. Created the actual medicines table with its not-null columns, applied this PR's migration, then put one row through the old path and the new one:

======== BEFORE ========
 id                                   | generic_name | jan_aushadhi_price
 11111111-...-111111111111            | Paracetamol  |  (null)

======== OLD PATH: what PostgREST .upsert([{id, jan_aushadhi_price}]) compiles to ========
ERROR:  null value in column "brand_name" of relation "medicines" violates not-null constraint
DETAIL:  Failing row contains (1111..., null, null, null, null, null, 10.50).

======== NEW PATH: bulk_update_jan_aushadhi_price RPC (returns rows updated) ========
 rows_updated
            1

======== AFTER (price filled, generic_name intact) ========
 id                                   | generic_name | jan_aushadhi_price
 11111111-...-111111111111            | Paracetamol  |  10.50

======== EDGE: missing id -> 0 updated (loader counts the shortfall as failed) ========
 rows_updated
            0

======== EDGE: non-array payload -> guarded exception ========
ERROR:  p_updates must be a JSON array of {id, jan_aushadhi_price} objects, got null

The old .upsert() reproduces the exact not-null error from the issue, and it does so even though the row already exists. The RPC fills the price, hands back a count, and leaves generic_name alone.

One operational note: the function needs this PR's migration (20260616174742_add_bulk_update_jan_aushadhi_price_rpc.sql) applied to the environment. If it isn't there, the loader logs an error and falls back to the old row-by-row path rather than failing the run outright.

🏷️ PR Type

  • 🐛 type: bug
  • type: performance

✅ Checklist

  • My PR has a linked issue (Closes #1966)
  • I have pulled the latest main and resolved any conflicts

merge_jan_aushadhi_price used PostgREST .upsert(), which compiles to
INSERT ... ON CONFLICT. Even for existing rows the full INSERT was
validated against medicines.generic_name (NOT NULL), but the batch
payload only carries {id, jan_aushadhi_price}, so every batch failed and
fell back to ~100x slower row-by-row PATCHes.

Add a bulk_update_jan_aushadhi_price(jsonb) RPC that does a single
atomic UPDATE of jan_aushadhi_price keyed by id — no INSERT, so NOT NULL
columns are never touched and no other field can be corrupted. The
loader now calls it per batch, reconciles a short row count as failed,
and keeps the row-by-row path as a fallback.

Closes RatLoopz#1966

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2c6d3cf4f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread supabase/migrations/20260616174742_add_bulk_update_jan_aushadhi_price_rpc.sql Outdated
Codex review (P1): Postgres grants EXECUTE on new functions to PUBLIC,
so the SECURITY DEFINER writer was callable by anon/authenticated via
PostgREST, letting any caller overwrite medicines.jan_aushadhi_price and
bypass the medicines_service_write RLS policy.

Switch the function to SECURITY INVOKER so RLS still applies, then REVOKE
EXECUTE from PUBLIC and GRANT only to service_role. The ETL connects as
service_role (allowed by RLS); anon/authenticated now get permission
denied.
@dipexplorer dipexplorer added quality:exceptional multiplier x1.5 type:security Auth, rate limiting, security type:performance Performance optimization or latency improvements shoutout:approved Approved by admin for LinkedIn Shoutout labels Jun 17, 2026
@dipexplorer dipexplorer merged commit 32eb8df into RatLoopz:main Jun 17, 2026
26 of 29 checks passed
@github-project-automation github-project-automation Bot moved this from 📥 Backlog to 🎉 Merged in SahiDawa Workflow Jun 17, 2026
@github-actions

Copy link
Copy Markdown
Contributor

🎉 Congratulations @shashank03-dev! Your Pull Request "fix(etl): bulk-update Jan Aushadhi prices via atomic RPC" has been successfully merged by @dipexplorer.

Thank you for your valuable contribution to SahiDawa! 🇮🇳
If this was for GSSoC 2026, your work is officially merged and valid. Keep up the great work and feel free to claim other open issues. 🚀

Follow us on LinkedIn: https://www.linkedin.com/company/ratloopz/ to get shoutout

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gssoc:approved Approved for gssoc level:critical 80 pts quality:exceptional multiplier x1.5 shoutout:approved Approved by admin for LinkedIn Shoutout status: open-for-all type:bug Something isn't working type:performance Performance optimization or latency improvements type:security Auth, rate limiting, security

Projects

Status: 🎉 Merged

Development

Successfully merging this pull request may close these issues.

[BUG] merge_jan_aushadhi_price Bulk Upsert fails due to missing NOT NULL columns

2 participants