Skip to content

[Bug] InstallationHandler non-atomic check-then-insert causes unique-key violation under concurrent delivery #110

@thomascolden585-svg

Description

@thomascolden585-svg

Description

InstallationHandler is the only webhook handler in the codebase that does not use an upsert — it uses a non-atomic check-then-insert pattern (findOneBy → insert/update). Under concurrent webhook delivery, a race between the findOneBy and the subsequent insert causes a unique-key violation (repo_full_name primary key) that crashes the handler and poisons the delivery record, preventing any retry from succeeding.

Severity: HIGH

Why this is non-duplicate

The existing delivery-dedup system (claimDelivery in webhook.service.ts:40-55) only guards against two workers racing on the same deliveryId. It does not protect against two logically distinct deliveries that target the same repo:

  • An installation.created event and an installation_repositories.added event arriving in the same second for the same repo (different deliveryId, different payload type — both pass claimDelivery).
  • GitHub re-delivering an event with a new deliveryId (e.g. after a server 5xx) before the first delivery's handler commits.
  • The installation.created loop itself: for a 30-repo install, it fires 30 sequential check-then-insert pairs with no transaction, giving a wide window for a concurrent second delivery to interleave.

Crucially, webhook.service.ts:38 documents the assumption: "all handlers are upserts, so that's safe" — this comment is false for InstallationHandler, and the bug directly breaks that safety property.

Relationship to existing issues

Issue Overlap Verdict
#70 — normalize repoFullName to lowercase Case normalization is a prerequisite for any constraint-based dedup, but it does not fix the race; it only narrows the window. Complementary — not a duplicate.
#108 — Redis response cache Unrelated (caching layer). No overlap.
#109 — per-repo since windows Unrelated (query scoping). No overlap.

This bug stands alone as the missing upsert in the only handler that lacks one.


Steps to Reproduce

  1. Configure a GitHub App and install it on an organisation with ≥ 2 repositories.
  2. Ensure the webhook endpoint has two or more active Node processes (or trigger a duplicate delivery via GitHub's "Redeliver" button using a different delivery ID before the first is committed).
  3. GitHub fires installation.created. Both processes pass claimDelivery (different deliveryId).
  4. Both enter InstallationHandler.handle. Both call findOneBy({ repoFullName }) on the same repo and both get null.
  5. Both call repoRepo.insert(...). The second insert throws:
QueryFailedError: duplicate key value violates unique constraint "PK_repos"
  1. The handler throws, the delivery is never marked processed_at, and any subsequent retry with the same deliveryId reprocesses — and fails again.

Expected Behavior

InstallationHandler behaves like every other handler in the codebase: an INSERT … ON CONFLICT DO UPDATE ensures the row is created-or-updated atomically regardless of concurrent deliveries, and the operation is fully idempotent (safe to reprocess).

added_at is preserved on first insert and not overwritten on subsequent upserts.


Actual Behavior

A unique-key violation (duplicate key value violates unique constraint "PK_repos") is thrown from repoRepo.insert when two concurrent deliveries reach the insert branch simultaneously. The exception propagates unhandled, crashing the delivery handler and leaving processed_at = NULL in webhook_deliveries, making all future retries for that delivery repeat the failure.


Environment

  • Affects: all environments where the webhook endpoint can receive >1 concurrent delivery for the same GitHub App installation
  • Especially likely in: installation.created events for organisations with many repos, or any environment with multiple Node processes behind a load balancer

Proposed Fix

Replace the check-then-insert block in installation.handler.ts:43-56 with a single upsert, preserving addedAt semantics via a conditional SET:

await this.repoRepo.upsert(
  {
    repoFullName: repo.full_name,
    installationId: String(installationId),
    addedAt: new Date().toISOString(),   // ignored on conflict — see conflictPaths
  },
  {
    conflictPaths: ["repoFullName"],
    // Only update installationId; addedAt is intentionally excluded so the
    // original first-seen timestamp is preserved across re-fires.
    skipUpdateIfNoValuesChanged: false,
    upsertType: "on-conflict-do-update",
  },
);

Because TypeORM's upsert does not allow per-column exclusion from the SET clause directly, the cleanest equivalent is a raw query:

INSERT INTO repos (repo_full_name, installation_id, added_at)
VALUES ($1, $2, NOW())
ON CONFLICT (repo_full_name)
DO UPDATE SET installation_id = EXCLUDED.installation_id;
-- added_at intentionally omitted from SET so it is never overwritten

Also update the comment in webhook.service.ts:38 to remain accurate once the fix lands.


Additional Context

  • All other handlers (pull-request, issue, review, comment, review-comment) already use repoRepo.upsert(…, conflictColumns) correctly.
  • The repos table primary key is repo_full_name (01_repos.sql).
  • The delivery-dedup mechanism in webhook.service.ts:40-55 is correct and not implicated — this bug lives strictly inside the handler, below the claim layer.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions