Skip to content

fix(reporter,fetcher): align RabbitMQ topology with app code and make bootstrap idempotent#1372

Open
brunobls wants to merge 1 commit into
developfrom
fix/reporter-fetcher-rabbitmq-topology-idempotent-bootstrap
Open

fix(reporter,fetcher): align RabbitMQ topology with app code and make bootstrap idempotent#1372
brunobls wants to merge 1 commit into
developfrom
fix/reporter-fetcher-rabbitmq-topology-idempotent-bootstrap

Conversation

@brunobls
Copy link
Copy Markdown
Member

@brunobls brunobls commented May 14, 2026

Summary

Reconciles RabbitMQ topology in the reporter and fetcher charts with what the application code actually uses, removes accumulated dead code, aligns DLX/DLQ naming with the fetcher repo, and refactors the bootstrap Job to apply topology idempotently (fixing upgrades against already-initialized RabbitMQ instances).

Changes — charts/fetcher

  • Remove dead code: fetcher.generate-fetcher.* (queue, dlq, exchange, dlx, 2 bindings) — zero references in fetcher source.
  • Remove dead exchange + binding: fetcher.extract-external-data.exchange. Manager publishes via the default exchange ("") with routing_key = queue_name (create_fetcher_job.go:681), so no application exchange is needed.
  • Rename DLX/DLQ: fetcher.extract-external-data.dlx/dlqfetcher.dlx/dlq, aligning with the fetcher repo's local definitions.json.
  • Add fetcher.job.events topic exchange — worker publishes job event notifications here (job_notification.go:155).

Changes — charts/reporter

  • Add fetcher integration topology: queue reporter.fetcher.job.events (with DLX → reporter.dlx) plus 2 bindings on fetcher.job.events (routing keys job.completed.reporter, job.failed.reporter).
  • Defensively declare fetcher.job.events with args identical to the fetcher chart. PUT with matching args is idempotent — whichever chart bootstraps first creates the exchange, the other is a no-op. This removes any cross-chart install order dependency.

Bootstrap refactor (both charts)

Replaces the brittle gate (skip if user 'plugin' exists) with idempotent per-resource apply:

  • apply() — PUT for vhost/user/permissions/exchanges/queues. Identical args → 2xx no-op; divergent args → precondition_failed (fail-loud on drift).
  • ensure_binding() — GET before POST to avoid duplicate bindings on re-run.

Shared topology contract

fetcher.job.events is declared by both charts. Comments in both files flag this as a shared contract: any arg change must land in both charts in the same PR.

Risk

Immediate runtime risk: zero. externalRabbitmqDefinitions.enabled is not enabled for reporter/fetcher in any environment across clotilde, firmino or benedita gitops. The bootstrap Job doesn't render today; current production topology is managed manually outside the chart.

Latent risks (when someone opts in):

  1. Existing queues with divergent args (e.g. clotilde-sandbox reporter.generate-report.queue lacks DLX) will trip precondition_failed. Intended fail-loud behavior; requires manual queue recreate.
  2. Legacy fetcher.extract-external-data.dlx/dlq become orphans (no traffic, no harm — cleanup is manual).
  3. The plugin user is created even in environments using a different app user. Pre-existing, not a regression.

… bootstrap idempotent

Both reporter and fetcher charts had drift between the declared RabbitMQ
topology and what the application code actually uses, and the bootstrap
Job had a fragile idempotency gate (skip if user 'plugin' exists) that
prevented applying topology deltas to RabbitMQ instances already
initialized. This caused the reporter-worker to crash-loop when enabled
against an existing RabbitMQ that lacked the fetcher integration queue.

Changes

charts/fetcher
  - Remove dead code: fetcher.generate-fetcher.{queue,dlq,exchange,dlx}
    plus 2 bindings (zero references in fetcher source).
  - Remove dead exchange fetcher.extract-external-data.exchange and its
    binding; the manager publishes via the default exchange ("") with
    routing_key = queue name, no application exchange is needed.
  - Rename DLX/DLQ to fetcher.dlx and fetcher.dlq (shared) aligning with
    components/infra/rabbitmq/etc/definitions.json in the fetcher repo.
  - Add fetcher.job.events topic exchange (worker publishes job event
    notifications here at runtime).
  - Rewrite bootstrap-rabbitmq.yaml to apply topology idempotently via
    PUT per resource and POST per binding (with prior GET check), replacing
    the fragile 'user plugin already exists' skip gate that prevented
    upgrades on existing brokers.

charts/reporter
  - Add queue reporter.fetcher.job.events plus 2 bindings on
    fetcher.job.events (routing keys job.completed.reporter and
    job.failed.reporter) so the reporter-worker can consume job event
    notifications produced by the fetcher worker.
  - Declare fetcher.job.events defensively (same args as the fetcher
    chart) so the reporter chart can be installed and become fully
    operational without depending on fetcher chart install order. PUT
    with identical args is idempotent in RabbitMQ; whichever chart
    bootstraps first creates the exchange, the other is a no-op.
    fetcher.job.events is a shared topology contract — args MUST match
    between the two charts.
  - Same bootstrap-rabbitmq.yaml rewrite for idempotent apply.

Drift / migration notes

  - Environments already running with the old DLX/DLQ names
    (fetcher.extract-external-data.dlx/dlq) will see those resources
    become orphans after the new chart applies. They are not deleted by
    the bootstrap; cleanup is manual or in a separate follow-up.
  - Environments with reporter.generate-report.queue missing the
    x-dead-letter-exchange argument will see precondition_failed on the
    queue PUT. Bootstrap is gated by externalRabbitmqDefinitions.enabled
    (default false) so existing environments are unaffected until they
    opt in. Documented for ops awareness.
  - The fetcher repo (LerianStudio/fetcher) still has a dead exchange +
    binding in components/infra/rabbitmq/etc/definitions.json. Follow-up
    PR will clean that up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@brunobls brunobls requested a review from a team as a code owner May 14, 2026 18:29
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Review Change Stack

Walkthrough

The PR refactors RabbitMQ configuration and bootstrap for fetcher and reporter services. It consolidates the fetcher's queue dead-letter routing to a shared exchange, simplifies topology declarations, and replaces file-based definitions loading with explicit, idempotent Management HTTP API calls in both bootstrap scripts.

Changes

RabbitMQ Topology and Bootstrap Refactor

Layer / File(s) Summary
Fetcher topology consolidation
charts/fetcher/files/rabbitmq/load_definitions.json
Fetcher queue dead-letter routing is centralized to a shared fetcher.dlx exchange, and the fetcher.generate-fetcher.queue is removed. Exchanges are simplified to fetcher.dlx (direct) and fetcher.job.events (topic), with bindings reduced to a single routing from fetcher.dlx to fetcher.dlq.
Fetcher bootstrap explicit API approach
charts/fetcher/templates/bootstrap-rabbitmq.yaml
Fetcher bootstrap Job is rewritten to use Management HTTP API calls with a standardized apply() helper for vhost/user/permissions setup, then explicit calls to create exchanges and queues, and a new ensure_binding() helper to idempotently wire bindings without duplicates.
Reporter queue and bindings for fetcher events
charts/reporter/files/rabbitmq/load_definitions.json
Reporter adds a new queue reporter.fetcher.job.events with dead-letter configuration, declares the fetcher.job.events exchange (already published by fetcher), and wires two bindings to route job completion and failure events into reporter's processing queue.
Reporter bootstrap explicit API approach
charts/reporter/templates/common/bootstrap-rabbitmq.yaml
Reporter bootstrap Job is rewritten with the same explicit Management API pattern: standardized apply() helper for vhost/user setup, explicit PUTs for exchanges and queues (including the new fetcher.job.events cross-chart contract), idempotent ensure_binding() helper to create reporter and fetcher-to-reporter bindings, and adjusted completion messaging.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
charts/reporter/templates/common/bootstrap-rabbitmq.yaml (1)

15-16: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Use a hook or revisioned name for this bootstrap Job.

This rewrite changes spec.template, but reporter-bootstrap-rabbitmq stays a normal fixed-name Job. On upgrade from a release that already created this Job, Helm will patch the existing resource and Kubernetes will reject the pod-template update as immutable. Upgrades can still fail until the old Job is gone, which weakens the idempotent-bootstrap goal.

Possible direction
 metadata:
   name: reporter-bootstrap-rabbitmq
+  annotations:
+    "helm.sh/hook": post-install,post-upgrade
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded

Also applies to: 17-24, 50-149

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@charts/reporter/templates/common/bootstrap-rabbitmq.yaml` around lines 15 -
16, The Job "reporter-bootstrap-rabbitmq" updates its pod template
(spec.template) but uses a fixed name which causes Kubernetes to reject
immutable template changes on upgrade; change the manifest so the Job is created
as a hook or given a revisioned name to avoid patching an existing Job — either
add Helm hook annotations (e.g., pre-install/pre-upgrade) to run it as a
transient hook Job, or append a release-specific/revision token (use
Release.Name or Release.Revision) to "reporter-bootstrap-rabbitmq" so each
deployment creates a new Job instance instead of attempting to patch
spec.template.
charts/fetcher/templates/bootstrap-rabbitmq.yaml (1)

15-16: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Use a hook or revisioned name for this bootstrap Job.

fetcher-bootstrap-rabbitmq is still a fixed-name Job, but this PR rewrites its pod template. During upgrade, Helm will try to patch the existing Job and Kubernetes rejects spec.template changes for Jobs. That means upgrades can still fail whenever the prior Job has not been deleted yet.

Possible direction
 metadata:
   name: fetcher-bootstrap-rabbitmq
+  annotations:
+    "helm.sh/hook": post-install,post-upgrade
+    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded

Also applies to: 18-25, 51-143

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@charts/fetcher/templates/bootstrap-rabbitmq.yaml` around lines 15 - 16, The
Job uses a fixed metadata.name "fetcher-bootstrap-rabbitmq" so Helm will try to
patch spec.template on upgrade (which K8s rejects); make the Job either a Helm
hook or give it a revisioned/unique name so upgrades create a new Job instead of
patching the existing one. Concretely, update the Job manifest (metadata.name
"fetcher-bootstrap-rabbitmq" and the other Job blocks at the mentioned ranges)
to either add Helm hook annotations (e.g., "helm.sh/hook":
pre-upgrade/pre-install and appropriate hook-delete-policy) or append a
release-specific suffix (for example using Release.Revision or Release.UniqueID
in the metadata.name via Helm templating) so the Job becomes immutable across
upgrades and avoids spec.template patch errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@charts/fetcher/templates/bootstrap-rabbitmq.yaml`:
- Around line 15-16: The Job uses a fixed metadata.name
"fetcher-bootstrap-rabbitmq" so Helm will try to patch spec.template on upgrade
(which K8s rejects); make the Job either a Helm hook or give it a
revisioned/unique name so upgrades create a new Job instead of patching the
existing one. Concretely, update the Job manifest (metadata.name
"fetcher-bootstrap-rabbitmq" and the other Job blocks at the mentioned ranges)
to either add Helm hook annotations (e.g., "helm.sh/hook":
pre-upgrade/pre-install and appropriate hook-delete-policy) or append a
release-specific suffix (for example using Release.Revision or Release.UniqueID
in the metadata.name via Helm templating) so the Job becomes immutable across
upgrades and avoids spec.template patch errors.

In `@charts/reporter/templates/common/bootstrap-rabbitmq.yaml`:
- Around line 15-16: The Job "reporter-bootstrap-rabbitmq" updates its pod
template (spec.template) but uses a fixed name which causes Kubernetes to reject
immutable template changes on upgrade; change the manifest so the Job is created
as a hook or given a revisioned name to avoid patching an existing Job — either
add Helm hook annotations (e.g., pre-install/pre-upgrade) to run it as a
transient hook Job, or append a release-specific/revision token (use
Release.Name or Release.Revision) to "reporter-bootstrap-rabbitmq" so each
deployment creates a new Job instance instead of attempting to patch
spec.template.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8cbb38bc-7534-422e-b2d2-e574a6eef4bb

📥 Commits

Reviewing files that changed from the base of the PR and between f2a9b79 and 04fc1a0.

📒 Files selected for processing (4)
  • charts/fetcher/files/rabbitmq/load_definitions.json
  • charts/fetcher/templates/bootstrap-rabbitmq.yaml
  • charts/reporter/files/rabbitmq/load_definitions.json
  • charts/reporter/templates/common/bootstrap-rabbitmq.yaml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant