fix(reporter,fetcher): align RabbitMQ topology with app code and make bootstrap idempotent #1372
Both reporter and fetcher charts had drift between the declared RabbitMQ
topology and what the application code actually uses, and the bootstrap
Job had a fragile idempotency gate (skip if user 'plugin' exists) that
prevented applying topology deltas to RabbitMQ instances already
initialized. This caused the reporter-worker to crash-loop when enabled
against an existing RabbitMQ that lacked the fetcher integration queue.
Changes
charts/fetcher
- Remove dead code: fetcher.generate-fetcher.{queue,dlq,exchange,dlx}
plus 2 bindings (zero references in fetcher source).
- Remove dead exchange fetcher.extract-external-data.exchange and its
binding; the manager publishes via the default exchange ("") with
routing_key = queue name, so no application exchange is needed.
- Rename DLX/DLQ to fetcher.dlx and fetcher.dlq (shared), aligning with
components/infra/rabbitmq/etc/definitions.json in the fetcher repo.
- Add fetcher.job.events topic exchange (worker publishes job event
notifications here at runtime).
- Rewrite bootstrap-rabbitmq.yaml to apply topology idempotently via
PUT per resource and POST per binding (with prior GET check), replacing
the fragile 'user plugin already exists' skip gate that prevented
upgrades on existing brokers.
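
The idempotent apply described in the last bullet can be sketched roughly as follows. This is a minimal illustration in Python, not the chart's actual bootstrap script. `FakeBroker` is a hypothetical in-memory stand-in for the management API, the bodies of `apply` and `ensure_binding` are assumptions, and the argument values and the `fetcher.dlx` to `fetcher.dlq` binding are illustrative rather than copied from the chart.

```python
class PreconditionFailed(Exception):
    """Raised when a resource already exists with different arguments."""


class FakeBroker:
    """Hypothetical in-memory stand-in for the broker's management API."""

    def __init__(self):
        self.resources = {}  # (kind, name) -> declared arguments
        self.bindings = []   # (source, destination, routing_key) tuples

    def put(self, kind, name, args):
        existing = self.resources.get((kind, name))
        if existing is None:
            self.resources[(kind, name)] = dict(args)
            return "created"
        if existing == args:
            return "unchanged"  # identical args: re-applying is a no-op
        raise PreconditionFailed(f"{kind} {name}: {existing} != {args}")

    def get_bindings(self, source, destination):
        return [b for b in self.bindings if b[:2] == (source, destination)]

    def post_binding(self, source, destination, routing_key):
        # This fake appends unconditionally, the worst case the GET check guards against.
        self.bindings.append((source, destination, routing_key))


def apply(broker, kind, name, args):
    """PUT one resource: safe to repeat, loud on drift."""
    return broker.put(kind, name, args)


def ensure_binding(broker, source, destination, routing_key):
    """GET first, then POST only when the binding is missing."""
    for _, _, key in broker.get_bindings(source, destination):
        if key == routing_key:
            return "unchanged"
    broker.post_binding(source, destination, routing_key)
    return "created"


def bootstrap(broker):
    # Resource names come from the PR; the argument values are illustrative.
    apply(broker, "exchange", "fetcher.dlx", {"type": "fanout", "durable": True})
    apply(broker, "queue", "fetcher.dlq", {"durable": True})
    apply(broker, "exchange", "fetcher.job.events", {"type": "topic", "durable": True})
    ensure_binding(broker, "fetcher.dlx", "fetcher.dlq", "")


broker = FakeBroker()
bootstrap(broker)
bootstrap(broker)  # the second run leaves the topology unchanged
```

Running `bootstrap` twice leaves three resources and one binding, while re-declaring `fetcher.job.events` with a different type raises `PreconditionFailed`, which mirrors the fail-loud behavior the rewrite aims for.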
charts/reporter
- Add queue reporter.fetcher.job.events plus 2 bindings on
fetcher.job.events (routing keys job.completed.reporter and
job.failed.reporter) so the reporter-worker can consume job event
notifications produced by the fetcher worker.
- Declare fetcher.job.events defensively (same args as the fetcher
chart) so the reporter chart can be installed and become fully
operational without depending on fetcher chart install order. PUT
with identical args is idempotent in RabbitMQ: whichever chart
bootstraps first creates the exchange, and the other is a no-op.
fetcher.job.events is a shared topology contract, so args MUST match
between the two charts.
- Same bootstrap-rabbitmq.yaml rewrite for idempotent apply.
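
To make the two bindings in the first bullet concrete, here is a small sketch of topic-exchange key matching in Python. The function name `topic_matches` is hypothetical and the matcher is a simplified model of AMQP 0-9-1 topic semantics (`*` matches exactly one word, `#` matches zero or more), not broker code. Because the binding keys in this PR contain no wildcards, they behave as exact matches.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True when routing_key matches a topic binding pattern."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def rec(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(rec(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            return rec(i + 1, j + 1)
        return False

    return rec(0, 0)


# Binding keys taken from the PR description.
REPORTER_BINDINGS = ["job.completed.reporter", "job.failed.reporter"]


def reporter_receives(routing_key: str) -> bool:
    """True when any reporter binding would route this key to the queue."""
    return any(topic_matches(b, routing_key) for b in REPORTER_BINDINGS)
```

Under this model, an event published with a key such as `job.completed.other` (a made-up key for illustration) would not reach `reporter.fetcher.job.events`.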
Drift / migration notes
- Environments already running with the old DLX/DLQ names
(fetcher.extract-external-data.dlx/dlq) will see those resources
become orphans after the new chart applies. They are not deleted by
the bootstrap; cleanup is manual or in a separate follow-up.
- Environments with reporter.generate-report.queue missing the
x-dead-letter-exchange argument will see precondition_failed on the
queue PUT. Bootstrap is gated by externalRabbitmqDefinitions.enabled
(default false) so existing environments are unaffected until they
opt in. Documented for ops awareness.
- The fetcher repo (LerianStudio/fetcher) still has a dead exchange +
binding in components/infra/rabbitmq/etc/definitions.json. Follow-up
PR will clean that up.
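
The first two notes above could be checked before opting in with a pre-flight comparison along these lines. This is a sketch under stated assumptions: `find_drift` is a hypothetical helper, the dictionaries stand in for whatever the operator reads from the live broker and from the chart, and the `reporter.dlx` value for `x-dead-letter-exchange` is assumed for illustration.

```python
def find_drift(desired, live, managed_prefixes):
    """Compare desired and live resources (name -> arguments).

    Returns (mismatched, orphans): names whose live arguments differ from the
    desired ones, and live names under a managed prefix that are no longer declared.
    """
    mismatched = sorted(
        name for name, args in desired.items()
        if name in live and live[name] != args
    )
    orphans = sorted(
        name for name in live
        if name not in desired and name.startswith(managed_prefixes)
    )
    return mismatched, orphans


# Illustrative data only; argument values are assumptions.
desired = {
    "reporter.generate-report.queue": {"x-dead-letter-exchange": "reporter.dlx"},
    "fetcher.dlq": {},
}
live = {
    "reporter.generate-report.queue": {},       # missing the DLX argument
    "fetcher.extract-external-data.dlq": {},    # old name, left behind
}

mismatched, orphans = find_drift(desired, live, ("fetcher.", "reporter."))
```

Here `mismatched` names the queue that would trip precondition_failed on PUT, and `orphans` names the old DLQ that the bootstrap leaves in place.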
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walkthrough
The PR refactors RabbitMQ configuration and bootstrap for the fetcher and reporter services. It consolidates the fetcher's queue dead-letter routing to a shared exchange, simplifies topology declarations, and replaces file-based definitions loading with explicit, idempotent Management HTTP API calls in both bootstrap scripts.
Changes
RabbitMQ Topology and Bootstrap Refactor
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
charts/reporter/templates/common/bootstrap-rabbitmq.yaml (1)
15-16: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
Use a hook or revisioned name for this bootstrap Job.
This rewrite changes `spec.template`, but `reporter-bootstrap-rabbitmq` stays a normal fixed-name Job. On upgrade from a release that already created this Job, Helm will patch the existing resource and Kubernetes will reject the pod-template update as immutable. Upgrades can still fail until the old Job is gone, which weakens the idempotent-bootstrap goal.
Possible direction
     metadata:
       name: reporter-bootstrap-rabbitmq
    +  annotations:
    +    "helm.sh/hook": post-install,post-upgrade
    +    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
Also applies to: 17-24, 50-149
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@charts/reporter/templates/common/bootstrap-rabbitmq.yaml` around lines 15 - 16, the Job "reporter-bootstrap-rabbitmq" updates its pod template (spec.template) but uses a fixed name, so Kubernetes rejects the immutable template change on upgrade. Change the manifest so the Job is created as a hook or given a revisioned name and no existing Job is patched: either add Helm hook annotations (e.g., "helm.sh/hook": post-install,post-upgrade with a hook-delete-policy) to run it as a transient hook Job, or append a per-revision token (Release.Revision) to "reporter-bootstrap-rabbitmq" so each deployment creates a new Job instance instead of attempting to patch spec.template.
charts/fetcher/templates/bootstrap-rabbitmq.yaml (1)
15-16: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
Use a hook or revisioned name for this bootstrap Job.
`fetcher-bootstrap-rabbitmq` is still a fixed-name Job, but this PR rewrites its pod template. During upgrade, Helm will try to patch the existing Job and Kubernetes rejects `spec.template` changes for Jobs. That means upgrades can still fail whenever the prior Job has not been deleted yet.
Possible direction
     metadata:
       name: fetcher-bootstrap-rabbitmq
    +  annotations:
    +    "helm.sh/hook": post-install,post-upgrade
    +    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
Also applies to: 18-25, 51-143
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@charts/fetcher/templates/bootstrap-rabbitmq.yaml` around lines 15 - 16, the Job uses a fixed metadata.name "fetcher-bootstrap-rabbitmq", so Helm will try to patch spec.template on upgrade (which K8s rejects). Make the Job either a Helm hook or give it a revisioned name so upgrades create a new Job instead of patching the existing one. Concretely, update the Job manifest (metadata.name "fetcher-bootstrap-rabbitmq" and the other Job blocks at the mentioned ranges) to either add Helm hook annotations (e.g., "helm.sh/hook": post-install,post-upgrade and an appropriate hook-delete-policy) or append a per-revision suffix (for example Release.Revision in metadata.name via Helm templating) so upgrades never need to patch the immutable spec.template.
ℹ️ Review info
📒 Files selected for processing (4)
charts/fetcher/files/rabbitmq/load_definitions.json
charts/fetcher/templates/bootstrap-rabbitmq.yaml
charts/reporter/files/rabbitmq/load_definitions.json
charts/reporter/templates/common/bootstrap-rabbitmq.yaml
Summary
Reconciles RabbitMQ topology in the reporter and fetcher charts with what the application code actually uses, removes accumulated dead code, aligns DLX/DLQ naming with the fetcher repo, and refactors the bootstrap Job to apply topology idempotently (fixing upgrades against already-initialized RabbitMQ instances).
Changes: `charts/fetcher`
- Remove `fetcher.generate-fetcher.*` (queue, dlq, exchange, dlx, 2 bindings): zero references in fetcher source.
- Remove `fetcher.extract-external-data.exchange`. Manager publishes via the default exchange (`""`) with `routing_key = queue_name` (`create_fetcher_job.go:681`), so no application exchange is needed.
- Rename `fetcher.extract-external-data.dlx/dlq` → `fetcher.dlx/dlq`, aligning with the fetcher repo's local `definitions.json`.
- Add `fetcher.job.events` topic exchange: worker publishes job event notifications here (`job_notification.go:155`).
Changes: `charts/reporter`
- Add queue `reporter.fetcher.job.events` (with DLX → `reporter.dlx`) plus 2 bindings on `fetcher.job.events` (routing keys `job.completed.reporter`, `job.failed.reporter`).
- Declare `fetcher.job.events` with args identical to the fetcher chart. PUT with matching args is idempotent: whichever chart bootstraps first creates the exchange, and the other is a no-op. This removes any cross-chart install order dependency.
Bootstrap refactor (both charts)
Replaces the brittle gate (`skip if user 'plugin' exists`) with idempotent per-resource apply:
- `apply()`: PUT for vhost/user/permissions/exchanges/queues. Identical args → 2xx no-op; divergent args → `precondition_failed` (fail-loud on drift).
- `ensure_binding()`: GET before POST to avoid duplicate bindings on re-run.
Shared topology contract
`fetcher.job.events` is declared by both charts. Comments in both files flag this as a shared contract: any arg change must land in both charts in the same PR.
Risk
Immediate runtime risk: zero. `externalRabbitmqDefinitions.enabled` is not enabled for reporter/fetcher in any environment across `clotilde`, `firmino` or `benedita` gitops. The bootstrap Job doesn't render today; current production topology is managed manually outside the chart.
Latent risks (when someone opts in):
- Existing queue drift (`reporter.generate-report.queue` lacks DLX) will trip `precondition_failed`. Intended fail-loud behavior; requires manual queue recreate.
- The old `fetcher.extract-external-data.dlx/dlq` become orphans (no traffic, no harm; cleanup is manual).
- The `plugin` user is created even in environments using a different app user. Pre-existing, not a regression.
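
The shared topology contract on `fetcher.job.events` could be guarded by a small check like the one below. This is a sketch, not part of the PR: `exchange_contract_diff` is a hypothetical helper, the input dictionaries are assumed to follow the `exchanges` list shape of a RabbitMQ definitions file, and the field values shown are illustrative.

```python
def exchange_contract_diff(defs_a, defs_b, name):
    """Return {field: (value_a, value_b)} for every field that differs.

    An empty dict means both definitions declare the exchange identically.
    """
    def find(defs):
        for exchange in defs.get("exchanges", []):
            if exchange.get("name") == name:
                return exchange
        raise KeyError(f"exchange {name!r} not declared")

    a, b = find(defs_a), find(defs_b)
    fields = sorted(set(a) | set(b))
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


# Illustrative definitions; the field values are assumptions.
fetcher_defs = {"exchanges": [
    {"name": "fetcher.job.events", "type": "topic", "durable": True, "arguments": {}},
]}
reporter_defs = {"exchanges": [
    {"name": "fetcher.job.events", "type": "topic", "durable": True, "arguments": {}},
]}

diff = exchange_contract_diff(fetcher_defs, reporter_defs, "fetcher.job.events")
```

An empty `diff` means the second chart's PUT would be a no-op. Any non-empty result points at the field that would make the second chart's bootstrap fail with `precondition_failed`.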
reporter.generate-report.queuelacks DLX) will tripprecondition_failed. Intended fail-loud behavior; requires manual queue recreate.fetcher.extract-external-data.dlx/dlqbecome orphans (no traffic, no harm — cleanup is manual).pluginuser is created even in environments using a different app user. Pre-existing, not a regression.