feat(gateway): multi-replica support (session affinity + cron coordination)#132
Open
jacoblee-io wants to merge 1 commit into
…n coordination
Enable Gateway to run multiple replicas without Redis by combining three mechanisms:
1. Helm: session affinity (ClientIP) when replicas > 1, pod anti-affinity, and downward API env vars (SICLAW_POD_NAME, SICLAW_POD_IP)
2. Config cache TTL: a 30s setInterval polls the DB for CSP, metrics, and SSO config changes so non-local replicas converge within 30s
3. Cron coordination: restore distributed scheduling logic from the old CronCoordinator (commit 7c41fa6) into CronService: instance registration, heartbeat, dead-instance detection, atomic job claiming (least-loaded first), cancelStaleJobs, syncAssignedJobs
Single-instance mode (instanceId=undefined) preserves existing behavior exactly.
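The config cache TTL mechanism described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in server.ts: the `ConfigCache` class, the `GatewayConfig` fields, and the `ConfigLoader` callback are hypothetical names chosen for the example. Only the 30s interval and the cleanup on close come from the PR text.

```typescript
// Hypothetical shape of the cached config. The PR only says that CSP,
// metrics, and SSO settings are polled; the field names are assumptions.
interface GatewayConfig {
  csp: string;
  metricsEnabled: boolean;
  ssoIssuer: string;
}

// Hypothetical loader that reads the current config from the database.
type ConfigLoader = () => Promise<GatewayConfig>;

class ConfigCache {
  private current: GatewayConfig;
  private readonly load: ConfigLoader;
  private readonly ttlMs: number;
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(initial: GatewayConfig, load: ConfigLoader, ttlMs: number = 30_000) {
    this.current = initial;
    this.load = load;
    this.ttlMs = ttlMs;
  }

  // Serve the last value read from the DB without touching the DB.
  get(): GatewayConfig {
    return this.current;
  }

  // Re-read the config; on failure keep serving the last known value.
  async refresh(): Promise<void> {
    try {
      this.current = await this.load();
    } catch {
      // Ignored on purpose: a transient DB error should not drop the cache.
    }
  }

  // Start polling once; calling start() again is a no-op.
  start(): void {
    if (this.timer !== undefined) return;
    this.timer = setInterval(() => {
      void this.refresh();
    }, this.ttlMs);
  }

  // Stop polling so the interval does not outlive the server.
  close(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }
}
```

With this shape, a replica that did not receive the config write itself picks up the change on its next poll, which is what bounds convergence at roughly one TTL.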
Summary
Enable Gateway to run multiple replicas in K8s without introducing Redis, using three complementary mechanisms:
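1. Helm: session affinity (ClientIP) when replicas > 1, pod anti-affinity for spread, downward API env vars (SICLAW_POD_NAME, SICLAW_POD_IP)
2. Config cache TTL: a 30s setInterval polls the DB for CSP, metrics, and SSO config changes so non-local replicas converge within 30s
3. Cron coordination: restore the distributed scheduling logic of the old CronCoordinator (commit 7c41fa6): instance registration, heartbeat (15s), dead-instance detection (30s threshold), atomic least-loaded job claiming, stale job cancellation, and assigned job sync. Single-instance mode (instanceId=undefined) preserves existing behavior exactly.
Files changed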
- values.yaml, gateway-service.yaml, gateway-deployment.yaml
- server.ts (30s TTL interval + cleanup in close())
- cron-service.ts (reconcile loop, claim/cancel/sync)
- gateway-main.ts (instanceId resolution), server.ts (pass-through)
Known limitations (pre-existing, out of scope)
These issues exist independently of this PR and are tracked for follow-up:
- pendingStates in-memory
Test plan
- npx tsc --noEmit passes
- npm test: all tests pass
- With replicas: 1, verify cron jobs fire normally, config changes apply immediately, and notifications are delivered
- With replicas: 2, verify the cron_instances table shows both instances and cron_jobs.assigned_to distributes jobs; kill one pod and verify failover within 30s
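The failover check in the last test-plan item relies on dead-instance detection (30s threshold) followed by least-loaded reassignment. The sketch below illustrates that logic in memory only. It is not the code in cron-service.ts: the `CronInstance` and `CronJob` shapes and the function names are hypothetical, and the atomicity the PR describes for job claiming would have to come from the database (for example a conditional update), which this example does not model.

```typescript
// Hypothetical row shapes. The PR names the cron_instances table and the
// cron_jobs.assigned_to column; everything else here is an assumption.
interface CronInstance {
  id: string;
  lastHeartbeatMs: number;
}

interface CronJob {
  id: string;
  assignedTo: string | null;
}

// 30s threshold taken from the PR summary.
const DEAD_THRESHOLD_MS = 30_000;

// An instance counts as dead once its last heartbeat is older than the threshold.
function findDeadInstances(
  instances: CronInstance[],
  nowMs: number,
  thresholdMs: number = DEAD_THRESHOLD_MS,
): string[] {
  return instances
    .filter((inst) => nowMs - inst.lastHeartbeatMs > thresholdMs)
    .map((inst) => inst.id);
}

// Return the live instance that currently holds the fewest jobs.
// Ties go to the instance listed first.
function pickLeastLoaded(liveIds: string[], jobs: CronJob[]): string | undefined {
  const load = new Map<string, number>();
  for (const id of liveIds) load.set(id, 0);
  for (const job of jobs) {
    if (job.assignedTo !== null && load.has(job.assignedTo)) {
      load.set(job.assignedTo, load.get(job.assignedTo)! + 1);
    }
  }
  let best: string | undefined;
  for (const id of liveIds) {
    if (best === undefined || load.get(id)! < load.get(best)!) best = id;
  }
  return best;
}

// One reconcile pass: release jobs held by dead or unknown instances,
// then hand each unassigned job to the least-loaded live instance.
function reconcile(instances: CronInstance[], jobs: CronJob[], nowMs: number): CronJob[] {
  const dead = new Set(findDeadInstances(instances, nowMs));
  const liveIds = instances.filter((inst) => !dead.has(inst.id)).map((inst) => inst.id);
  const result: CronJob[] = jobs.map((job) => ({
    id: job.id,
    assignedTo:
      job.assignedTo !== null && liveIds.includes(job.assignedTo) ? job.assignedTo : null,
  }));
  for (const job of result) {
    if (job.assignedTo !== null) continue;
    const target = pickLeastLoaded(liveIds, result);
    if (target === undefined) break; // no live instance left to take work
    job.assignedTo = target;
  }
  return result;
}
```

With a 15s heartbeat, a 30s threshold means an instance is declared dead only after missing about two beats, which trades slower failover for fewer false positives during brief stalls.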