feat: Helm chart + image for self-hosted deployment #22

Merged
tarekziade merged 10 commits into main from feat/hub-prod-deploy on Jun 18, 2026

Conversation

@rtrompier rtrompier commented Jun 15, 2026

Contributor

Packages serge's web app (reviewbot-web) so a team can deploy it themselves with helm install against their own cluster. No app-code change — SQLite remains the job store, now persisted on a PersistentVolumeClaim.

Contents

  • chart/ — generic Helm chart:
    • Deployment (single replica, Recreate strategy — SQLite is single-writer), Service, optional Ingress, ConfigMap, optional ServiceAccount.
    • PVC for the embedded SQLite store mounted at persistence.mountPath (default /var/lib/reviewbot); WEB_STORE_PATH points at <mountPath>/jobs.db so review/task history survives restarts.
    • Hardened securityContext (non-root, fsGroup so the app user can write the volume, dropped capabilities, seccomp RuntimeDefault).
    • Nothing cluster-specific hard-coded: ingress (host/class/annotations/TLS), nodeSelector/tolerations/affinity, storageClass, image registry and resources are all values-driven. Sensitive env is injected from a pre-created Secret referenced by existingSecret.
  • Dockerfile — production image for reviewbot-web (uvicorn on $PORT, default 8080; bubblewrap for HELPER_SANDBOX).
  • CI — builds and pushes the image to the internal registry.
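As a rough illustration of the values-driven layout described above, an override file might look like the sketch below. Only persistence.mountPath, existingSecret, ingress.enabled, ingress.host and ingress.className are named in this PR; the other keys (persistence.size, persistence.storageClass, the shape of nodeSelector) and all of the values shown are assumptions, so check chart/values.yaml before relying on them.

```yaml
# my-values.yaml (hypothetical file name)
existingSecret: serge-env        # hypothetical name of the pre-created Secret

persistence:
  mountPath: /var/lib/reviewbot  # chart default per the PR description
  size: 1Gi                      # assumed key and value
  storageClass: ""               # assumed key; empty would mean the cluster default

ingress:
  enabled: true
  host: serge.example.com        # placeholder host
  className: nginx               # placeholder ingress class

nodeSelector: {}                 # assumed to follow the usual Helm convention
```

A file like this would be passed with helm install serge ./chart -n <ns> -f my-values.yaml instead of repeating --set flags.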

Deploy

  1. Create a Kubernetes Secret with the sensitive env (GITHUB_APP_ID, GITHUB_PRIVATE_KEY, GITHUB_WEBHOOK_SECRET, GITHUB_OAUTH_CLIENT_ID, GITHUB_OAUTH_CLIENT_SECRET, WEB_SESSION_SECRET, LLM_API_KEY, …) in the target namespace.
  2. helm install serge ./chart -n <ns> --set existingSecret=<secret-name> --set ingress.enabled=true --set ingress.host=<host> --set ingress.className=<class>
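For step 1, a Secret manifest along the following lines could be applied with kubectl apply -f before installing the chart. This is a minimal sketch: the Secret name and namespace are hypothetical, the values are placeholders, and it assumes the chart exposes every key of the Secret to the container as an environment variable.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: serge-env          # hypothetical; pass the same name as existingSecret
  namespace: my-namespace  # placeholder for <ns>
type: Opaque
stringData:                # stringData accepts plain text; the API server encodes it
  GITHUB_APP_ID: "<app-id>"
  GITHUB_PRIVATE_KEY: |
    <PEM-encoded private key>
  GITHUB_WEBHOOK_SECRET: "<webhook-secret>"
  GITHUB_OAUTH_CLIENT_ID: "<oauth-client-id>"
  GITHUB_OAUTH_CLIENT_SECRET: "<oauth-client-secret>"
  WEB_SESSION_SECRET: "<session-secret>"
  LLM_API_KEY: "<llm-api-key>"
  # plus any further keys the app expects; the list in step 1 is not exhaustive
```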

Notes

  • helm lint + helm template pass; image builds and boots (serves /healthz).
  • HELPER_SANDBOX (bubblewrap) needs the cluster to allow unprivileged user namespaces; set it to auto/off via envVars if not available.
  • The earlier hub-prod / ArgoCD / Postgres approach was dropped per the new plan (self-hosted Helm deploy). History on this branch shows that evolution.
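On clusters without unprivileged user namespaces, the HELPER_SANDBOX override mentioned above might be expressed as in the sketch below. It assumes envVars is a simple name-to-value map; the PR does not show its shape, and some charts use a list of name/value pairs instead.

```yaml
envVars:
  # Quoted on purpose: some YAML parsers read a bare off as the boolean false.
  HELPER_SANDBOX: "off"    # or "auto", per the note above
```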

@rtrompier rtrompier changed the title from "feat: deploy to hub-prod (Postgres job store, chart, image)" to "feat: Helm chart + image for self-hosted deployment" Jun 15, 2026
@tarekziade

Collaborator

@askserge review this helm chart

@sergereview sergereview Bot left a comment

🤗 Serge says:

The Helm chart is well-structured and follows good practices for a single-replica SQLite-backed deployment. However, there are a few correctness issues that should be fixed before merging.

Correctness

  • chart/templates/_helpers.tpl: The name helper calls default with only one argument ($.Release.Name), which makes the default function a no-op. It should provide an actual fallback value (e.g., .Chart.Name) so the helper behaves as intended.
  • chart/templates/deployment.yaml: The serviceAccountName is only injected when .Values.serviceAccount.create is true. If a user sets create: false but provides an existing SA name (e.g., serviceAccount.name: "my-sa"), the Deployment silently ignores it and falls back to the namespace's default SA. The condition should also check whether serviceAccount.name is non-empty.
  • Dockerfile / chart/values.yaml UID mismatch: The Dockerfile creates the app user without an explicit UID/GID, but the Helm podSecurityContext hard-codes runAsUser: 1000, runAsGroup: 1000, and fsGroup: 1000. If the image's app user happens to get a different UID (e.g., 1001), the container will still run as UID 1000, which no longer owns the files created for app in the image (such as its home directory). fsGroup should keep the mounted volume group-writable, but anything that relies on user ownership can break. The Dockerfile should pin the user to --uid 1000 --gid 1000, creating the group first because useradd --gid requires an existing group.

Reliability / Operations

  • chart/values.yaml: Defaulting image.tag to latest with pullPolicy: IfNotPresent is a Kubernetes anti-pattern. Nodes that already have a latest image cached will never pull updates, leading to version skew across the cluster. Consider defaulting to a stable tag (e.g., 0.1.0) or switching pullPolicy to Always when latest is used.

Security

  • No hardcoded secrets; sensitive configuration is correctly off-loaded to existingSecret. The securityContext hardening (drop ALL capabilities, RuntimeDefault seccomp, non-root) is solid.
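The hardening described in this review could correspond to values roughly like the sketch below. The numeric IDs and the podSecurityContext key come from the review; the container-level key name (securityContext) and the exact placement of seccompProfile are assumptions about how the chart is laid out.

```yaml
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000            # lets the app user write the mounted volume
  seccompProfile:
    type: RuntimeDefault

securityContext:           # assumed key name for the container-level context
  capabilities:
    drop:
      - ALL
```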

Minor

  • The .dockerignore file was not updated in this PR. The production Dockerfile will currently copy the entire repo (including chart/, aws/, docs/, tests/, .github/) into the image, bloating the layer unnecessarily. Consider adding those directories to .dockerignore.

model: moonshotai/Kimi-K2.6 · 12 LLM turns · 19 tool calls · 345.6s · 155212 in / 31496 out tokens

@@ -0,0 +1,14 @@
{{- define "name" -}}
{{- default $.Release.Name | trunc 63 | trimSuffix "-" -}}

default is called with only one argument here, so it effectively does nothing. The typical pattern is default .Chart.Name .Release.Name | trunc 63 | trimSuffix "-" so the chart name is used as a fallback when the release name is empty.

Suggested change
{{- default $.Release.Name | trunc 63 | trimSuffix "-" -}}
{{- default .Chart.Name .Release.Name | trunc 63 | trimSuffix "-" -}}

annotations:
checksum/config: {{ include (print $.Template.BasePath "/config.yaml") . | sha256sum }}
spec:
{{- if .Values.serviceAccount.create }}

This condition only sets serviceAccountName when Helm is asked to create a ServiceAccount. If a user wants to use an existing SA (a common security requirement), they would set serviceAccount.create: false and serviceAccount.name: "my-sa", but the Deployment would silently ignore it and use the namespace's default SA instead.

Change the guard to also check whether a name was explicitly provided:

Suggested change
{{- if .Values.serviceAccount.create }}
{{- if or .Values.serviceAccount.create .Values.serviceAccount.name }}
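To make the existing-ServiceAccount case concrete, the values below are the ones the comment describes; the second document sketches the Deployment fragment one would expect once the guard also checks serviceAccount.name. The rendered output is an expectation under that assumption, not something verified against the chart.

```yaml
# values override: reuse a ServiceAccount that already exists in the namespace
serviceAccount:
  create: false
  name: my-sa
---
# expected fragment of the rendered Deployment (assumption)
spec:
  template:
    spec:
      serviceAccountName: my-sa
```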

Comment thread Dockerfile
&& rm -rf /var/lib/apt/lists/*

# Unprivileged service user, mirroring ec2-user on the real host.
RUN useradd --create-home --shell /bin/bash app

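The app user is created without an explicit UID/GID. The Helm chart hard-codes runAsUser: 1000, runAsGroup: 1000, and fsGroup: 1000 in podSecurityContext. If the image ever builds with app mapped to a different UID (e.g., 1001), the process will run as UID 1000 while files created for app in the image (such as its home directory) belong to another UID. fsGroup should keep the mounted volume group-writable, but paths that rely on user ownership can become unreadable or unwritable. Note that useradd --gid requires the group to exist already, so the suggestion creates it first.

Suggested change
RUN useradd --create-home --shell /bin/bash app
RUN groupadd --gid 1000 app && useradd --create-home --shell /bin/bash --uid 1000 --gid 1000 app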

Comment thread deploy/helm/values.yaml
image:
  repository: registry.internal.huggingface.tech/serge
  name: serge
  tag: latest

Defaulting the image tag to latest with pullPolicy: IfNotPresent means nodes that have already cached a latest image will never pull updates. This leads to silent version skew and makes rollbacks/rollouts non-deterministic. Consider defaulting to a concrete tag (e.g., 0.1.0) or using pullPolicy: Always when latest is in use.
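One way to apply this suggestion is sketched below. The tag 0.1.0 is the example value from the comment, not a confirmed release, and the location of pullPolicy under image is an assumption about the chart's values layout.

```yaml
image:
  repository: registry.internal.huggingface.tech/serge
  name: serge
  tag: "0.1.0"              # concrete tag, so every node runs the same build
  pullPolicy: IfNotPresent  # assumed key location; safe once the tag is immutable

# Alternative if latest must stay the default:
# image:
#   tag: latest
#   pullPolicy: Always
```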

Make serge deployable to the hub-prod cluster (hub-utils namespace):

- store: optional Postgres backend selected by a postgres:// DATABASE_URL
  so the hosted deployment keeps job state off the ephemeral pod
  filesystem; SQLite remains the default. Quotes the reserved "user"
  column and rewrites ?->%s for psycopg. Adds psycopg to the web extra.
- Dockerfile: production image for reviewbot-web (uvicorn:8080 + bwrap).
- CI: build/push to the internal registry and trigger infra-deployment.
- chart/: Helm chart (deployment/service/ingress/InfisicalSecret),
  internet-facing ALB on serge.huggingface.co, secrets via Infisical.

Pivot to self-hosted deployment: the team will `helm install` the chart on
their own cluster instead of ArgoCD on hub-prod.

- Revert the Postgres job store back to SQLite (no app-code change); the
  embedded DB now persists on a PersistentVolumeClaim in the chart.
- Make the chart generic: drop the Infisical CR and the hub-specific
  env/prod.yaml; sensitive env comes from a pre-created Secret
  (existingSecret); ingress/nodeSelector/storageClass/securityContext are
  values-driven. Add the PVC and a hardened securityContext.
- CI: drop the ArgoCD/infra-deployment dispatch; keep build-and-push.
@tarekziade tarekziade force-pushed the feat/hub-prod-deploy branch from 59036ee to 995671c Compare June 18, 2026 15:20
@tarekziade tarekziade self-assigned this Jun 18, 2026
@tarekziade

Collaborator

@askserge review this helm deployment

@tarekziade tarekziade marked this pull request as ready for review June 18, 2026 15:34
@tarekziade tarekziade merged commit 2077340 into main Jun 18, 2026
2 checks passed
@tarekziade tarekziade deleted the feat/hub-prod-deploy branch June 18, 2026 16:33
2 participants