feat: Helm chart + image for self-hosted deployment#22
Conversation
|
@askserge review this helm chart |
There was a problem hiding this comment.
🤗 Serge says:
The Helm chart is well-structured and follows good practices for a single-replica SQLite-backed deployment. However, there are a few correctness issues that should be fixed before merging.
Correctness
chart/templates/_helpers.tpl: Thenamehelper callsdefaultwith only one argument ($.Release.Name), which makes thedefaultfunction a no-op. It should provide an actual fallback value (e.g.,.Chart.Name) so the helper behaves as intended.chart/templates/deployment.yaml: TheserviceAccountNameis only injected when.Values.serviceAccount.createistrue. If a user setscreate: falsebut provides an existing SA name (e.g.,serviceAccount.name: "my-sa"), the Deployment silently ignores it and falls back to the namespace'sdefaultSA. The condition should also check whetherserviceAccount.nameis non-empty.Dockerfile/chart/values.yamlUID mismatch: The Dockerfile creates theappuser without an explicit UID/GID, but the HelmpodSecurityContexthard-codesrunAsUser: 1000,runAsGroup: 1000, andfsGroup: 1000. If the image'sappuser happens to get a different UID (e.g., 1001), the container will run as UID 1000 but the filesystem ownership will not match, potentially breaking volume writes. The Dockerfile should pin the user to--uid 1000 --gid 1000.
Reliability / Operations
chart/values.yaml: Defaultingimage.tagtolatestwithpullPolicy: IfNotPresentis a Kubernetes anti-pattern. Nodes that already have alatestimage cached will never pull updates, leading to version skew across the cluster. Consider defaulting to a stable tag (e.g.,0.1.0) or switchingpullPolicytoAlwayswhenlatestis used.
Security
- No hardcoded secrets; sensitive configuration is correctly off-loaded to
existingSecret. ThesecurityContexthardening (drop ALL capabilities,RuntimeDefaultseccomp, non-root) is solid.
Minor
- The
.dockerignorefile was not updated in this PR. The productionDockerfilewill currently copy the entire repo (includingchart/,aws/,docs/,tests/,.github/) into the image, bloating the layer unnecessarily. Consider adding those directories to.dockerignore.
model: moonshotai/Kimi-K2.6 · 12 LLM turns · 19 tool calls · 345.6s · 155212 in / 31496 out tokens
| @@ -0,0 +1,14 @@ | |||
| {{- define "name" -}} | |||
| {{- default $.Release.Name | trunc 63 | trimSuffix "-" -}} | |||
There was a problem hiding this comment.
default is called with only one argument here, so it effectively does nothing. The typical pattern is default .Chart.Name .Release.Name | trunc 63 | trimSuffix "-" so the chart name is used as a fallback when the release name is empty.
| {{- default $.Release.Name | trunc 63 | trimSuffix "-" -}} | |
| {{- default .Chart.Name .Release.Name | trunc 63 | trimSuffix "-" -}} |
| annotations: | ||
| checksum/config: {{ include (print $.Template.BasePath "/config.yaml") . | sha256sum }} | ||
| spec: | ||
| {{- if .Values.serviceAccount.create }} |
There was a problem hiding this comment.
This condition only sets serviceAccountName when Helm is asked to create a ServiceAccount. If a user wants to use an existing SA (a common security requirement), they would set serviceAccount.create: false and serviceAccount.name: "my-sa", but the Deployment would silently ignore it and use the namespace's default SA instead.
Change the guard to also check whether a name was explicitly provided:
| {{- if .Values.serviceAccount.create }} | |
| {{- if or .Values.serviceAccount.create .Values.serviceAccount.name }} |
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Unprivileged service user, mirroring ec2-user on the real host. | ||
| RUN useradd --create-home --shell /bin/bash app |
There was a problem hiding this comment.
The app user is created without an explicit UID/GID. The Helm chart hard-codes runAsUser: 1000, runAsGroup: 1000, and fsGroup: 1000 in podSecurityContext. If the image ever builds with app mapped to a different UID (e.g., 1001), volume permissions and process identity will mismatch, likely causing the SQLite store to be unreadable/unwritable.
| RUN useradd --create-home --shell /bin/bash app | |
| RUN useradd --create-home --shell /bin/bash --uid 1000 --gid 1000 app |
| image: | ||
| repository: registry.internal.huggingface.tech/serge | ||
| name: serge | ||
| tag: latest |
There was a problem hiding this comment.
Defaulting the image tag to latest with pullPolicy: IfNotPresent means nodes that have already cached a latest image will never pull updates. This leads to silent version skew and makes rollbacks/rollouts non-deterministic. Consider defaulting to a concrete tag (e.g., 0.1.0) or using pullPolicy: Always when latest is in use.
Make serge deployable to the hub-prod cluster (hub-utils namespace): - store: optional Postgres backend selected by a postgres:// DATABASE_URL so the hosted deployment keeps job state off the ephemeral pod filesystem; SQLite remains the default. Quotes the reserved "user" column and rewrites ?->%s for psycopg. Adds psycopg to the web extra. - Dockerfile: production image for reviewbot-web (uvicorn:8080 + bwrap). - CI: build/push to the internal registry and trigger infra-deployment. - chart/: Helm chart (deployment/service/ingress/InfisicalSecret), internet-facing ALB on serge.huggingface.co, secrets via Infisical.
Pivot to self-hosted deployment: the team will `helm install` the chart on their own cluster instead of ArgoCD on hub-prod. - Revert the Postgres job store back to SQLite (no app-code change); the embedded DB now persists on a PersistentVolumeClaim in the chart. - Make the chart generic: drop the Infisical CR and the hub-specific env/prod.yaml; sensitive env comes from a pre-created Secret (existingSecret); ingress/nodeSelector/storageClass/securityContext are values-driven. Add the PVC and a hardened securityContext. - CI: drop the ArgoCD/infra-deployment dispatch; keep build-and-push.
59036ee to
995671c
Compare
|
@askserge review this helm deployment |
Packages serge's web app (
reviewbot-web) so a team can deploy it themselves withhelm installagainst their own cluster. No app-code change — SQLite remains the job store, now persisted on a PersistentVolumeClaim.Contents
chart/— generic Helm chart:Recreatestrategy — SQLite is single-writer), Service, optional Ingress, ConfigMap, optional ServiceAccount.persistence.mountPath(default/var/lib/reviewbot);WEB_STORE_PATHpoints at<mountPath>/jobs.dbso review/task history survives restarts.securityContext(non-root,fsGroupso the app user can write the volume, dropped capabilities, seccompRuntimeDefault).nodeSelector/tolerations/affinity,storageClass, image registry andresourcesare all values-driven. Sensitive env is injected from a pre-created Secret referenced byexistingSecret.Dockerfile— production image forreviewbot-web(uvicorn on$PORT, default 8080; bubblewrap forHELPER_SANDBOX).Deploy
GITHUB_APP_ID,GITHUB_PRIVATE_KEY,GITHUB_WEBHOOK_SECRET,GITHUB_OAUTH_CLIENT_ID,GITHUB_OAUTH_CLIENT_SECRET,WEB_SESSION_SECRET,LLM_API_KEY, …) in the target namespace.helm install serge ./chart -n <ns> --set existingSecret=<secret-name> --set ingress.enabled=true --set ingress.host=<host> --set ingress.className=<class>Notes
helm lint+helm templatepass; image builds and boots (serves/healthz).HELPER_SANDBOX(bubblewrap) needs the cluster to allow unprivileged user namespaces; set it toauto/offviaenvVarsif not available.