Problem Statement
Every container image used as an OpenShell sandbox must bake in a sandbox user and group at a specific UID/GID. The bring-your-own-container docs require UID 1000660000, the VM driver hardcodes UID 10001, and the policy engine rejects any run_as_user value other than "sandbox". This creates friction for image authors, prevents compatibility with environments that allocate their own UID ranges (e.g., OpenShift SCCs), and ignores the Kubernetes-native securityContext mechanism for injecting user identity at runtime.
The goal is to eliminate the requirement that the sandbox user exist in the container image and instead have the compute driver inject the desired UID/GID at sandbox creation time. On OpenShift, the UID should be auto-detected from namespace SCC annotations.
Technical Context
The sandbox user identity flows through 6 distinct layers today — from image build time, through policy validation, to supervisor privilege dropping. Each layer assumes a user named "sandbox" exists in the container's /etc/passwd. The supervisor starts as root (UID 0) and drops privileges to the sandbox user via setuid()/setgid() after setting up network namespaces, Landlock, and seccomp. The Kubernetes driver already constructs securityContext on the pod spec (setting runAsUser: 0 for the supervisor), so extending it to pass the sandbox UID is architecturally straightforward. The proto already uses a string field for run_as_user, so numeric UIDs require no wire format change.
Affected Components
| Component | Key Files | Role |
|---|---|---|
| Policy engine | crates/openshell-policy/src/lib.rs | Validates run_as_user/run_as_group, currently rejects anything other than "sandbox" |
| Supervisor (process) | crates/openshell-supervisor-process/src/process.rs | Validates sandbox user exists, drops privileges via setuid()/setgid(), chowns filesystem |
| Supervisor (SSH) | crates/openshell-supervisor-process/src/ssh.rs | Derives USER/HOME env vars from policy user |
| Kubernetes driver | crates/openshell-driver-kubernetes/src/driver.rs | Constructs pod spec, sets securityContext, manages PVC init containers |
| VM driver | crates/openshell-driver-vm/src/rootfs.rs | Writes sandbox user into guest rootfs at image prep time |
| Docker driver | crates/openshell-driver-docker/src/lib.rs | Uses --user on docker run |
| Proto | proto/sandbox.proto | Defines ProcessPolicy.run_as_user as string (no change needed) |
| BYOC example | examples/bring-your-own-container/ | Documents and demonstrates sandbox user creation |
Technical Investigation
Architecture Overview
The sandbox user identity touches 6 points during the sandbox lifecycle, starting with image build, where it is established:
- Image build: groupadd/useradd creates the sandbox user at a fixed UID in the Dockerfile
- Policy normalization: ensure_sandbox_process_identity() defaults empty run_as_user/run_as_group to "sandbox"
- Policy validation: validate_sandbox_policy() hard-rejects any non-"sandbox" value
- Supervisor user validation: validate_sandbox_user() calls User::from_name("sandbox") against /etc/passwd
- Privilege dropping: drop_privileges() resolves "sandbox" to a UID via User::from_name(), then calls setgid()/setuid() with verification
- Filesystem prep: prepare_filesystem() resolves the sandbox user for chown of read_write directories
The supervisor runs as root (UID 0) to create network namespaces, set up the proxy, and configure Landlock/seccomp. It drops to the sandbox UID only for child processes. The Kubernetes driver forces securityContext.runAsUser = 0 on the main container for this reason.
Code References
| Location | Description |
|---|---|
| openshell-policy/src/lib.rs:660-668 | ensure_sandbox_process_identity(): defaults empty user/group to "sandbox" |
| openshell-policy/src/lib.rs:756-772 | validate_sandbox_policy(): hard-rejects non-"sandbox" values for run_as_user/run_as_group |
| openshell-policy/src/lib.rs:680-697 | PolicyViolation enum: would need new UidOutOfRange variant |
| openshell-supervisor-process/src/process.rs:758-786 | validate_sandbox_user(): calls User::from_name("sandbox"), fails if missing from image |
| openshell-supervisor-process/src/process.rs:892-998 | drop_privileges(): resolves name → UID via User::from_name(), calls setgid()/setuid() with verification |
| openshell-supervisor-process/src/process.rs:788-870 | prepare_filesystem(): resolves sandbox user/group for chown of read_write directories |
| openshell-supervisor-process/src/ssh.rs:221-225 | SSH session: derives USER/HOME from policy run_as_user, defaults to "sandbox"/"/sandbox" |
| openshell-driver-kubernetes/src/driver.rs:970-981 | K8s driver: forces securityContext.runAsUser = 0 on supervisor container |
| openshell-driver-kubernetes/src/driver.rs:994+ | PVC workspace init container: seeds PVCs, needs sandbox UID for chown |
| openshell-driver-vm/src/rootfs.rs:755-772 | VM driver: hardcodes SANDBOX_UID = 10001 / SANDBOX_GID = 10001 in rootfs |
| proto/sandbox.proto:47-52 | ProcessPolicy: run_as_user and run_as_group are string fields |
| examples/bring-your-own-container/Dockerfile:20-21 | BYOC example: groupadd -g 1000660000 sandbox && useradd -m -u 1000660000 -g sandbox sandbox |
| e2e/rust/tests/custom_image.rs:27-28 | E2E test image: same 1000660000 pattern |
Current Behavior
When a sandbox is created:
- The policy's run_as_user is defaulted to "sandbox" if empty, then validated; only "sandbox" is accepted.
- The supervisor calls User::from_name("sandbox") against the container's /etc/passwd. If the user doesn't exist, startup fails with: "sandbox user 'sandbox' not found in image; all sandbox images must include a 'sandbox' user and group".
- drop_privileges() resolves "sandbox" → numeric UID via User::from_name(), then calls setgid()/setuid() with post-drop verification (defense-in-depth: confirms UID changed, confirms root can't be re-acquired).
- prepare_filesystem() resolves the sandbox user for chown of read_write directories before forking the child process.
What Would Need to Change
Policy engine — Relax the hard "sandbox" string check to also accept numeric UID strings within a platform-level range:
const MIN_SANDBOX_UID: u32 = 1000;
const MAX_SANDBOX_UID: u32 = 2_000_000_000;
Accept "sandbox" (existing) or any u32 in [MIN_SANDBOX_UID, MAX_SANDBOX_UID]. Reject "root", UID 0, system UIDs below 1000, and non-numeric garbage. Add UidOutOfRange violation variant for clear error messages. The range is a platform safety constant, not a per-policy knob.
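The relaxed check described above could look roughly like the following. This is a minimal sketch, not the actual validate_sandbox_policy() code: the IdentityViolation enum, its InvalidIdentity variant, and the validate_run_as() helper are hypothetical names introduced for illustration; only the two constants and the UidOutOfRange name come from this issue.

```rust
use std::fmt;

/// Platform-level bounds quoted from the text above.
const MIN_SANDBOX_UID: u32 = 1000;
const MAX_SANDBOX_UID: u32 = 2_000_000_000;

/// Hypothetical stand-in for the relevant PolicyViolation variants.
#[derive(Debug, PartialEq, Eq)]
enum IdentityViolation {
    /// Parsed as a u32 but fell outside the allowed range.
    UidOutOfRange { value: u32 },
    /// Neither "sandbox" nor a plain decimal u32 (covers "root", "", "+1000").
    InvalidIdentity { value: String },
}

impl fmt::Display for IdentityViolation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::UidOutOfRange { value } => write!(
                f,
                "UID {} is outside the allowed range [{}, {}]",
                value, MIN_SANDBOX_UID, MAX_SANDBOX_UID
            ),
            Self::InvalidIdentity { value } => write!(
                f,
                "expected \"sandbox\" or a numeric UID, got {:?}",
                value
            ),
        }
    }
}

/// Accept "sandbox" or a decimal UID within the platform range.
fn validate_run_as(value: &str) -> Result<(), IdentityViolation> {
    if value == "sandbox" {
        return Ok(());
    }
    // Require plain ASCII digits: u32::from_str would otherwise accept "+1000".
    if value.is_empty() || !value.bytes().all(|b| b.is_ascii_digit()) {
        return Err(IdentityViolation::InvalidIdentity { value: value.to_string() });
    }
    match value.parse::<u32>() {
        Ok(uid) if (MIN_SANDBOX_UID..=MAX_SANDBOX_UID).contains(&uid) => Ok(()),
        Ok(uid) => Err(IdentityViolation::UidOutOfRange { value: uid }),
        // All digits, but too large to fit in a u32.
        Err(_) => Err(IdentityViolation::InvalidIdentity { value: value.to_string() }),
    }
}
```

In this sketch "root" is rejected simply because it is neither "sandbox" nor numeric, and "0" is rejected by the range check, so no separate root special case is needed.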
Supervisor — validate_sandbox_user(), drop_privileges(), and prepare_filesystem() must accept numeric UIDs:
- If the value parses as u32, skip /etc/passwd lookup and use the UID directly (setuid()/setgid() do not require a passwd entry).
- If it's a name, keep the existing name-based lookup.
- SSH session should derive USER=sandbox and HOME=/sandbox as defaults when no passwd entry exists.
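The branching in the list above can be sketched as follows. The SandboxIdentity enum and both helper functions are hypothetical names for illustration, and the sketch assumes the policy engine has already range-checked the value, so it only decides which lookup path to take.

```rust
/// How the supervisor could classify the policy's run_as_user string.
#[derive(Debug, PartialEq, Eq)]
enum SandboxIdentity {
    /// Use the UID directly; no /etc/passwd lookup is needed.
    Numeric(u32),
    /// Keep the existing name-based lookup.
    Named(String),
}

/// Decide between the numeric path and the name-based path.
/// Assumes the value was already validated by the policy engine.
fn classify_identity(run_as_user: &str) -> SandboxIdentity {
    match run_as_user.parse::<u32>() {
        Ok(uid) => SandboxIdentity::Numeric(uid),
        Err(_) => SandboxIdentity::Named(run_as_user.to_string()),
    }
}

/// USER and HOME for the SSH session. `passwd_entry` carries (name, home)
/// when a passwd lookup succeeded, and None when the UID has no entry.
fn ssh_env(passwd_entry: Option<(&str, &str)>) -> (String, String) {
    match passwd_entry {
        Some((name, home)) => (name.to_string(), home.to_string()),
        // Defaults named in the text above.
        None => ("sandbox".to_string(), "/sandbox".to_string()),
    }
}
```

Keeping the classification in one place would let validate_sandbox_user(), drop_privileges(), and prepare_filesystem() share the same decision instead of each re-parsing the string.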
Kubernetes driver — Add sandbox_uid/sandbox_gid to driver config. Pass the UID to the supervisor through the policy's run_as_user field. The supervisor container stays runAsUser: 0. The PVC init container uses the injected UID for chown.
OpenShift SCC-aware UID resolution — On OpenShift, read namespace annotations to auto-select the sandbox UID:
- Read namespace metadata via Api<Namespace>::get() (driver doesn't currently do this; requires adding the call).
- Parse openshift.io/sa.scc.uid-range annotation (format: <start>/<size>, e.g., 1000660000/10000). Use range start as sandbox UID.
- Parse openshift.io/sa.scc.supplemental-groups for GID. Fall back to UID range start if absent.
- If neither annotation is present (vanilla Kubernetes), fall back to configured sandbox_uid/sandbox_gid.
- Validate resolved UID/GID against [MIN_SANDBOX_UID, MAX_SANDBOX_UID].
This is passive detection (annotation presence) — no explicit "OpenShift mode" config flag needed.
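The resolution order above can be sketched with the namespace annotations modeled as a plain map, so no Kubernetes client is involved. The function names are hypothetical, and two points are assumptions rather than decisions from this issue: the supplemental-groups value is parsed with the same <start>/<size> format as the UID range, and a malformed annotation is treated as absent.

```rust
use std::collections::BTreeMap;

const MIN_SANDBOX_UID: u32 = 1000;
const MAX_SANDBOX_UID: u32 = 2_000_000_000;

const UID_RANGE_ANNOTATION: &str = "openshift.io/sa.scc.uid-range";
const SUPPLEMENTAL_GROUPS_ANNOTATION: &str = "openshift.io/sa.scc.supplemental-groups";

/// Parse the start of a `<start>/<size>` range such as "1000660000/10000".
fn parse_range_start(annotation: &str) -> Option<u32> {
    let (start, size) = annotation.trim().split_once('/')?;
    // Require a well-formed size too, so malformed values are not half-accepted.
    size.trim().parse::<u32>().ok()?;
    start.trim().parse::<u32>().ok()
}

fn in_range(id: u32) -> bool {
    (MIN_SANDBOX_UID..=MAX_SANDBOX_UID).contains(&id)
}

/// Resolve (uid, gid) from namespace annotations, falling back to the
/// configured (sandbox_uid, sandbox_gid) pair when annotations are absent.
fn resolve_sandbox_ids(
    annotations: &BTreeMap<String, String>,
    configured: (u32, u32),
) -> Result<(u32, u32), String> {
    // Simplification: a present-but-malformed annotation is treated as absent.
    let uid_from_ns = annotations
        .get(UID_RANGE_ANNOTATION)
        .and_then(|v| parse_range_start(v));
    let gid_from_ns = annotations
        .get(SUPPLEMENTAL_GROUPS_ANNOTATION)
        .and_then(|v| parse_range_start(v));

    let uid = uid_from_ns.unwrap_or(configured.0);
    // GID order: supplemental-groups start, then UID range start, then config.
    let gid = gid_from_ns.or(uid_from_ns).unwrap_or(configured.1);

    if !in_range(uid) {
        return Err(format!("resolved UID {} is out of range", uid));
    }
    if !in_range(gid) {
        return Err(format!("resolved GID {} is out of range", gid));
    }
    Ok((uid, gid))
}
```

A real implementation might prefer to fail loudly on a malformed annotation, since silently falling back to the configured UID on OpenShift would likely produce a UID outside the namespace's allowed range.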
VM driver — Use configurable UID instead of hardcoded 10001. The VM driver controls its own rootfs, so it can continue creating the user at rootfs prep time.
BYOC / docs — Remove groupadd/useradd requirement from examples and documentation.
Alternative Approaches Considered
- NSS module in the supervisor: Synthesize a sandbox passwd entry via custom NSS module. Adds runtime dependency and complexity. Rejected: directly using numeric UIDs is simpler and more portable.
- Init container running useradd: Create the user at container start. Requires the image to have user management tools and writable /etc/passwd. Rejected: many minimal images lack these tools.
- Better documentation only: Just improve the BYOC docs. Rejected: doesn't solve the underlying friction or OpenShift UID range incompatibility.
Patterns to Follow
- The Kubernetes driver already constructs securityContext on pod specs (driver.rs:970-981); the sandbox UID injection follows the same JSON manipulation pattern.
- The policy engine already has a PolicyViolation enum with descriptive variants and Display impls; the new UidOutOfRange variant should follow the same pattern.
- The supervisor's drop_privileges() already has defense-in-depth verification (confirms UID changed, confirms root can't be re-acquired); numeric UID support must maintain these checks.
Proposed Approach
Implement in three phases. Phase 1 teaches the policy engine and supervisor to accept numeric UIDs within a safe range ([1000, 2_000_000_000]), removing the hard dependency on a /etc/passwd entry. Phase 2 adds sandbox_uid/sandbox_gid config to the Kubernetes driver and injects it via the policy, with passive OpenShift SCC annotation detection for automatic UID selection on OpenShift clusters. Phase 3 removes the image-side user requirement from examples, docs, and e2e tests.
Scope Assessment
- Complexity: Medium
- Confidence: High — clear path for Phases 1-2, Phase 3 is straightforward cleanup
- Estimated files to change: ~11
- Issue type: feat
Risks & Open Questions
- Programs requiring a passwd entry: Some programs (sudo, ssh) fail if the running UID has no /etc/passwd entry. Should the supervisor write a synthetic passwd entry at startup before dropping privileges?
- Home directory creation: If the image doesn't have /sandbox created, who creates it? The supervisor could create and chown it during prepare_filesystem(), but this needs to happen before the child process starts.
- File ownership in image layers: Files in the image owned by the old sandbox UID will appear as owned by a different user. Only affects images previously built with the sandbox user.
- Security boundary: The range check (MIN_SANDBOX_UID = 1000 through MAX_SANDBOX_UID = 2_000_000_000) replaces the current "sandbox" string check as the non-root invariant. It rejects UID 0, system UIDs below 1000, and unreasonably large values. The range is enforced as platform-level constants, not per-policy configurable.
- Docker driver: The Docker driver uses --user on docker run. Should it also adopt the configurable UID pattern, or is this Kubernetes-only initially?
- Gateway config docs: If sandbox_uid/sandbox_gid are added to the Kubernetes driver config, docs/reference/gateway-config.mdx and relevant compute-driver setup docs must be updated.
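On the first open question, a synthetic entry would amount to one extra line in /etc/passwd and one in /etc/group. The sketch below only formats those lines; the function names are hypothetical, the /bin/sh shell is an assumption, and whether the supervisor should write them at all remains undecided here.

```rust
/// One /etc/passwd line: name:password:UID:GID:GECOS:home:shell.
/// The "sandbox" name and /sandbox home mirror the SSH defaults in this
/// issue; the /bin/sh shell is an assumption.
fn synthetic_passwd_line(uid: u32, gid: u32) -> String {
    format!("sandbox:x:{}:{}::/sandbox:/bin/sh", uid, gid)
}

/// One /etc/group line: name:password:GID:member-list (empty here).
fn synthetic_group_line(gid: u32) -> String {
    format!("sandbox:x:{}:", gid)
}
```

Appending these lines would still need a writable /etc/passwd, which is the same constraint that led to the init-container alternative being rejected.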
Test Considerations
- Unit tests: Policy validation tests must cover "sandbox" (pass), numeric UID in range (pass), numeric UID out of range (fail), "root" (fail), "0" (fail), non-numeric string (fail). Supervisor tests must cover numeric UID privilege dropping without passwd entry.
- Integration tests: Kubernetes driver tests must verify pod spec includes correct securityContext when sandbox_uid is configured, and that OpenShift annotation parsing works correctly.
- E2E tests: Update e2e/rust/tests/custom_image.rs to use an image without a baked-in sandbox user. Verify sandbox creation, privilege dropping, and filesystem ownership work with injected UIDs.
- Existing test patterns: openshell-driver-kubernetes/src/driver.rs has tests like supervisor_sideload_injects_run_as_user_zero() that verify securityContext; new tests should follow this pattern. Policy validation tests in openshell-policy/src/lib.rs use validate_sandbox_policy() assertions; extend these.
Created by spike investigation. Use build-from-issue to plan and implement.