Skip to content

Conversation

@jiridanek
Copy link
Member

@jiridanek jiridanek commented Nov 10, 2025

Description

https://issues.redhat.com/browse/AIPCC-6903

How Has This Been Tested?

Self checklist (all need to be checked):

  • Ensure that you have run make test (gmake on macOS) before asking for review
  • Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • Chores
    • Improved container build flow for Jupyter and ROCm images to ensure installation steps run with appropriate privileges and final runtime uses a non-root context.
    • Better permission handling for site-packages and app files for OpenShift compatibility and more reliable mounts.
    • Disabled JupyterLab announcements, updated Python kernel launcher, applied JupyterLab addons, and ensured configuration is placed correctly across PyTorch, TensorFlow, and CUDA variants.

…nt for AIPCC way of setting permissions in rocm base (opendatahub-io#1682)

(cherry picked from commit 25f1b89)
@github-actions github-actions bot added the review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel label Nov 10, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 10, 2025

Walkthrough

Adds explicit root (USER 0) execution blocks around package installation and setup in multiple Dockerfiles, then switches back to non-root (USER 1001); includes permission adjustments, Jupyter configuration steps, and WORKDIR updates in Jupyter and runtime ROCm/CUDA images.

Changes

Cohort / File(s) Summary
Jupyter ROCm Minimal
jupyter/minimal/ubi9-python-3.12/Dockerfile.rocm
Adds a root-run installation block (USER 0) that installs Python deps via uv/pip, disables JupyterLab announcements, updates the Jupyter kernel launcher, copies Jupyter config, applies JupyterLab addons, adjusts site-packages group-writability, runs fix-permissions, then restores USER 1001 and sets WORKDIR.
Jupyter ROCm PyTorch
jupyter/rocm/pytorch/ubi9-python-3.12/Dockerfile.rocm
Wraps install steps in USER 0 with uv/pip install (strict/no-deps), creates runtimes dir, removes default Elyra runtimes, adjusts kernel launcher, copies config, disables announcements, applies addons, de-vendors ROCm libs, fixes permissions, then switches to USER 1001 and sets WORKDIR.
Jupyter ROCm TensorFlow
jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm
Introduces USER 0 before installation/link-solibs steps and re-establishes USER 1001 before final WORKDIR; shifts root-privileged operations into an explicit root block.
Runtime ROCm PyTorch & TensorFlow
runtimes/rocm-pytorch/ubi9-python-3.12/Dockerfile.rocm, runtimes/rocm-tensorflow/ubi9-python-3.12/Dockerfile.rocm
Adds USER 0 around Python package installation RUN, then switches to USER 1001 and sets WORKDIR to /opt/app-root/src, changing ownership/context for installed artifacts.
Runtime TensorFlow CUDA
runtimes/tensorflow/ubi9-python-3.12/Dockerfile.cuda
Inserts USER 0 before initial package and pip/uv install steps, then returns to USER 1001 and moves the final WORKDIR change after user context switches.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Files affected: several Dockerfiles with a consistent, repetitive pattern of adding USER 0/USER 1001 blocks and related permission/configuration commands.
  • Pay special attention to:
    • Correct pairing and ordering of USER 0 and USER 1001 transitions.
    • Placement and effects of permission changes (chmod, fix-permissions) on runtime directories and Python site-packages.
    • WORKDIR placement relative to user switches to avoid permission mismatches.
    • Any hardcoded paths or operations that assume a particular UID/GID after the switch.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is mostly incomplete with placeholder sections and critical merge criteria unchecked despite self-checklist items being marked as completed. Complete all sections: add detailed description of changes, provide comprehensive testing steps and environment details, check all merge criteria boxes, and verify commits are squashed with meaningful messages.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: fixing permissions in ROCm images as a workaround for AIPCC's permission-setting approach, which directly aligns with the summary showing USER context changes and permission adjustments across multiple Dockerfiles.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from atheo89 and daniellutz November 10, 2025 17:07
@openshift-ci openshift-ci bot added size/s and removed size/s labels Nov 10, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bccfcb6 and 12fef91.

📒 Files selected for processing (6)
  • jupyter/minimal/ubi9-python-3.12/Dockerfile.rocm (2 hunks)
  • jupyter/rocm/pytorch/ubi9-python-3.12/Dockerfile.rocm (2 hunks)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm (2 hunks)
  • runtimes/rocm-pytorch/ubi9-python-3.12/Dockerfile.rocm (2 hunks)
  • runtimes/rocm-tensorflow/ubi9-python-3.12/Dockerfile.rocm (2 hunks)
  • runtimes/tensorflow/ubi9-python-3.12/Dockerfile.cuda (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: Red Hat Konflux / odh-workbench-jupyter-minimal-cpu-py312-ubi9-on-pull-request
  • GitHub Check: Red Hat Konflux / odh-workbench-jupyter-minimal-cuda-py312-ubi9-on-pull-request
  • GitHub Check: Red Hat Konflux / odh-pipeline-runtime-tensorflow-cuda-py312-ubi9-on-pull-request
  • GitHub Check: Red Hat Konflux / odh-pipeline-runtime-pytorch-rocm-py312-ubi9-on-pull-request
  • GitHub Check: build (jupyter-minimal-ubi9-python-3.12, 3.12, linux/ppc64le, false) / build
  • GitHub Check: build (rocm-jupyter-minimal-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: build (runtime-cuda-tensorflow-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: build (rocm-jupyter-pytorch-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: build (cuda-jupyter-minimal-ubi9-python-3.12, 3.12, linux/arm64, false) / build
  • GitHub Check: build (runtime-cuda-tensorflow-ubi9-python-3.12, 3.12, linux/arm64, false) / build
  • GitHub Check: build (rocm-jupyter-tensorflow-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: build (jupyter-minimal-ubi9-python-3.12, 3.12, linux/s390x, false) / build
  • GitHub Check: build (jupyter-minimal-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: build (cuda-jupyter-minimal-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: build (rocm-runtime-pytorch-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: build (rocm-runtime-tensorflow-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: Red Hat Konflux / odh-workbench-jupyter-tensorflow-rocm-py312-ubi9-on-pull-request
  • GitHub Check: Red Hat Konflux / odh-workbench-jupyter-pytorch-rocm-py312-ubi9-on-pull-request
  • GitHub Check: Red Hat Konflux / odh-workbench-jupyter-minimal-rocm-py312-ubi9-on-pull-request
  • GitHub Check: Red Hat Konflux / odh-pipeline-runtime-tensorflow-rocm-py312-ubi9-on-pull-request
🔇 Additional comments (5)
runtimes/rocm-pytorch/ubi9-python-3.12/Dockerfile.rocm (1)

88-105: Consistent user context and permission pattern.

The changes correctly bracket the installation block with USER 0 before and USER 1001 after, with a single permission-hardening RUN block applied. This aligns with OpenShift-friendly security practices.

jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm (1)

160-192: Consistent user context and permission pattern.

The changes correctly bracket the installation block with USER 0 and switch to USER 1001 after, with a single permission-hardening step. The post-install link-solibs operation executes under non-root context.

runtimes/tensorflow/ubi9-python-3.12/Dockerfile.cuda (1)

91-107: Consistent user context and permission pattern.

The changes correctly manage user context with USER 0 for installation and USER 1001 for subsequent operations, applying a single permission-hardening block. Pattern is consistent across similar runtime images.

runtimes/rocm-tensorflow/ubi9-python-3.12/Dockerfile.rocm (1)

87-104: Consistent user context and permission pattern.

The changes maintain consistent user context switching with a single permission-hardening block. Pattern aligns with other updated runtime images in this PR.

jupyter/rocm/pytorch/ubi9-python-3.12/Dockerfile.rocm (1)

159-188: Consistent user context with comprehensive setup block.

The changes bracket a comprehensive installation and runtime-configuration sequence under USER 0, then switch to USER 1001 for subsequent operations. Permission hardening is applied once within the root context. Pattern is consistent across the PR.

@openshift-ci openshift-ci bot added size/s and removed size/s labels Nov 11, 2025
@jiridanek
Copy link
Member Author

@coderabbitai please review

Pay special attention to:
Correct pairing and ordering of USER 0 and USER 1001 transitions.
Placement and effects of permission changes (chmod, fix-permissions) on runtime directories and Python site-packages.
WORKDIR placement relative to user switches to avoid permission mismatches.
Any hardcoded paths or operations that assume a particular UID/GID after the switch.

@jiridanek jiridanek added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Nov 11, 2025
@openshift-ci openshift-ci bot added size/s and removed size/s labels Nov 11, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 11, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch contents of all changed Dockerfiles
echo "=== jupyter/minimal/ubi9-python-3.12/Dockerfile.rocm ==="
cat jupyter/minimal/ubi9-python-3.12/Dockerfile.rocm
echo -e "\n\n=== jupyter/rocm/pytorch/ubi9-python-3.12/Dockerfile.rocm ==="
cat jupyter/rocm/pytorch/ubi9-python-3.12/Dockerfile.rocm
echo -e "\n\n=== jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm ==="
cat jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm
echo -e "\n\n=== runtimes/rocm-pytorch/ubi9-python-3.12/Dockerfile.rocm ==="
cat runtimes/rocm-pytorch/ubi9-python-3.12/Dockerfile.rocm
echo -e "\n\n=== runtimes/rocm-tensorflow/ubi9-python-3.12/Dockerfile.rocm ==="
cat runtimes/rocm-tensorflow/ubi9-python-3.12/Dockerfile.rocm
echo -e "\n\n=== runtimes/tensorflow/ubi9-python-3.12/Dockerfile.cuda ==="
cat runtimes/tensorflow/ubi9-python-3.12/Dockerfile.cuda

Length of output: 32747


@jiridanek I've completed a comprehensive review of all 6 Dockerfiles. Here are my findings:

✅ USER Transitions (USER 0 ↔ USER 1001)

All files correctly pair and order USER directives:

Pattern observed across all files:

  1. rocm-base/cuda-base stage: USER 0 → USER 1001 ✓
  2. Final build stages: USER 0 (for installations) → USER 1001 (for runtime) ✓

Notable examples:

  • jupyter/minimal: Uses nested USER switches correctly (USER 0 for PDF deps, then USER 1001, then USER 0 for Python install, finally USER 1001)
  • jupyter/rocm/tensorflow: Properly maintains USER 0 through multiple operations (pip install, COPY files, link-solibs script) before final USER 1001
  • All transitions are explicit and complete - no missing USER directives

✅ Permission Changes (chmod, fix-permissions)

All permission operations are correctly placed:

Consistent pattern in all files:

USER 0
RUN /bin/bash <<'EOF'
  # ... package installation ...
  chmod -R g+w /opt/app-root/lib/python3.12/site-packages
  fix-permissions /opt/app-root -P
EOF
USER 1001

Key observations:

  • chmod -R g+w always runs as USER 0 before switching to USER 1001 ✓
  • fix-permissions /opt/app-root -P always runs as USER 0 ✓
  • This ensures group-writable permissions for OpenShift's arbitrary UID support
  • Permission fixes occur after all package installations in the same RUN block

✅ WORKDIR Placement

All files correctly place WORKDIR directives:

Pattern:

USER 0
RUN # ... installations and permission fixes ...
USER 1001
WORKDIR /opt/app-root/src

Files reviewed:

  1. jupyter/minimal: WORKDIR after final USER 1001 ✓
  2. jupyter/rocm/pytorch: WORKDIR after final USER 1001 ✓
  3. jupyter/rocm/tensorflow: WORKDIR after final USER 1001 ✓
  4. runtimes/rocm-pytorch: WORKDIR after final USER 1001 ✓
  5. runtimes/rocm-tensorflow: WORKDIR after final USER 1001 ✓
  6. runtimes/tensorflow (CUDA): WORKDIR after final USER 1001 ✓

This prevents permission mismatches - the working directory is set after switching to the runtime user.

✅ Hardcoded Paths and UID/GID Assumptions

All paths use standard OpenShift conventions:

Standard paths used:

  • /opt/app-root/bin - application binaries
  • /opt/app-root/lib/python3.12/site-packages - Python packages
  • /opt/app-root/src - user working directory
  • /opt/app-root/etc/jupyter - Jupyter configuration
  • /opt/app-root/share/jupyter - Jupyter shared files
  • /opt/app-root/runtimes - runtime configuration

No problematic assumptions found:

  • All paths are under /opt/app-root/* which is owned by user 1001
  • No explicit UID/GID references in commands
  • fix-permissions -P handles arbitrary UIDs correctly for OpenShift
  • Group-write permissions (g+w) support OpenShift's dynamic GID assignment

Additional Observations

Minor note on COPY placement:

  • In runtimes/rocm-tensorflow and runtimes/tensorflow, COPY operations for usercustomize.pth and monkey_patch_protobuf_6x.py occur while still USER 0, before switching to USER 1001
  • This is acceptable since fix-permissions was already run, but these files will be owned by root
  • This pattern is consistent and intentional across those images

Conclusion

All USER 0 and USER 1001 transitions are correctly paired and ordered
All permission changes are properly executed as root before switching to non-root user
All WORKDIR directives are correctly placed after USER transitions
No hardcoded paths or UID/GID assumptions that would cause permission issues

The changes successfully implement the ROCm permissions workaround while maintaining proper OpenShift security practices. The implementation is consistent across all 6 Dockerfiles.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 11, 2025

@jiridanek: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/notebook-rocm-jupyter-pt-ubi9-python-3-12-pr-image-mirror 73c7cc1 link true /test notebook-rocm-jupyter-pt-ubi9-python-3-12-pr-image-mirror
ci/prow/runtime-cuda-tf-ubi9-python-3-12-pr-image-mirror 73c7cc1 link true /test runtime-cuda-tf-ubi9-python-3-12-pr-image-mirror
ci/prow/notebook-rocm-jupyter-ubi9-python-3-12-pr-image-mirror 73c7cc1 link true /test notebook-rocm-jupyter-ubi9-python-3-12-pr-image-mirror
ci/prow/rocm-runtime-pt-ubi9-python-3-12-pr-image-mirror 73c7cc1 link true /test rocm-runtime-pt-ubi9-python-3-12-pr-image-mirror
ci/prow/notebook-cuda-jupyter-ubi9-python-3-12-pr-image-mirror 73c7cc1 link true /test notebook-cuda-jupyter-ubi9-python-3-12-pr-image-mirror
ci/prow/notebook-jupyter-ubi9-python-3-12-pr-image-mirror 73c7cc1 link true /test notebook-jupyter-ubi9-python-3-12-pr-image-mirror
ci/prow/images 73c7cc1 link true /test images
ci/prow/notebooks-py312-ubi9-e2e-tests 73c7cc1 link true /test notebooks-py312-ubi9-e2e-tests
ci/prow/rocm-notebooks-py312-e2e-tests 73c7cc1 link true /test rocm-notebooks-py312-e2e-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ide-developer
Once this PR has been reviewed and has the lgtm label, please assign andyatmiami for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jiridanek jiridanek marked this pull request as draft November 11, 2025 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress lgtm review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel size/s tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants