[PATCH] Tensorflow Training 2.18 CVE Patch #5024

Draft
wants to merge 45 commits into
base: master
Commits
0a72a4c
build tensorflow 2.18 training sm
Jyothirmaikottu Jul 16, 2025
e8616b9
build tensorflow 2.18 training ec2
Jyothirmaikottu Jul 16, 2025
be39c4d
build tensorflow 2.19 training sm
Jyothirmaikottu Jul 16, 2025
ec2b7b9
build tensorflow 2.19 training sm
Jyothirmaikottu Jul 16, 2025
245eb1b
build tensorflow 2.18 training sm
Jyothirmaikottu Jul 16, 2025
840d496
build tensorflow 2.18 training ec2
Jyothirmaikottu Jul 16, 2025
f587106
build tensorflow 2.18 with opencv pinned version ec2
Jyothirmaikottu Jul 16, 2025
14b3411
build tensorflow 2.18 with opencv pinned version sm
Jyothirmaikottu Jul 16, 2025
0e99dc4
build tensorflow 2.18 sm
Jyothirmaikottu Jul 16, 2025
f49b9e7
build tensorflow 2.18 ec2
Jyothirmaikottu Jul 16, 2025
cab6829
build tensorflow 2.19 sm
Jyothirmaikottu Jul 16, 2025
28412a7
build tensorflow 2.18 sm
Jyothirmaikottu Jul 16, 2025
181d9a3
build tensorflow 2.18 ec2
Jyothirmaikottu Jul 16, 2025
7786b20
build tensorflow 2.19 sm with open cv pinned
Jyothirmaikottu Jul 16, 2025
3c35d19
retry build for tensorflow 2.18 sm
Jyothirmaikottu Jul 17, 2025
52eccf4
retry build for tensorflow 2.18 ec2
Jyothirmaikottu Jul 17, 2025
2fd4deb
retry build for tensorflow 2.18 ec2
Jyothirmaikottu Jul 17, 2025
d7f2b3b
retry build for tensorflow 2.18 sm
Jyothirmaikottu Jul 17, 2025
e0d9609
retry build for tensorflow 2.18 sm
Jyothirmaikottu Jul 17, 2025
801a49f
retry build for tensorflow 2.18 ec2
Jyothirmaikottu Jul 17, 2025
8862f26
revert toml
Jyothirmaikottu Jul 18, 2025
97c25ef
build 2.18 cve sagemaker image with protobuf 6
Jyothirmaikottu Jul 21, 2025
415dd3c
build 2.18 cve sagemaker image with protobuf 6
Jyothirmaikottu Jul 21, 2025
7ff310d
build 2.18 cve sagemaker image with protobuf 6
Jyothirmaikottu Jul 21, 2025
31f3c43
build 2.18 cve ec2 image with protobuf 6
Jyothirmaikottu Jul 21, 2025
29d8347
Merge branch 'master' into patch-tf-cve
Jyothirmaikottu Jul 21, 2025
8e48893
build ec2
Jyothirmaikottu Jul 21, 2025
39eeedc
build ec2
Jyothirmaikottu Jul 21, 2025
d42494b
build ec2
Jyothirmaikottu Jul 21, 2025
394a6d6
run security tests
Jyothirmaikottu Jul 22, 2025
94cb030
Merge branch 'master' into patch-tf-cve
Jyothirmaikottu Jul 22, 2025
32557eb
run security tests
Jyothirmaikottu Jul 22, 2025
912f031
rerun security
Jyothirmaikottu Jul 22, 2025
5bb1f3a
build ec2
Jyothirmaikottu Jul 22, 2025
17c9c33
build sm
Jyothirmaikottu Jul 22, 2025
a71c0eb
build ec2
Jyothirmaikottu Jul 22, 2025
9340251
build ec2
Jyothirmaikottu Jul 22, 2025
1659e9f
build sm
Jyothirmaikottu Jul 22, 2025
1f3a66b
rebuild sm
Jyothirmaikottu Jul 22, 2025
445e016
rebuild sm
Jyothirmaikottu Jul 23, 2025
fc2a5f0
remove version pins
Jyothirmaikottu Jul 24, 2025
b146dcc
removed sm version pins
Jyothirmaikottu Jul 24, 2025
7c4b86f
rebuild sm
Jyothirmaikottu Jul 24, 2025
3f1cc52
sagemaker version pins
Jyothirmaikottu Jul 24, 2025
7ce61a4
tf build sm
Jyothirmaikottu Jul 25, 2025
6 changes: 3 additions & 3 deletions dlc_developer_config.toml
@@ -37,12 +37,12 @@ deep_canary_mode = false
[build]
# Add in frameworks you would like to build. By default, builds are disabled unless you specify building an image.
# available frameworks - ["base", "vllm", "autogluon", "huggingface_tensorflow", "huggingface_pytorch", "huggingface_tensorflow_trcomp", "huggingface_pytorch_trcomp", "pytorch_trcomp", "tensorflow", "pytorch", "stabilityai_pytorch"]
-build_frameworks = []
+build_frameworks = ["tensorflow"]


# By default we build both training and inference containers. Set true/false values to determine which to build.
build_training = true
-build_inference = true
+build_inference = false

# Set do_build to "false" to skip builds and test the latest image built by this PR
# Note: at least one build is required to set do_build to "false"
@@ -120,7 +120,7 @@ use_scheduler = false

# Standard Framework Training
dlc-pr-pytorch-training = ""
-dlc-pr-tensorflow-2-training = ""
+dlc-pr-tensorflow-2-training = "tensorflow/training/buildspec-2-18-sm.yml"
dlc-pr-autogluon-training = ""

# ARM64 Training
2 changes: 1 addition & 1 deletion tensorflow/training/buildspec-2-18-ec2.yml
@@ -5,7 +5,7 @@ framework: &FRAMEWORK tensorflow
version: &VERSION 2.18.0
short_version: &SHORT_VERSION "2.18"
arch_type: x86
-autopatch_build: "True"
+# autopatch_build: "True"

repository_info:
training_repository: &TRAINING_REPOSITORY
2 changes: 1 addition & 1 deletion tensorflow/training/buildspec-2-18-sm.yml
@@ -5,7 +5,7 @@ framework: &FRAMEWORK tensorflow
version: &VERSION 2.18.0
short_version: &SHORT_VERSION "2.18"
arch_type: x86
-autopatch_build: "True"
+# autopatch_build: "True"

repository_info:
training_repository: &TRAINING_REPOSITORY
11 changes: 6 additions & 5 deletions tensorflow/training/docker/2.18/py3/Dockerfile.cpu
@@ -272,15 +272,16 @@ RUN $PYTHON -m pip install --no-cache-dir -U \
seaborn \
shap

+
RUN $PYTHON -m pip install --no-cache-dir -U \
-"sagemaker<3" \
-sagemaker-experiments==0.* \
-sagemaker-tensorflow-training \
-sagemaker-training \
+"sagemaker>=2.172.0,<3" \
+"sagemaker-experiments==0.1.7" \
+"sagemaker-tensorflow-training==20.5.0" \
+"sagemaker-training>=4.3.0,<=4.7.4" \
"sagemaker-studio-analytics-extension<1" \
"sparkmagic<1" \
"sagemaker-studio-sparkmagic-lib<1" \
-smclarify
+"smclarify"

# Remove python kernel installed by sparkmagic
RUN /usr/local/bin/jupyter-kernelspec remove -f python3
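
The new `sagemaker-training>=4.3.0,<=4.7.4` pin bounds the package on both sides, where the previous line accepted any release. The sketch below shows what such an inclusive range admits. It is a simplified illustration that assumes plain dotted numeric versions with the same number of components; the `in_range` helper is hypothetical and does not implement full PEP 440 semantics (real tooling would more likely rely on `packaging.specifiers.SpecifierSet`).

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '4.7.4' into (4, 7, 4). Only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version <= upper, both bounds inclusive."""
    return parse_version(lower) <= parse_version(version) <= parse_version(upper)


# Bounds taken from the pin added in this diff.
LOWER, UPPER = "4.3.0", "4.7.4"

for candidate in ("4.2.9", "4.3.0", "4.7.4", "4.8.0"):
    print(candidate, in_range(candidate, LOWER, UPPER))
# 4.2.9 False
# 4.3.0 True
# 4.7.4 True
# 4.8.0 False
```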
@@ -0,0 +1,3 @@
{
"77740": "[Package: protobuf] Affected versions of this package are vulnerable to a potential Denial of Service (DoS) attack due to unbounded recursion when parsing untrusted Protocol Buffers data. The pure-Python implementation fails to enforce recursion depth limits when processing recursive groups, recursive messages, or a series of SGROUP tags, leading to stack overflow conditions that can crash the application by exceeding Python's recursion limit."
}
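
The advisory text above describes a parser that recurses once per nesting level without enforcing a depth limit. The sketch below illustrates the general mitigation, a recursion depth cap, on a toy grammar of nested parentheses. It is not protobuf code; `parse_groups`, `DepthLimitExceeded`, and the default limit of 100 are hypothetical names and values chosen for illustration.

```python
class DepthLimitExceeded(ValueError):
    """Raised when input nests deeper than the configured limit."""


def parse_groups(data: str, max_depth: int = 100) -> list:
    """Parse nested '(' ... ')' groups into nested lists, rejecting excess depth."""

    def parse(pos: int, depth: int) -> tuple[list, int]:
        # The guard runs before any further recursion, so hostile input is
        # rejected long before the interpreter's own recursion limit is hit.
        if depth > max_depth:
            raise DepthLimitExceeded(f"nesting deeper than {max_depth}")
        items = []
        while pos < len(data):
            ch = data[pos]
            if ch == "(":
                child, pos = parse(pos + 1, depth + 1)
                items.append(child)
            elif ch == ")":
                if depth == 0:
                    raise ValueError("unbalanced ')'")
                return items, pos + 1
            else:
                raise ValueError(f"unexpected character {ch!r}")
        if depth != 0:
            raise ValueError("unbalanced '('")
        return items, pos

    result, _ = parse(0, 0)
    return result


print(parse_groups("(()())"))  # [[[], []]]

try:
    parse_groups("(" * 5000 + ")" * 5000, max_depth=50)
except DepthLimitExceeded as exc:
    print("rejected:", exc)  # rejected: nesting deeper than 50
```

Without the depth check, the second call would recurse 5000 levels and fail with an uncontrolled `RecursionError` instead of a clean, catchable rejection.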
@@ -0,0 +1,3 @@
{
"77740": "[Package: protobuf] Affected versions of this package are vulnerable to a potential Denial of Service (DoS) attack due to unbounded recursion when parsing untrusted Protocol Buffers data. The pure-Python implementation fails to enforce recursion depth limits when processing recursive groups, recursive messages, or a series of SGROUP tags, leading to stack overflow conditions that can crash the application by exceeding Python's recursion limit."
}
@@ -0,0 +1,3 @@
{
"77740": "[Package: protobuf] Affected versions of this package are vulnerable to a potential Denial of Service (DoS) attack due to unbounded recursion when parsing untrusted Protocol Buffers data. The pure-Python implementation fails to enforce recursion depth limits when processing recursive groups, recursive messages, or a series of SGROUP tags, leading to stack overflow conditions that can crash the application by exceeding Python's recursion limit."
}
24 changes: 19 additions & 5 deletions tensorflow/training/docker/2.18/py3/cu125/Dockerfile.gpu
@@ -246,6 +246,20 @@ RUN mkdir /tmp/efa-ofi-nccl \
&& make install \
&& rm -rf /tmp/efa-ofi-nccl

+# patch nvjpeg
+RUN mkdir -p /tmp/nvjpeg \
+&& cd /tmp/nvjpeg \
+&& wget https://developer.download.nvidia.com/compute/cuda/redist/libnvjpeg/linux-x86_64/libnvjpeg-linux-x86_64-12.4.0.76-archive.tar.xz \
+&& tar -xvf libnvjpeg-linux-x86_64-12.4.0.76-archive.tar.xz \
+&& rm -rf /usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg* \
+&& rm -rf /usr/local/cuda/targets/x86_64-linux/include/nvjpeg.h \
+&& cp libnvjpeg-linux-x86_64-12.4.0.76-archive/lib/libnvjpeg* /usr/local/cuda/targets/x86_64-linux/lib/ \
+&& cp libnvjpeg-linux-x86_64-12.4.0.76-archive/include/* /usr/local/cuda/targets/x86_64-linux/include/ \
+&& rm -rf /tmp/nvjpeg \
+# patch cuobjdump and nvdisasm
+&& rm -rf /usr/local/cuda/bin/cuobjdump* \
+&& rm -rf /usr/local/cuda/bin/nvdisasm*
+
# Allow OpenSSH to talk to containers without asking for confirmation
RUN cat /etc/ssh/ssh_config | grep -v StrictHostKeyChecking > /etc/ssh/ssh_config.new \
&& echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config.new \
@@ -363,14 +377,14 @@ RUN $PYTHON -m pip install --no-cache-dir -U \
shap

RUN $PYTHON -m pip install --no-cache-dir -U \
-"sagemaker<3" \
-sagemaker-experiments==0.* \
-sagemaker-tensorflow-training \
-sagemaker-training \
+"sagemaker>=2.172.0,<3" \
+"sagemaker-experiments==0.1.7" \
+"sagemaker-tensorflow-training==20.5.0" \
+"sagemaker-training>=4.3.0,<=4.7.4" \
"sagemaker-studio-analytics-extension<1" \
"sparkmagic<1" \
"sagemaker-studio-sparkmagic-lib<1" \
-smclarify
+"smclarify"

# install boost
# tensorflow is compiled with --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"
@@ -0,0 +1,3 @@
{
"77740": "[Package: protobuf] Affected versions of this package are vulnerable to a potential Denial of Service (DoS) attack due to unbounded recursion when parsing untrusted Protocol Buffers data. The pure-Python implementation fails to enforce recursion depth limits when processing recursive groups, recursive messages, or a series of SGROUP tags, leading to stack overflow conditions that can crash the application by exceeding Python's recursion limit."
}