Commits (87)
550cc8c
added bert as a test inference benchmark
McLavish Oct 30, 2025
f9c3817
hotfix to enable gpu capabilities
McLavish Nov 3, 2025
0f93b66
added pre-commit hooks for linting and formatting
McLavish Nov 4, 2025
b965d7b
linting and formatting setting for whoever uses vscode + black + flak…
McLavish Nov 4, 2025
e9916db
reformatted local file so it passes linting/format
McLavish Nov 4, 2025
2b75311
Merge branch 'development' into feature/bert-inference
McLavish Nov 4, 2025
813af03
bert now uses gpu
McLavish Nov 4, 2025
3a96f04
changed data repo to be OUR forked data repo
McLavish Nov 4, 2025
d4d5d30
change data loading path to own forked repo
Russellpang Nov 5, 2025
1b7deb7
change data loading path to own forked repo
Russellpang Nov 5, 2025
668652c
update benchmark function
Russellpang Nov 5, 2025
aae1023
fix: replaced onnxruntime requirement from CPU to GPU. now it actuall…
McLavish Nov 5, 2025
25fd1d9
circleci mypy fix?
McLavish Nov 5, 2025
c478c91
Merge pull request #2 from McLavish/feature/bert-inference
McLavish Nov 5, 2025
d6c4227
benchmarks is now flake8/black compliant. pre-commit hooks also check…
McLavish Nov 5, 2025
27b14d6
add linalg benchmarks
Russellpang Nov 5, 2025
ace2335
add linalg benchmarks
Russellpang Nov 5, 2025
ad3023d
changed CI/CD to run linting on the benchmarks folder ONLY. disabled …
McLavish Nov 5, 2025
52f30c0
fix typo
Russellpang Nov 6, 2025
4efff4d
update code
Russellpang Nov 6, 2025
f8577e7
Create .gitkeep
JessieeeNotLi Nov 6, 2025
dfaa14a
watermarking GPU benchmark files
JessieeeNotLi Nov 6, 2025
7f4f6c9
added run benchmark script
JessieeeNotLi Nov 6, 2025
1653b7c
Add usage instructions to read.me
JessieeeNotLi Nov 6, 2025
f5e7ab7
watermarking_readme.md
JessieeeNotLi Nov 6, 2025
aa3483f
Update NVENC benchmark instructions in README
JessieeeNotLi Nov 6, 2025
adf54a5
migrated from CircleCI to Github Actions
McLavish Nov 6, 2025
67772e2
fixed workflow directory
McLavish Nov 6, 2025
8f02b66
pip dependencies take too long
McLavish Nov 6, 2025
ae61e4b
Merge pull request #8 from McLavish/hotfix/code-quality-on-benchmarks
McLavish Nov 6, 2025
e06985c
new benchmark data
McLavish Nov 10, 2025
037f6c3
Bring folder from other-branch
Nov 12, 2025
377d949
update code
Nov 12, 2025
8dd8a6e
modify code and requirements
Nov 12, 2025
fa7e76e
Create .gitkeep
JessieeeNotLi Nov 12, 2025
51713f4
Add Dockerfile for NVENC-enabled FFmpeg image
JessieeeNotLi Nov 12, 2025
e0cfbdc
Add run script for video watermarking benchmark
JessieeeNotLi Nov 12, 2025
de15075
unfinished new fuc
Russellpang Nov 12, 2025
f534a53
Add files via upload
JessieeeNotLi Nov 12, 2025
3006879
add new functions
Russellpang Nov 12, 2025
7057465
Update run.sh
JessieeeNotLi Nov 12, 2025
44c8bcb
Update run.sh
JessieeeNotLi Nov 12, 2025
e53cfde
added gpu benchmark, and test results on CPU:wq
Nov 12, 2025
d224ddc
add new functions
Russellpang Nov 13, 2025
921f321
added recommender benchmark
McLavish Nov 13, 2025
dd840d1
Merge branch 'development' into feature/russell
YuxuanLiu-kayla Nov 13, 2025
4fca4aa
changed data submodule to use ssh and not https
McLavish Nov 13, 2025
26dfcf4
add channel_flow, compute, fft, and resnet of jax_npbench
down-street Nov 15, 2025
fad77da
reset the config
down-street Nov 15, 2025
e995e6a
Merge pull request #13 from McLavish/jiahao/npbenchs
down-street Nov 15, 2025
7e0d13f
microbenchmark example
Nov 16, 2025
942f5a1
Remove SSH public key from eval command
Russellpang Nov 16, 2025
6bc1dd7
Remove local_deployment.json configuration
Russellpang Nov 16, 2025
460ea1f
Delete out_storage.json configuration file
Russellpang Nov 16, 2025
de41ab6
Remove SSH private key from eval command
Russellpang Nov 16, 2025
ded520f
remove garbage
Nov 16, 2025
5c85980
test
Russellpang Nov 17, 2025
c5782dd
test
Russellpang Nov 17, 2025
2b52ced
test
Russellpang Nov 17, 2025
e5cb20c
Merge branch 'development' into feature/russell
Russellpang Nov 17, 2025
6488d6d
remove unnecessay files
Russellpang Nov 17, 2025
55c4ac4
fuck you
Russellpang Nov 17, 2025
b97b7a5
Refactor argument parsing for cleaner syntax
Russellpang Nov 17, 2025
1998b6b
Change 'reps' to 'iters' in jacobi2d function
Russellpang Nov 17, 2025
2cbd768
Delete benchmarks/000.microbenchmarks/050.matmul directory
Russellpang Nov 17, 2025
074d4b7
Merge pull request #6 from McLavish/feature/russell
McLavish Nov 17, 2025
efced9c
Revert "changed data submodule to use ssh and not https"
McLavish Nov 17, 2025
bc48b5e
fix: missing config.json
McLavish Nov 17, 2025
e154ba0
Merge branch 'development' into feature/inference-recommender
McLavish Nov 17, 2025
d9ed506
Merge pull request #11 from McLavish/feature/inference-recommender
McLavish Nov 17, 2025
28a3790
add vector-add & host-device-copy microbenchmark written in pytorch
down-street Nov 18, 2025
ec0db71
Update matrix multiplication microbenchmark (sizes & reps)
YuxuanLiu-kayla Nov 20, 2025
16c943a
peak performance microbenchmarks
Nov 22, 2025
bff660d
reformat
Nov 22, 2025
703a05d
adapted watermarking benchmark to sebs structure
JessieeeNotLi Nov 23, 2025
f5093a8
Merge branch 'development' into jessie/video_watermarking
JessieeeNotLi Nov 23, 2025
0bf9990
Merge pull request #26 from McLavish/feature/matmul_microbenchmark/yu…
YuxuanLiu-kayla Nov 25, 2025
6b00cdf
fix bugs that python 3.8 conflict with numpy version
down-street Nov 25, 2025
f6aa71b
Merge branch 'development' into feature/microbenchmark/jiahao
down-street Nov 25, 2025
327eb44
Merge pull request #20 from McLavish/feature/microbenchmark/jiahao
down-street Nov 25, 2025
284c6c8
Merge remote-tracking branch 'origin/development' into jessie/video_w…
JessieeeNotLi Dec 15, 2025
5766b50
Remove unused subprocess import
JessieeeNotLi Dec 15, 2025
67b988b
Update black to support Python 3.12
JessieeeNotLi Dec 15, 2025
83c5309
Update black in requirements.txt for Python 3.12
JessieeeNotLi Dec 15, 2025
70a7b36
Apply black formatting
JessieeeNotLi Dec 15, 2025
d808208
Apply black formatting across benchmarks and sebs
JessieeeNotLi Dec 15, 2025
791c881
Format sebs and benchmarks with black
JessieeeNotLi Dec 15, 2025
6 changes: 6 additions & 0 deletions benchmarks/200.multimedia/220.video-processing-gpu/config.json
@@ -0,0 +1,6 @@
{
  "timeout": 60,
  "memory": 512,
  "languages": ["python"],
  "modules": ["storage"]
}
10 changes: 10 additions & 0 deletions benchmarks/200.multimedia/220.video-processing-gpu/init.sh
@@ -0,0 +1,10 @@
#!/bin/bash

# Arguments required by SeBS, even if unused
DIR=$1
VERBOSE=$2
TARGET_ARCHITECTURE=$3

# This benchmark does not need any special init step.
# All dependencies (torch, opencv-python, etc.) are installed via requirements.txt.
exit 0
28 changes: 28 additions & 0 deletions benchmarks/200.multimedia/220.video-processing-gpu/input.py
@@ -0,0 +1,28 @@
import glob, os

def buckets_count():
    return (1, 1)

'''
Generate test, small and large workload for thumbnailer.

:param data_dir: directory where benchmark data is placed
:param size: workload size
:param input_buckets: input storage containers for this benchmark
:param output_buckets:
:param upload_func: upload function taking three params(bucket_idx, key, filepath)
'''
def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func):
    for file in glob.glob(os.path.join(data_dir, '*.mp4')):
        img = os.path.relpath(file, data_dir)
        upload_func(0, img, file)
    #TODO: multiple datasets
    input_config = {'object': {}, 'bucket': {}}
    input_config['object']['key'] = img
    # The function supports only "gpu-filter" at the moment.
    input_config['object']['op'] = 'gpu-filter'
    input_config['object']['duration'] = 1
    input_config['bucket']['bucket'] = benchmarks_bucket
    input_config['bucket']['input'] = input_paths[0]
    input_config['bucket']['output'] = output_paths[0]
    return input_config

⚠️ Potential issue | 🔴 Critical

Critical: Add error handling for missing files and improve variable naming.

This function has several issues:

  1. NameError if no .mp4 files exist: If the glob finds no files, the loop never executes and img is never defined, causing a NameError on line 21.
  2. Misleading variable name: img suggests an image file, but this processes video files.
  3. Missing parameter: The num_iters field present in the similar function at benchmarks/200.multimedia/220.video-processing/input.py (line 37) is missing here.

Apply this diff to fix these issues (matching the implementation in 220.video-processing/input.py):

 def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func):
+    last_key = None
+
+    # Upload all .mp4 files from data_dir to bucket 0
     for file in glob.glob(os.path.join(data_dir, '*.mp4')):
-        img = os.path.relpath(file, data_dir)
-        upload_func(0, img, file)
+        key = os.path.relpath(file, data_dir)
+        upload_func(0, key, file)
+        last_key = key
+
+    if last_key is None:
+        raise RuntimeError(f"No .mp4 files found in {data_dir}")
+
     #TODO: multiple datasets
     input_config = {'object': {}, 'bucket': {}}
-    input_config['object']['key'] = img
+    # Use the last uploaded file as the input object
+    input_config['object']['key'] = last_key
     # The function supports only "gpu-filter" at the moment.
-    input_config['object']['op'] = 'gpu-filter'
-    input_config['object']['duration'] = 1
+    input_config['object']['op'] = 'gpu-filter'     # must match your handler's operations dict
+    input_config['object']['duration'] = 1          # seconds of video to process
+    input_config['object']['num_iters'] = 10        # extra param for GPU workload intensity
+
     input_config['bucket']['bucket'] = benchmarks_bucket
🧰 Tools
🪛 Ruff (0.14.5)

15-15: Unused function argument: size

(ARG001)


15-15: Unused function argument: nosql_func

(ARG001)

🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing-gpu/input.py around lines 15
to 28, handle the case where no .mp4 files are found (avoid NameError) by
checking glob results and raising/logging an error or returning None, rename the
variable from img to a clearer video_relpath or video_key, ensure you call
upload_func for the selected video(s) with the correct args, and add the missing
'num_iters' field (set to 1 like the non-GPU implementation) into input_config;
mirror the control flow and field names used in
benchmarks/200.multimedia/220.video-processing/input.py so input_config always
has object.key, object.op ('gpu-filter'), object.duration, object.num_iters, and
bucket.* set, and fail fast if no video files are available.
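
Putting the suggested fixes together, the corrected function might look like the following sketch (the num_iters default of 10 mirrors 220.video-processing/input.py; the upload_func(bucket_idx, key, filepath) signature is taken from the docstring):

```python
# Sketch only: consolidates the review suggestions above into one function.
import glob
import os


def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func):
    last_key = None

    # Upload every .mp4 file under data_dir to input bucket 0.
    for file in glob.glob(os.path.join(data_dir, "*.mp4")):
        key = os.path.relpath(file, data_dir)
        upload_func(0, key, file)
        last_key = key

    # Fail fast instead of hitting a NameError further down.
    if last_key is None:
        raise RuntimeError(f"No .mp4 files found in {data_dir}")

    return {
        "object": {
            "key": last_key,     # last uploaded file becomes the input object
            "op": "gpu-filter",  # must match the handler's operations dict
            "duration": 1,       # seconds of video to process
            "num_iters": 10,     # GPU workload intensity
        },
        "bucket": {
            "bucket": benchmarks_bucket,
            "input": input_paths[0],
            "output": output_paths[0],
        },
    }
```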

197 changes: 197 additions & 0 deletions benchmarks/200.multimedia/220.video-processing-gpu/python/function.py
@@ -0,0 +1,197 @@
#!/usr/bin/env python

import datetime
import os
import subprocess  # not strictly needed anymore, but you can remove it
import stat  # can also be removed if you drop ffmpeg entirely
from typing import Dict, Any

import numpy as np
import cv2
import torch
import torch.nn as nn

from . import storage

client = storage.storage.get_instance()
SCRIPT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__)))


def gpu_video_filter(video_path: str, duration: float, event: Dict[str, Any]) -> str:
    """
    Decode a video on CPU (OpenCV), run a heavy GPU filter with PyTorch,
    and re-encode the processed video.

    This gives you a realistic FaaS workload:
      - I/O via storage
      - CPU video decode/encode
      - GPU-heavy tensor processing
    """

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise RuntimeError(f"Could not open input video: {video_path}")

    fps = cap.get(cv2.CAP_PROP_FPS)
    if not fps or fps <= 0:
        fps = 25.0  # fallback

    max_frames = int(fps * duration)
    frames = []
Comment on lines +34 to +39

⚠️ Potential issue | 🟡 Minor

Guard against NaN/invalid FPS values before computing max_frames

If cap.get(cv2.CAP_PROP_FPS) returns NaN (which some containers/drivers do), the condition if not fps or fps <= 0 will not trigger, and int(fps * duration) will raise at runtime.

You can defensively handle this:

+import math
 ...
-    fps = cap.get(cv2.CAP_PROP_FPS)
-    if not fps or fps <= 0:
+    fps = cap.get(cv2.CAP_PROP_FPS)
+    if not fps or fps <= 0 or math.isnan(fps):
         fps = 25.0  # fallback

This keeps the simple fallback while avoiding a crash on malformed metadata.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    fps = cap.get(cv2.CAP_PROP_FPS)
-    if not fps or fps <= 0:
-        fps = 25.0  # fallback
-    max_frames = int(fps * duration)
-    frames = []
+    fps = cap.get(cv2.CAP_PROP_FPS)
+    if not fps or fps <= 0 or math.isnan(fps):
+        fps = 25.0  # fallback
+    max_frames = int(fps * duration)
+    frames = []
🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing-gpu/python/function.py around
lines 35 to 40, fps obtained from cap.get(cv2.CAP_PROP_FPS) can be NaN or
non-finite so the current check (if not fps or fps <= 0) can miss it and cause
int(fps * duration) to crash; defend by validating fps with math.isfinite (or
math.isnan) and the existing <= 0 check, e.g. import math and set fps = fallback
when not math.isfinite(fps) or fps <= 0, then compute max_frames from the
validated fps.
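
As a standalone illustration of the guard, a small helper along these lines (the name read_fps is hypothetical) keeps the validation in one place:

```python
import math

import cv2


def read_fps(cap: cv2.VideoCapture, fallback: float = 25.0) -> float:
    """Return a finite, positive FPS, falling back on malformed metadata."""
    fps = cap.get(cv2.CAP_PROP_FPS)
    # CAP_PROP_FPS can yield 0, a negative value, or NaN depending on the
    # container and backend; math.isfinite rejects NaN and +/-inf in one check.
    if not fps or not math.isfinite(fps) or fps <= 0:
        return fallback
    return fps
```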


    for i in range(max_frames):
        ret, frame_bgr = cap.read()
        if not ret:
            break
        # Convert BGR (OpenCV default) to RGB
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        frames.append(frame_rgb)

    cap.release()

    if not frames:
        raise RuntimeError("No frames decoded from video (empty or too short?)")

    # Stack into (T, H, W, C)
    video_np = np.stack(frames, axis=0)  # uint8, 0–255
    T, H, W, C = video_np.shape

    # Convert to torch tensor: (T, C, H, W), float32 in [0, 1]
    video = torch.from_numpy(video_np).permute(0, 3, 1, 2).float() / 255.0

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    video = video.to(device)

    # Simple heavy-ish GPU workload: repeated 3x3 conv + ReLU
    # You can tweak num_channels, num_iters, etc. via the event
    num_iters = event.get("object", {}).get("num_iters", 10)
    num_channels = 3  # keep 3 so we can write back as RGB

    conv = nn.Conv2d(
        in_channels=num_channels,
        out_channels=num_channels,
        kernel_size=3,
        padding=1,
        bias=False,
    ).to(device)

    with torch.no_grad():
        for _ in range(num_iters):
            video = torch.relu(conv(video))

    # Back to uint8 on CPU: (T, H, W, C)
    video = (video.clamp(0.0, 1.0) * 255.0).byte()
    video_np_out = video.permute(0, 2, 3, 1).cpu().numpy()

    # Encode processed video with OpenCV (CPU)
    base = os.path.splitext(os.path.basename(video_path))[0]
    out_path = f"/tmp/processed-{base}.mp4"

    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
Comment on lines +85 to +89

⚠️ Potential issue | 🟠 Major

Harden /tmp usage and sanitize user-controlled key

Two related issues here:

  1. download_path = f"/tmp/{key}":
    • If key contains path separators or .., this can escape /tmp or attempt to create nested directories that don’t exist, leading to failures or even overwriting unintended files.
  2. Static analysis (S108) is also concerned about the predictable paths /tmp/processed-{base}.mp4 and /tmp/{key}.

A minimal hardening that preserves current naming:

-    base = os.path.splitext(os.path.basename(video_path))[0]
-    out_path = f"/tmp/processed-{base}.mp4"
+    base = os.path.splitext(os.path.basename(video_path))[0]
+    out_path = os.path.join("/tmp", f"processed-{base}.mp4")
 ...
-    download_path = f"/tmp/{key}"
+    # Restrict local path to a basename to avoid directory traversal and
+    # missing parent directories under /tmp.
+    local_name = os.path.basename(key)
+    download_path = os.path.join("/tmp", local_name)

If you want to fully satisfy stricter security tooling, consider using tempfile.mkstemp / NamedTemporaryFile for one or both of these paths, but the basename sanitization above already removes the traversal risk while staying simple.

Also applies to: 150-177

🧰 Tools
🪛 Ruff (0.14.5)

88-88: Probable insecure usage of temporary file or directory: "/tmp/processed-"

(S108)

🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing-gpu/python/function.py around
lines 86 to 90 (and similarly lines 150-177), the code constructs predictable
/tmp paths using user-controlled values; sanitize the inputs and avoid path
traversal by taking only the basename of user-controlled key (e.g., key_base =
os.path.basename(key)) and validating/stripping any remaining suspicious
segments like '..' or os.sep, then build paths with os.path.join("/tmp",
key_base); for the processed output prefer creating a temporary file via
tempfile.NamedTemporaryFile or tempfile.mkstemp (or at minimum sanitize base =
os.path.basename(os.path.splitext(os.path.basename(video_path))[0]) and join
with /tmp) to eliminate predictable filenames and ensure the directory is
correct and exists before writing.

    writer = cv2.VideoWriter(out_path, fourcc, fps, (W, H))
    if not writer.isOpened():
        raise RuntimeError(f"Could not open VideoWriter for: {out_path}")

    for frame_rgb in video_np_out:
        frame_bgr = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2BGR)
        writer.write(frame_bgr)

    writer.release()
    return out_path
Comment on lines +85 to +99

⚠️ Potential issue | 🟠 Major

Insecure /tmp path construction remains unresolved.

As flagged in the previous review, constructing predictable /tmp paths exposes the code to security issues. The current implementation doesn't address the S108 static analysis warning or the security concerns raised earlier.

At minimum, use os.path.join for path construction:

     base = os.path.splitext(os.path.basename(video_path))[0]
-    out_path = f"/tmp/processed-{base}.mp4"
+    out_path = os.path.join("/tmp", f"processed-{base}.mp4")

For better security, consider using tempfile.NamedTemporaryFile:

import tempfile
...
with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False, dir="/tmp") as tmp:
    out_path = tmp.name
🧰 Tools
🪛 Ruff (0.14.8)

87-87: Probable insecure usage of temporary file or directory: "/tmp/processed-"

(S108)


92-92: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing-gpu/python/function.py around
lines 85 to 99, the code creates a predictable /tmp filename which is a security
risk; replace the fixed path with a securely created temporary file (e.g., use
tempfile.NamedTemporaryFile or tempfile.mkstemp) to obtain a unique,
non-guessable out_path with the .mp4 suffix, pass dir="/tmp" only if required,
set delete=False so OpenCV can open/write to it, close the temp handle before
returning, and remove the manual string concatenation (os.path.join is
acceptable for any explicit directory joins).
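
A minimal sketch combining both mitigations, assuming the handler keeps writing under /tmp (helper names are hypothetical):

```python
import os
import tempfile


def safe_download_path(key: str) -> str:
    # Keep only the basename so a key like "../../etc/passwd" or "a/b.mp4"
    # can neither escape /tmp nor require parent directories that don't exist.
    return os.path.join("/tmp", os.path.basename(key))


def unique_output_path(suffix: str = ".mp4") -> str:
    # mkstemp creates the file with a random, non-guessable name and 0600 mode;
    # close the descriptor and hand the path to cv2.VideoWriter afterwards.
    fd, path = tempfile.mkstemp(suffix=suffix, dir="/tmp")
    os.close(fd)
    return path
```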



# You can still support multiple ops if you want in the future.
# For now, we map "gpu-filter" (or "transcode" if you want to reuse the old name)
operations = {
    "gpu-filter": gpu_video_filter,
    # If you want to keep old names:
    # "transcode": gpu_video_filter,
    # "watermark": gpu_video_filter,
    # "extract-gif": gpu_video_filter,
}


def handler(event: Dict[str, Any]):
    """
    FaaS entrypoint.

    Expected event structure (SeBS-style):

    {
        "bucket": {
            "bucket": "<bucket-name>",
            "input": "<input-prefix>",
            "output": "<output-prefix>"
        },
        "object": {
            "key": "<object-key>",
            "duration": <seconds>,
            "op": "gpu-filter",
            // optional:
            // "num_iters": 20
        }
    }
    """

    bucket = event.get("bucket", {}).get("bucket")
    input_prefix = event.get("bucket", {}).get("input")
    output_prefix = event.get("bucket", {}).get("output")

    obj = event.get("object", {})
    key = obj.get("key")
    duration = obj.get("duration", 5)  # default: 5 seconds
    op = obj.get("op", "gpu-filter")

    if op not in operations:
        raise ValueError(
            f"Unknown operation '{op}'. Supported: {', '.join(operations.keys())}"
        )

    download_path = f"/tmp/{key}"

    # If you no longer ship ffmpeg/ffmpeg, you can remove this chmod block completely.
    # Leaving it here is harmless if the file doesn't exist (the chmod will just
    # fail and be ignored).
    ffmpeg_binary = os.path.join(SCRIPT_DIR, "ffmpeg", "ffmpeg")
    try:
        st = os.stat(ffmpeg_binary)
        os.chmod(ffmpeg_binary, st.st_mode | stat.S_IEXEC)
    except OSError:
        # Ignore if ffmpeg is not present or filesystem is read-only.
        pass

    # --- Download phase ---
    download_begin = datetime.datetime.now()
    client.download(bucket, os.path.join(input_prefix, key), download_path)
    download_size = os.path.getsize(download_path)
    download_stop = datetime.datetime.now()

    # --- Compute phase (GPU via PyTorch) ---
    process_begin = datetime.datetime.now()
    upload_path = operations[op](download_path, duration, event)
    process_end = datetime.datetime.now()

    # --- Upload phase ---
    upload_begin = datetime.datetime.now()
    filename = os.path.basename(upload_path)
    upload_size = os.path.getsize(upload_path)
    upload_key = client.upload(bucket, os.path.join(output_prefix, filename), upload_path)
    upload_stop = datetime.datetime.now()

    # Convert timedeltas to microseconds
    download_time = (download_stop - download_begin) / datetime.timedelta(microseconds=1)
    upload_time = (upload_stop - upload_begin) / datetime.timedelta(microseconds=1)
    process_time = (process_end - process_begin) / datetime.timedelta(microseconds=1)

    return {
        "result": {
            "bucket": bucket,
            "key": upload_key,
        },
        "measurement": {
            "download_time": download_time,
            "download_size": download_size,
            "upload_time": upload_time,
            "upload_size": upload_size,
            "compute_time": process_time,
        },
    }
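
For reference, a successful invocation of this handler returns a payload shaped like the following (all values illustrative; times are in microseconds):

```json
{
  "result": {
    "bucket": "sebs-benchmarks",
    "key": "output/processed-sample.mp4"
  },
  "measurement": {
    "download_time": 182304,
    "download_size": 1048576,
    "upload_time": 97211,
    "upload_size": 734003,
    "compute_time": 2431877
  }
}
```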
3 changes: 3 additions & 0 deletions benchmarks/200.multimedia/220.video-processing-gpu/python/requirements.txt
@@ -0,0 +1,3 @@
torch
opencv-python-headless
numpy
Comment on lines +1 to +3

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Pin dependency versions for reproducibility and security.

The requirements file lacks version constraints, which poses reproducibility and security risks. Unpinned versions can produce non-deterministic benchmark results and may introduce breaking changes across CI/CD runs.

Additionally, torch is specified without explicit GPU support configuration—confirm whether CUDA/cuDNN dependencies are documented elsewhere in the benchmark setup, or consider pinning a CUDA-specific torch variant if GPU acceleration is required.

Consider applying version pins like this:

-torch
-opencv-python-headless
-numpy
+torch==2.0.1
+opencv-python-headless==4.8.1.78
+numpy==1.24.3

Once you've determined the appropriate versions, please verify:

  1. The versions are compatible with your target GPU/CUDA environment.
  2. Any external CUDA/cuDNN dependencies are documented in a setup guide for the benchmark.
  3. Run a test build to confirm all dependencies resolve without conflicts.





---

🏁 Script executed:

```shell
#!/bin/bash
# Search for all requirements files to understand version pinning patterns in codebase
echo "=== Finding all requirements.txt files ==="
find . -name "requirements*.txt" -type f | head -20

echo -e "\n=== Checking versions in other requirements files ==="
find . -name "requirements*.txt" -type f -exec sh -c 'echo "File: $1"; head -10 "$1"' _ {} \; | head -60
```

Length of output: 2594


🏁 Script executed:

```shell
#!/bin/bash
# Search for GPU, CUDA, cuDNN references in benchmark directory and related docs
echo "=== Searching for GPU/CUDA references in benchmark context ==="
rg -i "cuda|gpu|cudnn" benchmarks/200.multimedia/220.video-processing-gpu/ --type-list | head -5
rg -i "cuda|gpu|cudnn" benchmarks/200.multimedia/220.video-processing-gpu/ -A2 -B2 2>/dev/null | head -40

echo -e "\n=== Looking for setup documentation ==="
find benchmarks/200.multimedia/220.video-processing-gpu/ -type f \( -name "*.md" -o -name "*.txt" -o -name "setup*" -o -name "README*" \)
```

Length of output: 4364


🏁 Script executed:

```shell
#!/bin/bash
# Check structure of the benchmark directory and look for related docs
echo "=== Contents of video-processing-gpu benchmark directory ==="
ls -la benchmarks/200.multimedia/220.video-processing-gpu/ 2>/dev/null || echo "Directory check"
find benchmarks/200.multimedia/220.video-processing-gpu/ -type f | head -20

echo -e "\n=== Parent directory context ==="
ls -la benchmarks/200.multimedia/
```

Length of output: 1214


🏁 Script executed:

```shell
#!/bin/bash
# Search for any documentation about dependencies or torch in the benchmark context
echo "=== Searching for torch, dependencies, or installation docs ==="
rg -i "torch|dependencies|install|setup" benchmarks/200.multimedia/ --type md --type txt -A2 | head -50

echo -e "\n=== Checking if there's a main setup guide ==="
find benchmarks/ -maxdepth 2 -name "README*" -o -name "setup*" -o -name "INSTALL*" | head -10
cat benchmarks/README.md 2>/dev/null | head -30
```

Length of output: 371


🌐 Web query:

torch 2.0.1 opencv-python-headless 4.8.1 numpy 1.24 compatibility

💡 Result:

Short answer: Yes — torch 2.0.1 works with NumPy 1.24, and opencv-python-headless 4.8.1 also supports NumPy 1.24. The main incompatibility to avoid is NumPy 2.x (PyTorch wheels built against NumPy 1.x can fail under NumPy 2.x). Test by importing torch and cv2 after installing.

Recommended pip pins:

  • pip install "numpy==1.24.*"
  • pip install "torch==2.0.1"
  • pip install "opencv-python-headless==4.8.1"

References:

  • PyTorch issues describing failures when NumPy 2.x is used with PyTorch built against NumPy 1.x. [1][2]
  • opencv-python-headless PyPI (wheels follow NumPy/Python compatibility). [3]

Sources:
[1] PyTorch issue: Numpy compatibility / DataLoader RuntimeError.
[2] PyTorch issue: modules compiled with NumPy 1.x incompatible with NumPy 2.x.
[3] opencv-python-headless PyPI project page.


🌐 Web query:

torch GPU setup requirements CUDA environment variables

💡 Result:

Summary — what you need for PyTorch GPU + relevant CUDA environment variables

  1. Requirements (high level)
  • Install a PyTorch binary built for the CUDA/ROCm version you intend to use, or build PyTorch from source against your local CUDA. The pip/conda wheel’s CUDA version must match the binary you select (or you must build). Verify with torch.cuda.is_available() after install. [1][3]

  • NVIDIA GPU + matching NVIDIA driver installed on host (driver must support the CUDA runtime version used by the PyTorch binary). The driver + GPU must be functional (check nvidia-smi). [3]

  • If building extensions or PyTorch from source you need a local CUDA toolkit (nvcc), cuDNN, NVML, etc., and to point environment variables to the toolkit and libs. [2][4]

  2. Important environment variables you’ll commonly set/check
  • CUDA_HOME (or CUDA_PATH on Windows): root of CUDA installation (e.g. /usr/local/cuda or C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1). Used by build scripts and some tools. [2][4]
  • CUDACXX: path to nvcc (e.g. ${CUDA_HOME}/bin/nvcc) when building from source. [4]
  • PATH: include ${CUDA_HOME}/bin so nvcc and tools are found. [2][4]
  • LD_LIBRARY_PATH (Linux) / PATH (Windows) / DYLD_LIBRARY_PATH (macOS older): include CUDA lib directories (e.g. ${CUDA_HOME}/lib64) so shared libs (libcudart, cuDNN) are found at runtime. [2][4]
  • CUDA_VISIBLE_DEVICES: control which GPUs are visible to CUDA (comma-separated indices, or -1 to hide all). Useful for multi-GPU hosts. [2]
  • CUDA_LAUNCH_BLOCKING=1: make CUDA calls synchronous (debugging). [2]
  • PYTORCH_CUDA_ALLOC_CONF: control PyTorch CUDA allocator behavior (memory fragmentation, max split size, etc.). [2]
  • PYTORCH_NO_CUDA_MEMORY_CACHING: disable PyTorch caching allocator (useful for debugging). [2]
  • TORCH_CUDNN_V8_API_DISABLED / TORCH_CUDNN_V8_API_LRU_CACHE_LIMIT etc.: cuDNN-specific controls. [2]
  • NVML-related (NVML_LIB_PATH) when building packages that query NVML. [4]
  3. Typical quick checks & steps (Linux)
  • Ensure driver/GPU visible: nvidia-smi
  • Ensure CUDA bin/libs on PATH/LD_LIBRARY_PATH:
    export CUDA_HOME=/usr/local/cuda
    export CUDACXX="${CUDA_HOME}/bin/nvcc"
    export PATH="${CUDA_HOME}/bin:${PATH}"
    export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}"
    source ~/.bashrc
  • Install PyTorch GPU wheel matching desired CUDA version (or conda package with cudatoolkit). Then in Python:
    import torch; print(torch.cuda.is_available(), torch.version.cuda, torch.cuda.device_count())
  4. Notes / gotchas
  • Prebuilt PyTorch wheels include CUDA runtimes for the CUDA versions they were built for; setting CUDA_HOME / local CUDA does not make a CPU-only wheel suddenly use your system CUDA — you must install a GPU-enabled PyTorch build or build from source. [3]
  • Mismatch between PyTorch’s bundled CUDA version and your GPU driver can cause torch.cuda.is_available() to be False; updating the NVIDIA driver or installing a matching PyTorch binary usually resolves this. [3]

Sources

  • PyTorch CUDA environment variables and PyTorch-specific vars. [1]
  • PyTorch Get Started / installation and verification notes (matching binaries, building from source). [3]
  • Example CUDA env var usage and build-time vars (CUDA_HOME, CUDACXX, LD_LIBRARY_PATH, NVML paths). [4]

References
[1] PyTorch — CUDA Environment Variables.
[2] PyTorch/XLA GPU environment setup (examples of PATH/LD_LIBRARY_PATH).
[3] PyTorch Get Started / Installation & verification.

(If you want, tell me your OS, PyTorch version and how you installed it and I’ll give the exact install / env var commands and a short checklist to get torch.cuda.is_available() → True.)


Pin dependency versions to prevent NumPy 2.x incompatibility and ensure reproducibility.

Version pinning is missing, which creates reproducibility risks. Most critically, NumPy 2.x is incompatible with PyTorch wheels built against NumPy 1.x, leading to runtime failures if versions drift across CI/CD runs.

Regarding GPU support: the code correctly uses torch.cuda.is_available() to detect and enable GPU at runtime, which is the proper pattern. GPU availability depends on the environment (CUDA toolkit, driver, PyTorch build selection) rather than the requirements.txt file itself. The standard torch package handles both CPU and GPU cases gracefully.

Consider pinning versions with NumPy <2.0 constraint:

-torch
-opencv-python-headless
-numpy
+torch==2.0.1
+opencv-python-headless==4.8.1.78
+numpy<2.0

Verify compatibility with your target environment after applying pins.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-torch
-opencv-python-headless
-numpy
+torch==2.0.1
+opencv-python-headless==4.8.1.78
+numpy<2.0
🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing-gpu/python/requirements.txt
lines 1-3, dependency versions are not pinned which risks pulling NumPy 2.x
(incompatible with many PyTorch wheels) and breaks reproducibility; update the
file to pin specific, tested versions (at minimum constrain NumPy to <2.0), pin
opencv-python-headless to a known working release, and pin torch to the version
you intend to run (or document the exact wheel URL/extra-index for CUDA-enabled
builds); after changing, run CI or a local environment install to verify
compatibility with your target CUDA/toolchain and adjust pins accordingly.
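
If a CUDA-enabled build is needed, the usual pattern is to point pip at PyTorch's CUDA wheel index; the cu118 tag below is an assumption and must match the target driver/toolkit:

```shell
pip install "torch==2.0.1" --index-url https://download.pytorch.org/whl/cu118
pip install "opencv-python-headless==4.8.1.78" "numpy<2.0"
```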

40 changes: 28 additions & 12 deletions benchmarks/200.multimedia/220.video-processing/input.py
@@ -1,27 +1,43 @@
-import glob, os
+import glob
+import os
 
 def buckets_count():
+    # one input bucket, one output bucket
     return (1, 1)
 
 '''
-Generate test, small and large workload for thumbnailer.
+Generate test, small and large workload for the GPU video filter benchmark.
 
 :param data_dir: directory where benchmark data is placed
-:param size: workload size
-:param input_buckets: input storage containers for this benchmark
-:param output_buckets:
-:param upload_func: upload function taking three params(bucket_idx, key, filepath)
+:param size: workload size (e.g., "test", "small", "large")
+:param benchmarks_bucket: name of the benchmark bucket
+:param input_paths: list of input prefixes (one per input bucket)
+:param output_paths: list of output prefixes (one per output bucket)
+:param upload_func: upload function taking three params (bucket_idx, key, filepath)
+:param nosql_func: not used here
 '''
 def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func):
+    last_key = None
+
+    # Upload all .mp4 files from data_dir to bucket 0
     for file in glob.glob(os.path.join(data_dir, '*.mp4')):
-        img = os.path.relpath(file, data_dir)
-        upload_func(0, img, file)
-    #TODO: multiple datasets
+        key = os.path.relpath(file, data_dir)
+        upload_func(0, key, file)
+        last_key = key
+
+    if last_key is None:
+        raise RuntimeError(f"No .mp4 files found in {data_dir}")
+
     input_config = {'object': {}, 'bucket': {}}
-    input_config['object']['key'] = img
-    input_config['object']['op'] = 'watermark'
-    input_config['object']['duration'] = 1
+
+    # Use the last uploaded file as the input object
+    input_config['object']['key'] = last_key
+    input_config['object']['op'] = 'gpu-filter'  # must match your handler's operations dict
+    input_config['object']['duration'] = 1       # seconds of video to process
+    input_config['object']['num_iters'] = 10     # extra param for GPU workload intensity
+
     input_config['bucket']['bucket'] = benchmarks_bucket
     input_config['bucket']['input'] = input_paths[0]
     input_config['bucket']['output'] = output_paths[0]
+
     return input_config
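
With these changes, the input_config handed to the handler looks like this (key and prefixes illustrative):

```json
{
  "object": {
    "key": "sample.mp4",
    "op": "gpu-filter",
    "duration": 1,
    "num_iters": 10
  },
  "bucket": {
    "bucket": "sebs-benchmarks",
    "input": "220.video-processing/input",
    "output": "220.video-processing/output"
  }
}
```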