
Test/gpu nvidia #142

Open
lgldsilva wants to merge 11 commits into tsaridas:main from lgldsilva:test/gpu-nvidia

Conversation

@lgldsilva

This pull request introduces an NVIDIA GPU-accelerated Docker build and updates the deployment configuration to enable hardware-accelerated video transcoding with ffmpeg NVENC support for Stremio. It adds a new Dockerfile.nvidia, updates both Docker Compose files to use the new image and runtime, and patches the startup script so that hardware acceleration is correctly enabled and configured. The goal is to improve video transcoding performance by offloading work to NVIDIA GPUs.

NVIDIA GPU-accelerated Docker build and deployment:

  • Added a new Dockerfile.nvidia that builds ffmpeg with CUDA/NVENC support and sets up a multi-stage Docker image for Stremio with NVIDIA runtime compatibility.
  • Updated compose.yaml and added compose.simple.yaml to use the new NVIDIA-enabled Dockerfile, set up the container to run with the NVIDIA runtime, and configured environment variables and resource constraints for GPU usage.
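A minimal sketch of what such a GPU-enabled Compose service might look like; the service name and exact values are illustrative, not copied from the actual compose.yaml in this PR:

```yaml
# Hypothetical sketch; service name and values are placeholders.
services:
  stremio:
    build:
      context: .
      dockerfile: Dockerfile.nvidia
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

The `video` driver capability is what exposes NVENC/NVDEC inside the container; `compute` alone is not sufficient for hardware transcoding.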

Hardware acceleration and compatibility improvements:

  • Modified stremio-web-service-run.sh to patch server.js at runtime, ensuring NVENC hardware acceleration is always enabled, disabling the unreliable auto-test, and fixing compatibility issues for 10-bit video and CPU scaling on NVIDIA GPUs.
  • Improved handling of the SERVER_URL variable in stremio-web-service-run.sh for more robust URL formatting.
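The "more robust URL formatting" could look something like the following POSIX-sh sketch (the function name and exact rules are assumptions, not taken from the actual stremio-web-service-run.sh): ensure a scheme is present and normalize to exactly one trailing slash.

```shell
#!/bin/sh
# Illustrative SERVER_URL normalization; the real script may differ.
normalize_server_url() {
    url="$1"
    case "$url" in
        http://*|https://*) ;;      # scheme already present, keep it
        *) url="http://$url" ;;     # default to http
    esac
    # strip all trailing slashes, then append exactly one
    while [ "${url%/}" != "$url" ]; do url="${url%/}"; done
    printf '%s/\n' "$url"
}

normalize_server_url "192.168.1.10:11470"
```

Using `case` and `${url%/}` instead of bash-isms keeps this dash-compatible, matching the POSIX-shell fix mentioned in the commits below.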

lgldsilva and others added 11 commits March 4, 2026 00:01
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Dockerfile.nvidia: CUDA 12.2 base with ffmpeg compiled with
  --enable-nvenc --enable-nvdec --enable-cuvid --enable-cuda-nvcc
- compose.yaml: nvidia runtime, GPU device reservation
- Compatible with GTX 1070 (compute 6.1) and newer
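The build stage described above might be sketched roughly as follows; base image tag, ffmpeg branch, and package list are assumptions pieced together from the commit messages, not the actual Dockerfile.nvidia:

```dockerfile
# Hypothetical sketch of the ffmpeg build stage; versions and paths may differ.
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04 AS ffmpeg-build
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential git pkg-config yasm && \
    git clone --depth 1 https://git.videolan.org/git/ffmpeg/nv-codec-headers.git && \
    make -C nv-codec-headers install
RUN git clone --depth 1 --branch n4.4.4 https://git.ffmpeg.org/ffmpeg.git && \
    cd ffmpeg && \
    ./configure --enable-nonfree \
        --enable-nvenc --enable-nvdec --enable-cuvid --enable-cuda-nvcc \
        --nvccflags="-gencode arch=compute_52,code=sm_52 -O2" && \
    make -j"$(nproc)" && make install
```

Targeting compute_52 (Maxwell) keeps the binary compatible with the GTX 1070's compute 6.1 and everything newer, per the nvccflags note in the commits below.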

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Dockerfile.nvidia: add nv-codec-headers, nvccflags for compute_52+,
  nginx user, libwebpmux3 runtime dependency
- compose.yaml: add CPU/memory limits, extra_hosts for LAN resolution
- compose.simple.yaml: fix build config (context + dockerfile)
- stremio-web-service-run.sh: POSIX shell fix (dash compat),
  NVENC watcher to override Stremio auto-test that disables hw accel
- NVIDIA-GPU.md: comprehensive setup/rebuild/troubleshooting docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Stremio hw accel auto-test takes ~130 iterations (~2m12s) to complete.
Previous 90s watcher expired before the test finished, leaving
transcodeHardwareAccel as false. Extended to 360s with diagnostic logging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of watching/racing the auto-test (which always fails due to
0.2s sample + concurrency race), patch server.js before node starts:
set initialDetection=false so the test never runs, and pre-configure
NVENC settings directly in server-settings.json.

This ensures the node process reads transcodeHardwareAccel:true and
transcodeProfile:nvenc-linux at startup, using GPU from first request.
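The patch-before-start approach could be sketched as below. The sed pattern and file paths are illustrative (the real server.js is a minified webpack bundle whose exact expression differs); the settings keys `transcodeHardwareAccel` and `transcodeProfile` are the ones named in this commit.

```shell
#!/bin/sh
# Sketch of the pre-start patch; pattern and paths are assumptions.
patch_server() {
    server_js="$1"; settings="$2"
    # 1. Disable the hardware-accel auto-test before node ever starts.
    sed -i 's/initialDetection:!0/initialDetection:!1/' "$server_js"
    # 2. Pre-configure NVENC so the very first request transcodes on GPU.
    cat > "$settings" <<'EOF'
{
  "transcodeHardwareAccel": true,
  "transcodeProfile": "nvenc-linux"
}
EOF
}

# Demo against a stand-in bundle (the real server.js is a large webpack bundle).
printf 'var a={initialDetection:!0};\n' > /tmp/server.js
patch_server /tmp/server.js /tmp/server-settings.json
```

Patching the file before `node` launches avoids the race entirely: there is no in-memory state to revert, which is why this superseded the watcher approach.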

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
so the broken auto-test can never disable GPU transcoding. Combined with
pre-configured NVENC settings, the node process now uses GPU from start.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GTX 1070 (Pascal) doesn't support 10-bit H.264 NVENC encoding.
The original profile used -hwaccel_output_format cuda + scale_cuda
which kept frames in 10-bit CUDA memory, causing encoder failure.

New approach (hybrid HW decode + CPU scale + HW encode):
- Remove -hwaccel_output_format cuda, -init_hw_device, -filter_hw_device
- Replace scale_cuda with CPU scale (lanczos) - ffmpeg auto-downloads
  CUDA frames from hevc_cuvid decoder to system memory
- Disable wrapSwFilters (hwdownload/hwupload_cuda not needed)
- Keep h264_nvenc encoder (accepts system memory frames)

Verified: GPU at 15% util, 868MiB VRAM, P2 power state during transcode.
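An illustrative invocation of that hybrid pipeline, assembled from the commit description above; input/output names, scale target, and preset are placeholders, not the profile actually generated by server.js:

```shell
#!/bin/sh
# Hybrid pipeline sketch: HW decode (CUVID) -> CPU scale (lanczos) -> HW encode (NVENC).
IN="input.mkv"; OUT="output.mp4"
set -- \
    -c:v hevc_cuvid -i "$IN" \
    -vf "scale=1920:-2:flags=lanczos" \
    -c:v h264_nvenc -preset p4 \
    -c:a copy "$OUT"
# ffmpeg "$@"   # uncomment on a host with an NVIDIA GPU and NVENC-enabled ffmpeg
printf 'ffmpeg %s\n' "$*"
```

Because no `-hwaccel_output_format cuda` is given, ffmpeg copies decoded frames from the hevc_cuvid decoder into system memory, where the CPU `scale` filter converts the 10-bit frames to something h264_nvenc can accept.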

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CPU: 2.0 → 1.5 cores (video encode now on GPU, CPU only for audio/decode)
Memory: 2G → 1.5G (peak 1.35G during transcode)
Reservation: 512M → 256M (idle ~200MB)
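In Compose terms, the tightened limits above would translate to something like this fragment (service name is a placeholder, and 1.5G is written as 1536M):

```yaml
# Sketch of the measured limits; service name is a placeholder.
services:
  stremio:
    deploy:
      resources:
        limits:
          cpus: "1.5"
          memory: 1536M
        reservations:
          memory: 256M
```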

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rewrote NVIDIA-GPU.md (241 → 605 lines) to serve as the definitive reference:

- Added transcoding pipeline diagram (CUVID → CPU scale → NVENC)
- Added detailed 'What did NOT work' section with 5 failed approaches:
  1. File watcher to revert settings (doesn't fix in-memory state)
  2. Patching initialDetection flag (callback path still runs)
  3. scale_cuda=format=nv12 (not supported in ffmpeg 4.4.x)
  4. hwdownload,format=nv12 (format mismatch with p010le)
  5. Pre-setting server-settings.json (auto-test overwrites)
- Added NVENC capability matrix by GPU generation
- Added server.js webpack bundle line references
- Added resource consumption breakdown during transcoding
- Added diagnostic commands reference
- Updated resource limits section with measured values

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>