ShichenLiu · PhDinTimeManagement · May 9, 2026 · May 19, 2026
diff --git a/.idea/.gitignore b/.idea/.gitignore
diff --git a/.idea/SoftRas-mod.iml b/.idea/SoftRas-mod.iml
diff --git a/.idea/inspectionProfiles/profiles_settings.xml b/.idea/inspectionProfiles/profiles_settings.xml
diff --git a/.idea/modules.xml b/.idea/modules.xml
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
diff --git a/LICENSE b/LICENSE
@@ -3,6 +3,7 @@ MIT License
 Copyright (c) 2017 Hiroharu Kato
 Copyright (c) 2018 Nikos Kolotouros
 Copyright (c) 2019 Shichen Liu
+Copyright (c) 2026 Xin Dai
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

diff --git a/MODIFIED_FILES.txt b/MODIFIED_FILES.txt
@@ -0,0 +1,26 @@
+setup.py
+soft_renderer/cuda/create_texture_image_cuda.cpp
+soft_renderer/cuda/create_texture_image_cuda_kernel.cu
+soft_renderer/cuda/load_textures_cuda.cpp
+soft_renderer/cuda/load_textures_cuda_kernel.cu
+soft_renderer/cuda/soft_rasterize_cuda.cpp
+soft_renderer/cuda/soft_rasterize_cuda_kernel.cu
+soft_renderer/cuda/voxelization_cuda.cpp
+soft_renderer/cuda/voxelization_cuda_kernel.cu
+soft_renderer/mesh.py
+soft_renderer/transform.py
+soft_renderer/lighting.py
+soft_renderer/functional/ambient_lighting.py
+soft_renderer/functional/directional_lighting.py
+soft_renderer/functional/load_obj.py
+soft_renderer/functional/look.py
+soft_renderer/functional/look_at.py
+soft_renderer/functional/save_obj.py
+soft_renderer/functional/soft_rasterize.py
+soft_renderer/functional/vertex_normals.py
+soft_renderer/functional/voxelization.py
+examples/demo_deform.py
+examples/recon/models.py
+examples/recon/models_large.py
+examples/recon/test.py
+examples/recon/train.py
diff --git a/README.md b/README.md
@@ -1,121 +1,118 @@
-# Soft Rasterizer (SoftRas)
+# Soft Rasterizer (SoftRas) — Modern CUDA/PyTorch Compatibility Fork
 
-This repository contains the code (in PyTorch) for "[Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning](https://arxiv.org/abs/1904.01786)" (ICCV'2019 Oral) by [Shichen Liu](https://shichenliu.github.io), [Tianye Li](https://sites.google.com/site/tianyefocus/home), [Weikai Chen](http://chenweikai.github.io/) and [Hao Li](https://www.hao-li.com/Hao_Li/Hao_Li_-_about_me.html).
+![SoftRas Fork: Modern CUDA Compatibility](https://img.shields.io/badge/SoftRas%20Fork-Modern%20CUDA%20Compatibility-brown.svg)
 
-## Contents
+This repository is a compatibility-focused fork of the original **Soft Rasterizer (SoftRas)** PyTorch project. It preserves the original SoftRas algorithm, examples, license, and academic attribution while updating the codebase for modern CUDA/PyTorch extension builds and newer NVIDIA GPUs, including RTX 5090-class hardware when the local CUDA/PyTorch toolchain supports Blackwell / compute capability 12.0.
 
-1. [Introduction](#introduction)
-2. [Usage](#usage)
-3. [Applications](#applications)
-4. [Contacts](#contacts)
+> This fork does **not** claim original ownership of SoftRas and does **not** introduce a new differentiable rendering method. The core renderer, examples, and research contribution belong to the original SoftRas authors.
 
-## Introduction
+## Original Project Credit
 
-Soft Rasterizer (SoftRas) is a truly differentiable renderer framework with a novel formulation that views rendering as a **differentiable aggregating process** that fuses **probabilistic contributions** of all mesh triangles with respect to the rendered pixels. Thanks to such *"soft"* formulation, our framework is able to (1) directly render colorized mesh using differentiable functions and (2) back-propagate efficient supervision signals to mesh vertices and their attributes (color, normal, etc.) from various forms of image representations, including silhouette, shading and color images. 
+SoftRas was introduced in:
 
-<img src="https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/teaser/teaser.png" width="60%">
+> **Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning**  
+> Shichen Liu, Tianye Li, Weikai Chen, Hao Li  
+> ICCV 2019 Oral
 
-## Usage
+Original project: [ShichenLiu/SoftRas](https://github.com/ShichenLiu/SoftRas)  
+Paper: [arXiv:1904.01786](https://arxiv.org/abs/1904.01786)
 
-The code is built on Python3 and PyTorch 1.6.0. CUDA (10.1) is needed in order to install the module. Our code is extended on the basis of [this repo](https://github.com/daniilidis-group/neural_renderer). `6/3/2021` update note: we add **testing models** and **recontructed color meshes** below, and also slightly optimized the code structure! Previous version is archived in the `legacy` branch.
+The original code was extended from [hiroharu-kato/neural_renderer](https://github.com/hiroharu-kato/neural_renderer?tab=readme-ov-file) and [daniilidis-group/neural_renderer](https://github.com/daniilidis-group/neural_renderer). <br>
+The included `LICENSE` remains the MIT License and includes the original copyright notices.
 
+## Fork Maintainer
+[Chloe Xin DAI](https://github.com/PhDinTimeManagement) <br>
 
-To install the module, using
+## Motivation
 
-```
-python setup.py install
-```
+The upstream SoftRas README targets **PyTorch 1.6.0** and **CUDA 10.1**, both of which predate Blackwell-generation GPUs. [NVIDIA lists](https://developer.nvidia.com/cuda/gpus) the GeForce RTX 5090 under **CUDA compute capability 12.0**, so an older CUDA/PyTorch stack cannot reliably build or run device code for this GPU.
 
-## Applications
+In addition to the GPU architecture issue, the original source used several older PyTorch/CUDA idioms that are brittle in current extension builds, including hard-coded `.cuda()` calls, deprecated autograd patterns, implicit CUDA stream usage, limited extension input validation, and ambiguous `torch.cross` calls. This fork updates those areas while keeping the original CUDA kernel logic largely intact.
 
-### 0. Rendering
+## Key Modifications
 
-We demonstrate the rendering effects provided by SoftRas. Realistic rendering results (1st and 2nd columns) can be achieved with a proper setting of `sigma` and `gamma`. With larger `sigma` and `gamma`, one can obtain renderings with stronger transparency and blurriness (3rd and 4th column).
+The exact changed files are recorded in `MODIFIED_FILES.txt`, and the most important changes are summarized below.
 
-![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/render/forward.gif)
+| Area | Main Files                                                                                          | Summary                                                                                                                                                                                                   | Purpose                                                                                                                |
+| --- |-----------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
+| Build/package configuration | `setup.py`                                                                                          | Kept `torch.utils.cpp_extension.CUDAExtension`, switched package discovery to `find_packages(...)`, removed the unused `torchvision` install requirement, and set `zip_safe=False`.                       | Keeps the extension build path simple while avoiding an unnecessary dependency and improving package discovery.        |
+| C++ extension wrappers | `soft_renderer/cuda/*_cuda.cpp`                                                                     | Replaced broad `torch/torch.h` includes with `torch/extension.h`, added explicit CUDA/contiguity/dtype/device checks, and added `c10::cuda::CUDAGuard`.                                                   | Fails early on invalid tensors and ensures kernels launch on the CUDA device associated with the input tensors.        |
+| CUDA kernel launch path | `soft_renderer/cuda/*_kernel.cu`                                                                    | Added current-stream launches via `at::cuda::getCurrentCUDAStream()` and replaced manual `cudaGetLastError()` printing with `C10_CUDA_KERNEL_LAUNCH_CHECK()`.                                             | Makes custom kernels follow PyTorch's active CUDA stream and report launch errors as proper CUDA/PyTorch errors.       |
+| Tensor dtype/device handling | `mesh.py`, `load_obj.py`, `save_obj.py`, `voxelization.py`, `soft_rasterize.py`, lighting utilities | Replaced hard-coded CUDA allocation patterns with device-aware `.to(device)` flows, dtype-aware tensor creation, explicit `int32` voxel buffers, and `.contiguous()` calls before extension entry points. | Prevents CPU/GPU mismatches, dtype mismatches, and non-contiguous tensor issues that can break native extension calls. |
+| Deprecated PyTorch API usage | `examples/recon/*.py`, `examples/demo_deform.py`, `save_obj.py`                                     | Replaced `.data` with `.detach()` or `.item()`, removed `torch.autograd.Variable`, and added `map_location=device` when loading checkpoints.                                                              | Aligns example/runtime code with modern PyTorch autograd and checkpoint-loading behavior.                              |
+| Math/API cleanup | `look.py`, `look_at.py`, `vertex_normals.py`, `mesh.py`, `transform.py`                             | Added explicit `dim=` arguments for `torch.cross` / `F.normalize`, fixed tensor conversion to preserve device and dtype, fixed `LookAt.eyes` / `Look.eyes`, and handled `look(..., up=None)`.             | Avoids modern PyTorch warnings/errors and fixes device/dtype inconsistencies in transformation code.                   |
 
-```
-CUDA_VISIBLE_DEVICES=0 python examples/demo_render.py
-```
+## Compatibility Notes
 
-### 1. 3D Unsupervised Single-view Mesh Reconstruction
+- The renderer and voxelizer still rely on custom CUDA extensions. There is **no CPU implementation** of the main SoftRas rasterization kernels.
+- For RTX 5090 / Blackwell GPUs, use a PyTorch CUDA build and CUDA toolkit/driver combination that supports compute capability **12.0**.
+- This repository does not hard-code a complete version matrix. The exact working combination depends on your Python, PyTorch, CUDA toolkit, NVIDIA driver, compiler, and GPU architecture.
+- The CUDA kernels were modernized primarily for compatibility; they were not algorithmically redesigned or performance-retuned.
+- Rebuild the extension after changing PyTorch, Python, CUDA, compiler, or GPU architecture.
 
-By incorporating SoftRas with a simple mesh generator, one can train the network with multi-view images only, without requiring any 3D supervision. At test time, one can reconstruct the 3D mesh, along with the mesh texture, from a single RGB image. Below we show the results of single-view mesh reconstruction on ShapeNet.
+## Installation
 
-![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/recon_rgb/02691156_150_000867_w_input.gif) ![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/recon_rgb/02958343_150_001155_w_input.gif) ![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/recon_rgb/03001627_150_002475_w_input.gif) ![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/recon_rgb/04090263_150_004755_w_input.gif) ![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/recon_rgb/04256520_150_004035_w_input.gif) ![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/recon_rgb/04379243_150_000843_w_input.gif)
+`setup.py` imports PyTorch at build time, so install a CUDA-enabled PyTorch build before installing this package.
 
-Download shapenet rendering dataset provided by NMR:
-```
-bash examples/recon/download_dataset.sh
-```
+Suggested setup:
 
-To train the model:
-```
-CUDA_VISIBLE_DEVICES=0 python examples/recon/train.py -eid recon
-```
+```bash
+# 1. Create and activate an environment.
 
-To test the model:
-```
-CUDA_VISIBLE_DEVICES=0 python examples/recon/test.py -eid recon \
-    -d 'data/results/models/recon/checkpoint_0200000.pth.tar'
-```
+# 2. Install build helpers.
+python -m pip install --upgrade pip setuptools wheel ninja
 
-We also provide our trained model here:
-- SoftRas trained with silhouettes supervision (62+ IoU): [google drive](https://drive.google.com/file/d/1GlZJVih5BMGp026mpxK2scWJXqT94VUx/view?usp=sharing)
-- SoftRas trained with shading supervision (64+ IoU, test with `--shading-model` arg): [google drive](https://drive.google.com/file/d/1r63AKNn3ecMho6RFE7gFefRv78Pmbe5h/view?usp=sharing)
-- SoftRas reconstructed meshes with color (random sampled): [google drive](https://drive.google.com/file/d/1gnSshn0k9JpVpoSTWIQoV2QFAlin3AUK/view?usp=sharing)
+# 3. Install a CUDA-enabled PyTorch build that supports your CUDA/driver/GPU stack.
+#    Use the official PyTorch selector for the command appropriate to your system:
+#    https://pytorch.org/get-started/locally/
 
-### 2. Image-based Shape Deformation
+# 4. For RTX 5090 builds, make the target architecture explicit.
+#    This is especially useful when building without the target GPU visible.
+export TORCH_CUDA_ARCH_LIST="12.0"
 
-SoftRas provides strong supervision for image-based mesh deformation. We visualize the deformation process from a sphere to a car model and then to a plane given supervision from multi-view silhouette images.
+# Optional: set CUDA_HOME if your CUDA toolkit is not auto-detected.
+# export CUDA_HOME=/usr/local/cuda
 
-![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/deform/sphere_to_car.gif) ![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/deform/car_to_plane.gif)
-
-```
-CUDA_VISIBLE_DEVICES=0 python examples/demo_deform.py
+# 5. Build and install SoftRas in editable mode.
+python -m pip install -e .
 ```
 
-The optimized mesh is included in `data/obj/plane/plane.obj`
-
-### 3. Pose Optimization for Rigid Objects
-
-With scheduled blurry renderings, one can obtain smooth energy landscape that avoids local minima. 
-Below we demonstrate how a color cube is fitted to the target image in the presence of large occlusions.
-The blurry rendering and the corresponding rendering losses are shown in the 3rd and 4th columns respectively.
-
-![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/fitting/cubic.gif)
+Dependencies declared by `setup.py` are:
 
-### 4. Non-rigid Shape Fitting
+```text
+numpy
+torch
+scikit-image
+tqdm
+imageio
+```
 
-We fit the parametric body model (SMPL) to a target image where the part (right hand) is entirely occluded in the input view.
+Some example scripts also import packages that are not declared in `setup.py`, such as `matplotlib`; install those separately if your selected example requires them.
 
-![](https://raw.githubusercontent.com/ShichenLiu/SoftRas/master/data/media/demo/fitting/smpl.gif)
+## Build Verification
 
-## Contacts
-Shichen Liu: <liushichen95@gmail.com>
+After installation, verify that PyTorch sees your CUDA device and that the SoftRas package imports successfully:
 
-Any discussions or concerns are welcomed!
+```bash
+python - <<'PY'
+import torch
+import soft_renderer as sr
 
-### Citation
+print("PyTorch:", torch.__version__)
+print("PyTorch CUDA:", torch.version.cuda)
+print("CUDA available:", torch.cuda.is_available())
 
-If you find our project useful in your research, please consider citing:
+if torch.cuda.is_available():
+    print("GPU:", torch.cuda.get_device_name(0))
+    print("Capability:", torch.cuda.get_device_capability(0))
 
-```
-@article{liu2019softras,
-  title={Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning},
-  author={Liu, Shichen and Li, Tianye and Chen, Weikai and Li, Hao},
-  journal={The IEEE International Conference on Computer Vision (ICCV)},
-  month = {Oct},
-  year={2019}
-}
+print("SoftRas:", sr.__version__)
+PY
 ```
 
-```
-@article{liu2020general,
-  title={A General Differentiable Mesh Renderer for Image-based 3D Reasoning},
-  author={Liu, Shichen and Li, Tianye and Chen, Weikai and Li, Hao},
-  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
-  year={2020},
-  publisher={IEEE}
-}
+If you change CUDA/PyTorch versions or see stale binary errors, clean and rebuild:
+
+```bash
+rm -rf build *.egg-info soft_renderer/cuda/*.so
+python -m pip install -e .
 ```
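Review note on the README's installation step 4: the new text asks users to set `TORCH_CUDA_ARCH_LIST="12.0"` for RTX 5090 builds. A minimal sketch of how one might sanity-check that an arch list covers a given compute capability is shown below. The helper names `parse_arch_list` and `arch_list_covers` are hypothetical and not part of this commit. The check is deliberately conservative: it accepts only an exact match or a lower-or-equal entry carrying `+PTX`, and it handles only numeric entries such as `8.6` or `9.0+PTX`, not named architectures.

```python
import re


def parse_arch_list(arch_list):
    """Parse a TORCH_CUDA_ARCH_LIST-style string into (major, minor, has_ptx) tuples.

    Assumption: entries are numeric ("12.0", "9.0+PTX") and separated by
    semicolons or whitespace. Named architectures are not handled here.
    """
    entries = []
    for token in re.split(r"[;\s]+", arch_list.strip()):
        if not token:
            continue
        match = re.fullmatch(r"(\d+)\.(\d+)(\+PTX)?", token)
        if match is None:
            raise ValueError(f"unsupported arch entry: {token!r}")
        major, minor = int(match.group(1)), int(match.group(2))
        entries.append((major, minor, match.group(3) is not None))
    return entries


def arch_list_covers(arch_list, capability):
    """Conservatively decide whether the arch list covers a (major, minor) capability."""
    target = tuple(capability)
    for major, minor, has_ptx in parse_arch_list(arch_list):
        # Exact match: device code was built for this architecture.
        if (major, minor) == target:
            return True
        # PTX for an older architecture can be JIT-compiled on a newer GPU.
        if has_ptx and (major, minor) <= target:
            return True
    return False


if __name__ == "__main__":
    print(arch_list_covers("12.0", (12, 0)))
    print(arch_list_covers("8.6;9.0", (12, 0)))
```

In practice the two arguments would come from `os.environ.get("TORCH_CUDA_ARCH_LIST", "")` and `torch.cuda.get_device_capability(0)`, the latter being the same call the README's verification script already uses.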
diff --git a/examples/demo_deform.py b/examples/demo_deform.py
@@ -74,7 +74,9 @@ def main():
 
     os.makedirs(args.output_dir, exist_ok=True)
 
-    model = Model(args.template_mesh).cuda()
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+
+    model = Model(args.template_mesh).to(device)
     transform = sr.LookAt(viewing_angle=15)
     lighting = sr.Lighting()
     rasterizer = sr.SoftRasterizer(image_size=64, sigma_val=1e-4, aggr_func_rgb='hard')
@@ -92,7 +94,7 @@ def main():
     loop = tqdm.tqdm(list(range(0, 2000)))
     writer = imageio.get_writer(os.path.join(args.output_dir, 'deform.gif'), mode='I')
     for i in loop:
-        images_gt = torch.from_numpy(images).cuda()
+        images_gt = torch.from_numpy(images).to(device)
 
         mesh, laplacian_loss, flatten_loss = model(args.batch_size)
 

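Review note on `examples/demo_deform.py`: the new `device` line falls back to `'cpu'` when CUDA is unavailable, while the updated README states that the main rasterization kernels have no CPU implementation. If that holds for this demo, the fallback would probably only postpone the failure to the first rasterizer call. One possible alternative, sketched here with a hypothetical helper `resolve_device` that is not part of this commit, is to fail early with an explicit message:

```python
def resolve_device(cuda_available, allow_cpu=False):
    """Return a device string, refusing a silent CPU fallback by default.

    Assumption: the caller passes the result of torch.cuda.is_available().
    The helper itself is illustrative and does not import torch.
    """
    if cuda_available:
        return "cuda"
    if allow_cpu:
        # Only meaningful for code paths that never reach the CUDA extensions.
        return "cpu"
    raise RuntimeError(
        "CUDA is not available, and the SoftRas rasterization kernels "
        "have no CPU implementation."
    )


if __name__ == "__main__":
    print(resolve_device(True))
```

In the demo this could be used as `device = torch.device(resolve_device(torch.cuda.is_available()))`; whether an early error is preferable to the current fallback is a judgment call for the fork maintainer.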
diff --git a/examples/recon/models.py b/examples/recon/models.py
@@ -139,7 +139,7 @@ def render_multiview(self, image_a, image_b, viewpoint_a, viewpoint_b):
     def evaluate_iou(self, images, voxels):
         vertices, faces = self.reconstruct(images)
 
-        faces_ = srf.face_vertices(vertices, faces).data
+        faces_ = srf.face_vertices(vertices, faces).detach()
         faces_norm = faces_ * 1. * (32. - 1) / 32. + 0.5
         voxels_predict = srf.voxelization(faces_norm, 32, False).cpu().numpy()
         voxels_predict = voxels_predict.transpose(0, 2, 1, 3)[:, :, :, ::-1]

diff --git a/examples/recon/models_large.py b/examples/recon/models_large.py
@@ -148,7 +148,7 @@ def render_multiview(self, image_a, image_b, viewpoint_a, viewpoint_b):
     def evaluate_iou(self, images, voxels):
         vertices, faces = self.reconstruct(images)
 
-        faces_ = srf.face_vertices(vertices, faces).data
+        faces_ = srf.face_vertices(vertices, faces).detach()
         faces_norm = faces_ * 1. * (32. - 1) / 32. + 0.5
         voxels_predict = srf.voxelization(faces_norm, 32, False).cpu().numpy()
         voxels_predict = voxels_predict.transpose(0, 2, 1, 3)[:, :, :, ::-1]