Training stops at the end when the model is being saved #782

Open
K0pasz opened this issue Apr 30, 2024 · 2 comments

Comments

K0pasz commented Apr 30, 2024

I use this Gaussian Splatting tool in Google Colab because my PC's 6 GB of VRAM is not enough (running it locally always stopped with an error indicating I was out of VRAM). The problem appears when I set the iterations "too" high (e.g. 7000): the training process stops automatically when it tries to save the model and the created splat. Furthermore, I see a "^C" in the output, so it looks like the command is being terminated somehow.

My colab notebook looks like this:

%cd /content
!git clone --recursive https://github.com/graphdeco-inria/gaussian-splatting
!pip install -q plyfile

%cd /content/gaussian-splatting
!pip install -q /content/gaussian-splatting/submodules/diff-gaussian-rasterization
!pip install -q /content/gaussian-splatting/submodules/simple-knn

from google.colab import drive
drive.mount('/content/drive')

!python train.py -s /content/drive/MyDrive/for_nerf_by_sai_cli/colmap -i /content/drive/MyDrive/for_nerf_by_sai_cli/images -m /content/output --iterations 10000
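A side note for reproducing this: Colab assigns different GPUs to different sessions, so it can be worth confirming which accelerator the runtime actually has before the train.py cell. A minimal sketch (my addition, not part of the original notebook; uses only the standard library plus the `nvidia-smi` binary that ships on Colab GPU runtimes):

```python
# Hedged sketch: print the assigned GPU and its VRAM, degrading
# gracefully on runtimes that have no NVIDIA GPU.
import shutil
import subprocess

out = ""
if shutil.which("nvidia-smi"):
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv"],
        capture_output=True, text=True,
    ).stdout
    print(out)
else:
    print("nvidia-smi not found: this runtime has no NVIDIA GPU")
```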

The output:

2024-04-30 10:03:24.108816: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-30 10:03:24.108868: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-30 10:03:24.116418: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-30 10:03:24.135188: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-30 10:03:25.983543: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Optimizing /content/output
Output folder: /content/output [30/04 10:03:29]
Reading camera 150/150 [30/04 10:03:29]
Loading Training Cameras [30/04 10:03:29]
Loading Test Cameras [30/04 10:03:33]
Number of points at initialisation :  34739 [30/04 10:03:33]
Training progress:  70% 7000/10000 [09:23<07:06,  7.04it/s, Loss=0.0648267]
[ITER 7000] Evaluating test: L1 0.08124514473112006 PSNR 19.291277433696546 [30/04 10:12:59]

[ITER 7000] Evaluating train: L1 0.044808738678693776 PSNR 22.959835433959963 [30/04 10:13:01]

[ITER 7000] Saving Gaussians [30/04 10:13:01]
^C

I tried saving the output into the connected environment's folder instead, but the issue remains.
If I run 5000 or fewer iterations, the output is saved correctly.
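A bare "^C" in Colab output typically means the runtime killed the process, often via the Linux OOM killer when host RAM (not VRAM) runs out during the save. A minimal, Linux-only sketch for checking host-RAM headroom (my addition, not from the thread; it reads /proc/meminfo so it needs no extra packages):

```python
# Hedged sketch: report available vs. total host RAM. An OOM kill at
# the save step would show MemAvailable collapsing just before "^C".
def meminfo_kb(field: str) -> int:
    """Return a /proc/meminfo field (e.g. 'MemAvailable') in kB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

total_gib = meminfo_kb("MemTotal") / 2**20
avail_gib = meminfo_kb("MemAvailable") / 2**20
print(f"host RAM: {avail_gib:.1f} GiB available of {total_gib:.1f} GiB")
```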

@PanagiotisP

Since it works at 5000 iterations, when the number of points, and thus the memory footprint, is smaller, I suspect this has to do with Google Colab setting a threshold on the size of files it can save. A check for that would be to disable densification and see whether it saves at 7000 as expected.
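To put rough numbers on the memory-footprint hypothesis: each gaussian saved in this repo's point_cloud.ply carries about 62 float32 attributes, so the file size scales linearly with the point count that densification produces. A hedged back-of-the-envelope sketch (the 1.5 M point count is hypothetical), followed by the train.py flag the repo exposes for capping densification (`--densify_until_iter`; 0 should disable it entirely):

```python
# Hedged estimate: per gaussian the .ply stores position (3) +
# normals (3) + SH DC (3) + SH rest (45, degree 3) + opacity (1) +
# scale (3) + rotation (4) float32 values.
floats_per_gaussian = 3 + 3 + 3 + 45 + 1 + 3 + 4   # 62
bytes_per_gaussian = floats_per_gaussian * 4        # float32
n_gaussians = 1_500_000                             # hypothetical post-densification count
size_mb = n_gaussians * bytes_per_gaussian / 2**20
print(f"estimated point_cloud.ply size: ~{size_mb:.0f} MB")

# With --densify_until_iter 0 the point count should stay at its
# initial value (34739 here), keeping the saved file small.
cmd = (
    "python train.py"
    " -s /content/drive/MyDrive/for_nerf_by_sai_cli/colmap"
    " -m /content/output"
    " --iterations 10000"
    " --densify_until_iter 0"
)
print(cmd)
```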

@GaneshBannur

Check this issue #235
