Command '['ninja', '-v']' returned non-zero exit status 1. #8

hengfei-wang · 2024-06-13T08:49:26Z

Hi,

Thank you for the code.

I am using an A100 on a cluster without root privileges. When I set up the environment, I got the error in the title. Here is more info:

Traceback (most recent call last):
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
    subprocess.run(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1309, in load
    return _jit_compile(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1719, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1832, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'upfirdn2d_plugin':
[1/3] /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/bin/x86_64-conda-linux-gnu-c++ -MMD -MF upfirdn2d.o.d -DTORCH_EXTENSION_NAME=upfirdn2d_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include/TH -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include/THC -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/include -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /bask/homes/h/hxw080/.cache/torch_extensions/py38_cu121/upfirdn2d_plugin/38e3583dc1ab1679d4c3a2df8d208521-nvidia-a100-sxm4-40gb/upfirdn2d.cpp -o upfirdn2d.o
FAILED: upfirdn2d.o

It seems that ninja cannot compile upfirdn2d_plugin. Have you seen this problem before? How can I solve it? Any help would be appreciated.


Dong142857 · 2024-06-13T10:10:28Z

Hi, you can try "pip install ninja". I'm not entirely sure, but I remember it should work. Refer to this link.
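For anyone following the same suggestion, here is a minimal sketch for confirming that the environment can actually see a ninja executable after installing it. It uses only the Python standard library; the helper names `find_tool` and `tool_version` are made up for this example, and `--version` is ninja's own flag.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str, path: Optional[str] = None) -> Optional[str]:
    """Return the absolute path of an executable on PATH, or None if absent."""
    return shutil.which(name, path=path)


def tool_version(executable: str) -> str:
    """Run `<executable> --version` and return its trimmed output."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    ninja = find_tool("ninja")
    if ninja is None:
        print("ninja not found on PATH; try `pip install ninja` in this environment")
    else:
        print(f"ninja found at {ninja}, version {tool_version(ninja)}")
```

If ninja is found and the build still fails, as it did in this thread, the cause is likely elsewhere (for example the compiler or the CUDA toolkit), since the error only reports that the ninja build returned a non-zero status.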

hengfei-wang · 2024-06-20T08:15:53Z

Hi,

Sorry for the late reply.

Thank you for the suggestion, but it does not seem to work. I can install ninja successfully, but compiling some torch extensions still seems to require root privileges. The StyleGAN repo says it uses custom CUDA extensions, and some issues there point out this problem without giving a good solution. I guess it is hard to run on a cluster without a root account.

Best regards
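One way to test the root-privileges hypothesis is to check whether the extension build directory is writable by the current user. The sketch below assumes the build goes to the directory named by the `TORCH_EXTENSIONS_DIR` environment variable when it is set, and otherwise to `~/.cache/torch_extensions`, which matches the path in the log above. The fallback and both helper names are assumptions made for this example, not PyTorch's own lookup code.

```python
import os


def extensions_build_root() -> str:
    """Return the directory where JIT-compiled extensions are expected to go.

    TORCH_EXTENSIONS_DIR is PyTorch's override; the fallback mirrors the
    ~/.cache/torch_extensions location seen in the log and is an assumption.
    """
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        return override
    return os.path.join(os.path.expanduser("~"), ".cache", "torch_extensions")


def is_user_writable(directory: str) -> bool:
    """Check write access on the directory or its nearest existing ancestor."""
    probe = os.path.abspath(directory)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the filesystem root without a match
            return False
        probe = parent
    return os.path.isdir(probe) and os.access(probe, os.W_OK | os.X_OK)


if __name__ == "__main__":
    root = extensions_build_root()
    print(f"{root}: writable by current user = {is_user_writable(root)}")
```

If the directory is writable, missing root access is unlikely to be what stops the build, and the compiler output that follows the FAILED line is the better place to look.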

Dong142857 · 2024-06-20T13:31:39Z

I'm not familiar with distributed clusters, and I do not know the differences between running on a cluster and running on a single server, so I'm sorry that I can't solve your problem. However, I believe this project can run on a single server without root privileges. All my experiments were run on a single V100, and I have no root privileges either.

hengfei-wang · 2024-06-21T02:40:06Z

Oh, in that case, could you kindly send me the environment file exported with conda env export?

Dong142857 · 2024-06-21T04:05:17Z

Leave your email, and I will send it to you.

hengfei-wang · 2024-06-21T08:24:42Z

[email protected]

Thanks:)

Dong142857 · 2024-06-21T11:56:14Z

The file is sent; please let me know when you receive it.

hengfei-wang · 2024-06-21T13:47:39Z

I have received the env file. Thanks!

I will test it soon.

hengfei-wang · 2024-06-24T11:09:39Z

Hi,

I finally solved this problem. It was related to the CUDA installation: the CUDA that came with the cluster was missing some files. I loaded a CUDA module from the cluster's pre-installed modules instead, and then the CUDA extensions compiled successfully.

Thank you for your help anyway.
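For readers who hit the same symptom, a minimal sketch of a check for an incomplete CUDA toolkit follows. The thread does not say which files were missing, so the checklist here (the `nvcc` compiler and the `cuda_runtime.h` header) is an assumption, and `missing_cuda_files` is a name made up for this example. The name of the module to load is cluster-specific and is not given in the thread.

```python
import os
from typing import List, Sequence

# Assumed checklist: the thread does not say which files were missing.
EXPECTED_FILES = (
    os.path.join("bin", "nvcc"),
    os.path.join("include", "cuda_runtime.h"),
)


def missing_cuda_files(
    cuda_home: str, expected: Sequence[str] = EXPECTED_FILES
) -> List[str]:
    """Return the expected toolkit files that do not exist under cuda_home."""
    return [
        rel for rel in expected
        if not os.path.isfile(os.path.join(cuda_home, rel))
    ]


if __name__ == "__main__":
    # CUDA_HOME / CUDA_PATH are the variables commonly set by CUDA modules.
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if not cuda_home:
        print("CUDA_HOME is not set; load a CUDA module or export it first")
    else:
        missing = missing_cuda_files(cuda_home)
        if missing:
            print(f"{cuda_home} is missing: {missing}")
        else:
            print(f"{cuda_home} has the files checked here")
```

Running this before and after loading a different CUDA module shows whether the toolkit the build will pick up has changed.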

hengfei-wang · 2024-06-25T07:57:03Z

Hi,

I also have some questions about the pretrained models.

What's the difference between encoder_render.pt and encoder_render_normal_140000.pt? Was encoder_render.pt trained for fewer epochs? And what is model_ir_se50.pth?

Dong142857 · 2024-06-25T08:53:06Z

'encoder_render.pt' was my first implementation, which was trained on a small range of angles (a bug: I forgot to multiply by 2 in the function gen_rand_pose). It can still synthesize multi-view images, but side faces come out somewhat blurry. I fixed this in 'encoder_render_normal_140000.pt', which performs better than the first version.
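To illustrate how a missing factor of 2 can shrink the sampled angle range, here is a hypothetical sketch. It is not the repository's gen_rand_pose. The function `sample_yaw`, the uniform sampling scheme, and the 60-degree limit are all assumptions chosen only to show the effect.

```python
import math
import random


def sample_yaw(max_yaw: float, rng: random.Random, buggy: bool = False) -> float:
    """Sample a yaw offset in radians, centered on the frontal view.

    Hypothetical stand-in for gen_rand_pose, not the repository's code.
    """
    centered = rng.random() - 0.5      # uniform in [-0.5, 0.5)
    scale = 1.0 if buggy else 2.0      # the fix restores the factor of 2
    return centered * scale * max_yaw  # buggy: +-max_yaw/2, fixed: +-max_yaw


rng = random.Random(0)
max_yaw = math.radians(60)  # illustrative value, not taken from the project
buggy = [sample_yaw(max_yaw, rng, buggy=True) for _ in range(10_000)]
fixed = [sample_yaw(max_yaw, rng) for _ in range(10_000)]
print(f"buggy range: +-{math.degrees(max(map(abs, buggy))):.1f} deg")
print(f"fixed range: +-{math.degrees(max(map(abs, fixed))):.1f} deg")
```

Under these assumptions the buggy sampler never leaves the inner half of the intended range, which would be consistent with a model that rarely sees side views during training.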

Dong142857 · 2024-06-25T08:56:20Z

model_ir_se50.pth is a ResNet used to calculate the ID loss. This code is from TriPlaneNet.

hengfei-wang · 2024-06-26T03:12:42Z

Thanks!

I am going to close this issue. I am trying to train the model on my dataset. I will open another issue if I have other questions.

hengfei-wang closed this as completed Jun 26, 2024
