Command '['ninja', '-v']' returned non-zero exit status 1. #8

Closed
hengfei-wang opened this issue Jun 13, 2024 · 13 comments

Comments

@hengfei-wang

Hi,

Thank you for the code.

I am using an A100 on a cluster without root privileges. When I set up the environment, I got the error in the title. Here is more info:

Traceback (most recent call last):
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
    subprocess.run(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1309, in load
    return _jit_compile(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1719, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1832, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'upfirdn2d_plugin':
[1/3] /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/bin/x86_64-conda-linux-gnu-c++ -MMD -MF upfirdn2d.o.d -DTORCH_EXTENSION_NAME=upfirdn2d_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include/TH -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include/THC -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/include -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /bask/homes/h/hxw080/.cache/torch_extensions/py38_cu121/upfirdn2d_plugin/38e3583dc1ab1679d4c3a2df8d208521-nvidia-a100-sxm4-40gb/upfirdn2d.cpp -o upfirdn2d.o
FAILED: upfirdn2d.o

It seems that upfirdn2d_plugin cannot be compiled with ninja. Have you had the same problem before? How can I solve it? Any help will be appreciated.

@Dong142857
Owner

Hi, you can try "pip install ninja". I'm not entirely sure, but I remember it should work. Refer to this link.
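Before digging further, it can help to confirm which ninja binary the environment actually picks up. The following is a minimal sketch using only the Python standard library; the helper name `ninja_version` is hypothetical and not part of this project or of PyTorch.

```python
import shutil
import subprocess
from typing import Optional


def ninja_version() -> Optional[str]:
    """Return the version string of the ninja found on PATH, or None if unavailable.

    Hypothetical helper for illustration; not part of this repository.
    """
    exe = shutil.which("ninja")  # first ninja executable on PATH, if any
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # The binary exists but could not be executed successfully.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = ninja_version()
    print("ninja not found" if version is None else f"ninja {version}")
```

If this reports a version and the build still fails, ninja itself is probably not the problem; the compiler output that follows the `FAILED:` line in the build log is usually the more informative part.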

@hengfei-wang
Author

Hi,

Sorry for the late reply.

Thank you for the suggestion, but it does not seem to work. I can install ninja successfully, but compiling some of the torch extensions still seems to require root privileges. The StyleGAN repo says it uses custom CUDA extensions, and some issues there point this problem out, but they do not give a good solution. I guess it is hard to run this on a cluster without a root account.

Best regards

@Dong142857
Owner

I'm not familiar with distributed clusters, and I don't know how running on a cluster differs from running on a single server, so I'm sorry that I can't solve your problem. However, I believe this project can run on a single server without root privileges. All my experiments were trained on a single V100, and I have no root privileges either.

@hengfei-wang
Author

Oh, in that case, could you kindly send me your conda environment file, exported with conda env export?

@Dong142857
Owner

Leave your email, and I will send it to you.

@hengfei-wang
Author

[email protected]

Thanks:)

@Dong142857
Owner

The file has been sent. Please let me know when you receive it.

@hengfei-wang
Author

I have received the env file. Thanks!

I will test it soon.

@hengfei-wang
Author

Hi,

I finally solved this problem. It was related to the CUDA installation: the CUDA installed on the cluster is missing some files. I reloaded a CUDA module from the pre-installed modules on the cluster, and then the CUDA extensions compiled successfully.

Thank you for your help anyway.
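For anyone hitting the same error, a quick way to spot an incomplete CUDA toolkit is to check that the directory the build will use contains the compiler and runtime headers. This is a minimal sketch, not a definitive check: the thread does not say which files were missing, so the `REQUIRED` list and the helper name `missing_cuda_files` are assumptions for illustration. It relies on the `CUDA_HOME` or `CUDA_PATH` environment variable being set, which is typically the case after loading a CUDA module.

```python
import os
from pathlib import Path
from typing import List, Optional

# Assumed minimal set of files a usable toolkit should contain (illustrative only).
REQUIRED = ("bin/nvcc", "include/cuda_runtime.h")


def missing_cuda_files(cuda_home: Optional[str] = None) -> List[str]:
    """Return the entries of REQUIRED that are absent under the CUDA root.

    If no root is given, fall back to the CUDA_HOME or CUDA_PATH environment
    variable. If no root can be determined, every required file is reported.
    """
    root = cuda_home or os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if not root:
        return list(REQUIRED)
    base = Path(root)
    return [rel for rel in REQUIRED if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_cuda_files()
    print("CUDA toolkit looks complete" if not missing else f"missing: {missing}")
```

Running it before and after loading a different CUDA module shows whether the new module provides a complete toolkit.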

@hengfei-wang
Author

Hi,

I also have some questions about the pretrained models.

What's the difference between encoder_render.pt and encoder_render_normal_140000.pt? Was encoder_render.pt trained for fewer epochs? And what is model_ir_se50.pth?

@Dong142857
Owner

Dong142857 commented Jun 25, 2024

encoder_render.pt was my first implementation. It was trained on a small range of angles because of a bug: I forgot to multiply by 2 in the function gen_rand_pose. It can still synthesize multi-view images, but side faces come out somewhat blurry. I fixed this in encoder_render_normal_140000.pt, which performs better than the first version.
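To illustrate how a missing factor of 2 can halve the sampled pose range, here is a minimal sketch. The actual gen_rand_pose code is not shown in this thread, so the function `sample_angle` and its parameters are hypothetical and only show one way such a bug could arise.

```python
import random


def sample_angle(max_angle: float, buggy: bool = False, rng=random) -> float:
    """Sample an angle intended to be uniform in [-max_angle, max_angle).

    Hypothetical stand-in for pose sampling; not the repository's gen_rand_pose.
    """
    u = rng.random() - 0.5         # uniform in [-0.5, 0.5)
    scale = 1.0 if buggy else 2.0  # the buggy variant omits the factor of 2
    return u * scale * max_angle


if __name__ == "__main__":
    rng = random.Random(0)
    buggy = [abs(sample_angle(1.0, buggy=True, rng=rng)) for _ in range(10000)]
    fixed = [abs(sample_angle(1.0, buggy=False, rng=rng)) for _ in range(10000)]
    print(f"largest |angle| without the factor of 2: {max(buggy):.3f}")
    print(f"largest |angle| with the factor of 2:    {max(fixed):.3f}")
```

Without the factor, samples never exceed half of the intended maximum angle, which would be consistent with a model that sees few side views during training.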

@Dong142857
Owner

Dong142857 commented Jun 25, 2024

model_ir_se50.pth is a ResNet that is used to calculate the ID loss. This code is from TriPlaneNet.

@hengfei-wang
Author

Thanks!

I am going to close this issue. I am trying to train the model on my dataset. I will open another issue if I have other questions.


2 participants