Command '['ninja', '-v']' returned non-zero exit status 1. #8
Comments
Hi, you can try "pip install ninja". I'm not entirely sure, but I remember that it worked. Refer to this link.
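Before reinstalling ninja, it can help to confirm whether the Python environment already sees a working ninja binary. The snippet below is a minimal sketch using only the standard library; the helper name `ninja_version` is made up for illustration. Note that a non-zero exit status from `ninja -v` usually means one of the compile steps failed, not that ninja itself is missing, so a successful check here points toward the compiler or CUDA toolkit instead.

```python
import shutil
import subprocess
from typing import Optional


def ninja_version() -> Optional[str]:
    """Return the version string of the ninja found on PATH, or None if unavailable.

    Hypothetical helper for illustration; it is not part of PyTorch or this repo.
    """
    exe = shutil.which("ninja")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # The binary exists but could not be executed successfully.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = ninja_version()
    if version is None:
        print("ninja not found on PATH (or not runnable)")
    else:
        print(f"ninja {version} is available")
```

If this reports a version and the build still fails, the next thing to look at is the compiler output that follows the `FAILED:` line in the build log.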
Hi, sorry for the late reply, and thank you for the suggestion. Unfortunately, it does not seem to work. I can install ninja successfully, but compiling some of the torch extensions still requires root privileges. The StyleGAN repo says it uses custom CUDA extensions, and some issues there point this problem out, but none of them offers a good solution. I guess it is hard to run this on a cluster without a root account. Best regards
I'm not familiar with distributed clusters, and I don't know the differences between running on a cluster and running on a single server, so I'm sorry that I can't solve your problem. However, I believe this project can run on a single server without root privileges. All my experiments were trained on a single V100, and I have no root privileges either.
Oh, in that case, could you kindly send me the environment file from conda using
Leave your email, and I will send it to you.
Thanks :)
The file is sent; please let me know if you receive it.
I have received the env file. Thanks! I will test it soon.
Hi, I finally solved this problem. It was related to the CUDA installation. The CUDA installed on the cluster was missing some files. I loaded a CUDA module from the cluster's pre-installed modules instead, and then the CUDA extensions compiled successfully. Thank you for your help anyway.
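For anyone hitting the same failure, a quick way to see whether the active CUDA toolkit looks complete is to check for a few files that extension builds typically need. This is a rough sketch under assumptions: the helper names are invented here, the file list (`bin/nvcc`, `include/cuda_runtime.h`) is only a small sample, and some installations (for example conda-packaged toolkits) place headers in a different directory, so a missing entry is a hint and not a proof.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, Optional


def find_cuda_home() -> Optional[Path]:
    """Guess the CUDA toolkit root from CUDA_HOME, CUDA_PATH, or the nvcc on PATH.

    Hypothetical helper for illustration only.
    """
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = os.environ.get(var)
        if value:
            return Path(value)
    nvcc = shutil.which("nvcc")
    if nvcc:
        # nvcc normally lives in <cuda_home>/bin/nvcc.
        return Path(nvcc).resolve().parent.parent
    return None


def check_cuda_toolkit(cuda_home: Path) -> Dict[str, bool]:
    """Report whether a few files commonly needed to build CUDA extensions exist."""
    expected = {
        "nvcc": cuda_home / "bin" / "nvcc",
        "cuda_runtime.h": cuda_home / "include" / "cuda_runtime.h",
    }
    return {name: path.is_file() for name, path in expected.items()}


if __name__ == "__main__":
    home = find_cuda_home()
    if home is None:
        print("No CUDA toolkit found via CUDA_HOME, CUDA_PATH, or nvcc on PATH")
    else:
        print(f"CUDA toolkit root guess: {home}")
        for name, present in check_cuda_toolkit(home).items():
            print(f"  {name}: {'found' if present else 'MISSING'}")
```

On a cluster that uses environment modules, running this before and after loading a different CUDA module shows whether the toolkit root and its files changed.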
Hi, I also have some questions about the pretrained models. What's the difference between encoder_render.pt and encoder_render_normal_140000.pt? Was encoder_render.pt trained for fewer epochs? And what is model_ir_se50.pth?
'encoder_render.pt' was my first implementation, which was trained on a small range of angles (a bug: I forgot to multiply by 2 in the function gen_rand_pose). It can still synthesize multi-view images, but it produces some blur on side faces. I fixed this in 'encoder_render_normal_140000.pt', which performs better than the first version.
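To illustrate why a missing factor of 2 narrows the pose range, here is a small sketch. It is not the repository's `gen_rand_pose`; the function `sample_angle`, its parameters, and the uniform sampling scheme are assumptions made for this example. If a uniform sample in [0, 1) is centered by subtracting 0.5, it must be multiplied by 2 to cover the full range [-max_angle, max_angle); without that factor only half the range is ever sampled.

```python
import random


def sample_angle(max_angle: float, rng: random.Random, apply_fix: bool = True) -> float:
    """Sample an angle (in the same unit as max_angle) around zero.

    Hypothetical stand-in, not the repo's gen_rand_pose.
    With apply_fix=True the result lies in [-max_angle, max_angle).
    With apply_fix=False (the assumed bug) it lies in [-max_angle / 2, max_angle / 2).
    """
    centered = rng.random() - 0.5          # uniform in [-0.5, 0.5)
    scale = 2.0 if apply_fix else 1.0      # the factor of 2 that was forgotten
    return centered * scale * max_angle


if __name__ == "__main__":
    rng = random.Random(0)
    fixed = [sample_angle(30.0, rng, apply_fix=True) for _ in range(1000)]
    buggy = [sample_angle(30.0, rng, apply_fix=False) for _ in range(1000)]
    print(f"fixed range: {min(fixed):.1f} to {max(fixed):.1f}")
    print(f"buggy range: {min(buggy):.1f} to {max(buggy):.1f}")
```

A model trained only on the narrower range sees few side views, which is consistent with the blur on side faces described above.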
model_ir_se50.pth is a ResNet that is used to calculate the ID loss. This code is from triplanenet.
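An identity (ID) loss of this kind is commonly computed as one minus the cosine similarity between face embeddings of the input and the generated image. The sketch below shows that common formulation on plain Python lists; it is an assumption that this repo uses the same form, and the function names are invented for illustration. In practice the embeddings would come from the face recognition network, which is omitted here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


def id_loss(source_embedding: Sequence[float], generated_embedding: Sequence[float]) -> float:
    """Assumed ID loss: 0 when the embeddings point the same way, up to 2 when opposite."""
    return 1.0 - cosine_similarity(source_embedding, generated_embedding)


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real face embeddings are much larger.
    print(id_loss([0.2, 0.9, 0.1], [0.25, 0.85, 0.05]))
```

Because the similarity is normalized, the loss depends only on the direction of the embeddings, not on their magnitude.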
Thanks! I am going to close this issue. I am trying to train the model on my dataset. I will open another issue if I have other questions.
Hi,
Thank you for the code.
I am using an A100 on a cluster without root privileges. When I set up the environment, I got the following error. Here is more info:
Traceback (most recent call last):
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
    subprocess.run(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1309, in load
    return _jit_compile(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1719, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1832, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'upfirdn2d_plugin':
[1/3] /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/bin/x86_64-conda-linux-gnu-c++ -MMD -MF upfirdn2d.o.d -DTORCH_EXTENSION_NAME=upfirdn2d_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include/TH -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/lib/python3.8/site-packages/torch/include/THC -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/include -isystem /bask/projects/c/changhj-train-dnn/miniconda3/envs/live3d/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /bask/homes/h/hxw080/.cache/torch_extensions/py38_cu121/upfirdn2d_plugin/38e3583dc1ab1679d4c3a2df8d208521-nvidia-a100-sxm4-40gb/upfirdn2d.cpp -o upfirdn2d.o
FAILED: upfirdn2d.o
It seems that upfirdn2d_plugin cannot be compiled with ninja. Have you had the same problem before? How can I solve it? Any help would be appreciated.