GPU support #31

KiaraGrouwstra opened this issue Feb 20, 2020 · 12 comments
Labels
synthesis features needed for program synthesis

Comments

@KiaraGrouwstra
Owner

with deep learning libraries working, the next step would be to ensure I can use GPUs with them as well for performance.

KiaraGrouwstra added the synthesis label Feb 22, 2020
@KiaraGrouwstra
Owner Author

since I already have CUDA 10 installed, per HaskTorch's readme this should just be a matter of nix-shell --arg cudaVersion 10.

@KiaraGrouwstra
Owner Author

I issued the command. this only involves a handful of complex technologies stacked on top of each other; what could possibly go wrong?!

@KiaraGrouwstra
Owner Author

it built. I feel so confused.

@KiaraGrouwstra
Owner Author

KiaraGrouwstra commented Apr 18, 2020

adding a simple test:

    it "gpu" $ do
        putStrLn $ "availableDevices: " <> show availableDevices
        dev <- getDevice
        putStrLn $ "dev: " <> show dev
        let t = D.toCUDA $ D.asTensor $ [1,2,3::Int]
        putStrLn $ "t: " <> show t
        False `shouldBe` True
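
for reference, the same check as a standalone program (a minimal sketch; it assumes only the qualified Torch.Tensor import used in the repl session further down, everything else is illustrative):

    -- minimal GPU smoke test: build a CPU tensor, then copy it to the default CUDA device.
    -- toCUDA is the call that throws a CppStdException when the driver/GPU setup is broken.
    import qualified Torch.Tensor as D

    main :: IO ()
    main = do
      let t = D.asTensor [1, 2, 3 :: Int]
      putStrLn $ "cpu tensor: " <> show t
      putStrLn $ "cuda tensor: " <> show (D.toCUDA t)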

(or even just running D.toCUDA $ D.asTensor $ [1,2,3::Int] in a repl)
errors with:

gpu:                            FAIL
    uncaught exception: CppException
    CppStdException "Exception: CUDA error: CUDA driver version is insufficient for CUDA runtime version (getDevice at ../../c10/cuda/impl/CUDAGuardImpl.h:37)
    frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7f5bfbf6392a in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libc10.so)
    frame #1: <unknown function> + 0x12824 (0x7f5baf65d824 in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libc10_cuda.so)
    frame #2: at::native::to(at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) + 0x5c2 (0x7f5bb42d7492 in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libtorch.so)
    frame #3: <unknown function> + 0x4432896 (0x7f5bb466f896 in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libtorch.so)
    frame #4: <unknown function> + 0x611c785 (0x7f5bb6359785 in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libtorch.so)
    frame #5: <unknown function> + 0x449b00f (0x7f5bb46d800f in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libtorch.so)
    frame #6: dist/build/synthesis-test-suite/synthesis-test-suite() [0xbdbb6c]
    frame #7: dist/build/synthesis-test-suite/synthesis-test-suite() [0xb420f6]
    frame #8: dist/build/synthesis-test-suite/synthesis-test-suite() [0xbd7166]
    ; type: c10::Error"
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

edit: same error on GCE. turns out this was because I hadn't added the GPU to the instance yet!

$ cabal v1-repl synthesizer
> import qualified Torch.Tensor                  as D
> D.toCUDA . D.asTensor $ [1,2,3::Int]
*** Exception: CppStdException "Exception: CUDA error: CUDA driver version is insufficient for CUDA runtime version (getDevice at ../../c10/cuda/impl/CUDAGuardImpl.h:37)\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7f08ef01e92a in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libc10.so)\nframe #1: <unknown function> + 0x12824 (0x7f08a2c2f824 in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libc10_cuda.so)\nframe #2: at::native::to(at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) + 0x5c2 (0x7f08a7392492 in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libtorch.so)\nframe #3: <unknown function> + 0x4432896 (0x7f08a772a896 in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libtorch.so)\nframe #4: <unknown function> + 0x611c785 (0x7f08a9414785 in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libtorch.so)\nframe #5: <unknown function> + 0x449b00f (0x7f08a779300f in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libtorch.so)\nframe #6: std::result_of<c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat> >(c10::OperatorHandle const&, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) const::{lambda(c10::DispatchTable const&)#1} (c10::DispatchTable const&)>::type c10::LeftRight<c10::DispatchTable>::read<c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat> >(c10::OperatorHandle const&, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) const::{lambda(c10::DispatchTable const&)#1}>(c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat> >(c10::OperatorHandle const&, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) const::{lambda(c10::DispatchTable const&)#1}&&) const + 0x670 (0x7f08f03a2188 in /nix/store/mimlx4xlc03ljdhmvvhg5qp3f7c5mg9x-libtorch-ffi-1.4.0.0/lib/ghc-8.8.3/x86_64-linux-ghc-8.8.3/libHSlibtorch-ffi-1.4.0.0-8gX5fG7bhe06G0DWaoPWfH-ghc8.8.3.so)\nframe #7: inline_c_Torch_Internal_Unmanaged_Type_Tensor_52 + 0x447 (0x7f08f02e302b in /nix/store/mimlx4xlc03ljdhmvvhg5qp3f7c5mg9x-libtorch-ffi-1.4.0.0/lib/ghc-8.8.3/x86_64-linux-ghc-8.8.3/libHSlibtorch-ffi-1.4.0.0-8gX5fG7bhe06G0DWaoPWfH-ghc8.8.3.so)\nframe #8: <unknown function> + 0x1093d39 (0x7f08f02cdd39 in /nix/store/mimlx4xlc03ljdhmvvhg5qp3f7c5mg9x-libtorch-ffi-1.4.0.0/lib/ghc-8.8.3/x86_64-linux-ghc-8.8.3/libHSlibtorch-ffi-1.4.0.0-8gX5fG7bhe06G0DWaoPWfH-ghc8.8.3.so)\n; type: c10::Error"
^D
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

@KiaraGrouwstra
Owner Author

if the cu101-1.4 means CUDA 10.1, my 10.2 seems higher?
searching around, this could actually be about the nVidia driver version instead.

@KiaraGrouwstra
Owner Author

KiaraGrouwstra commented Apr 18, 2020

HaskTorch source mentions version 10.1.243 somewhere, but I think 10.1 is the important part.

I now asked in Slack:

I'm on Arch Linux, and just tried using GPU with HaskTorch. This errored on a version mismatch, with HaskTorch expecting CUDA 10.1, while Arch currently distributes 10.2, whereas nVidia itself mostly distributes for more enterprise-y distributions. Does anyone have experience with this? Alternatively, could it be viable to allow HaskTorch to use CUDA 10.2?

Junji Hashimoto:

Basically, you need to install not cuda 10.1 but nvidia-driver. (You don't have to install cuda 10.1.) If prebuild libtorch supports your GPU, I think it works.
libtorch-1.4 requires compute capability >= 3.5.

That compute capability could be it.
Naturally, my GeForce GTX 860M has compute capability either 3.0 or 5.0(**), since per nVidia:

(**) The GeForce GTX860 and GTX870 come in two versions depending on the SKU, please check with your OEM to determine which one is in your system.

And unlike on Windows, there doesn't seem to be an easy CLI way to confirm which one I have. ugh.

I could maybe try LibTorch without HaskTorch as Junji suggested, but compute capability actually sounds like a plausible explanation, seeing as my drivers should be recent enough (?).

Alternatively, I could retry this later on Google Cloud or something.

@KiaraGrouwstra
Owner Author

KiaraGrouwstra commented Apr 26, 2020

GCE's Tesla T4 has compute capability 7.5, so definitely >= libtorch-1.4's required 3.5.

in PyTorch CUDA works fine too:

$ pip install torch==1.4.0 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Requirement already satisfied: torch==1.4.0 in /opt/conda/lib/python3.7/site-packages (1.4.0)
$ python
>>> import torch
>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> t = torch.as_tensor(a)
>>> t.cuda()
tensor([1, 2, 3], device='cuda:0')

@KiaraGrouwstra
Owner Author

KiaraGrouwstra commented Apr 26, 2020

on Slack Junji mentioned exposing GPU stuff through Nix using nixGL.
checking the driver version the way they suggest fails though:

$ glxinfo | grep NVIDIA
-bash: glxinfo: command not found

nvidia-smi however reveals my local driver version is 440.64.
On GCE, on first boot the VM reported it was installing nVidia drivers of version 418.87.01.
However, ./nvidiaInstall.py 418.87.01 nixGLNvidia produces a URL that 404s.
Different versions are available, though seemingly not this exact one; the surrounding available versions are 418.74 and 418.88, of which the latter went through.

prepending the nix-shell command with nixGLNvidia then makes it work, e.g. nixGLNvidia nix-shell --arg cudaVersion 10.

@KiaraGrouwstra
Owner Author

KiaraGrouwstra commented Apr 26, 2020

will this same strategy work for me locally?!

edit: yes!

@KiaraGrouwstra
Owner Author

progress switching my code to support CUDA: cuda, device.

@KiaraGrouwstra
Owner Author

run-time issues on the device branch:

all LSTM calls using CUDA suddenly emit the following warning:

Warning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (_cudnn_impl at ../../aten/src/ATen/native/cudnn/RNN.cpp:1266)

that flatten_parameters() sounds like a mutating kind of method that would look quite different in HaskTorch. hm.
the warning seems mentioned at e.g. 1, 2, 3, 4.
one of those implied there may be no real underlying issue, but the next issue, if related, might imply otherwise.

after some R3NN call (scores printed), the program inevitably ends up crashing with this:

fish: “cabal v1-run synthesizer -- --e…” terminated by signal SIGSEGV (Address boundary error)

this didn't happen when everything was just on CPU.
the weird thing is, the first action after that print statement should be printing the same thing again from NSPS train, i.e. outside the R3NN function, so it's a mystery to me exactly which op could be going wrong there.

basically, GPU still seems buggy?

@KiaraGrouwstra
Owner Author

just retried now. seems to be working! though only about 10% faster, against 4x the cost.
implementing parallelization over task functions should fix that, but I don't properly understand that part yet, so I'm leaving it out of scope; a rough sketch of what it might look like is below.
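
for the record, a hypothetical sketch (not code from this repo) of what parallelizing over task functions could look like on the Haskell side, using mapConcurrently_ from the async package; runTask and the task names are placeholders, and a single GPU would still serialize whatever kernels it receives, so the actual speedup is unclear:

    import Control.Concurrent.Async (mapConcurrently_)

    -- placeholder for the per-task synthesis work; the real signature would differ
    runTask :: String -> IO ()
    runTask name = putStrLn $ "synthesizing for task fn: " <> name

    main :: IO ()
    main = mapConcurrently_ runTask ["taskA", "taskB", "taskC"]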
