GPU support #31

KiaraGrouwstra opened this issue Feb 20, 2020 · 12 comments
Labels
synthesis features needed for program synthesis

Comments

@KiaraGrouwstra
Owner

with deep learning libraries working, the next step would be to ensure I can use GPUs with them as well for performance.

KiaraGrouwstra added the synthesis label Feb 22, 2020
@KiaraGrouwstra
Owner Author

since I already have CUDA 10 installed, per HaskTorch's readme this should just be a matter of nix-shell --arg cudaVersion 10.

@KiaraGrouwstra
Owner Author

I issued the command. this only involves a handful of complex technologies stacked on top of each other; what could possibly go wrong?!

@KiaraGrouwstra
Owner Author

it built. I feel so confused.

@KiaraGrouwstra
Owner Author

KiaraGrouwstra commented Apr 18, 2020

adding a simple test:

    it "gpu" $ do
        putStrLn $ "availableDevices: " <> show availableDevices
        dev <- getDevice
        putStrLn $ "dev: " <> show dev
        let t = D.toCUDA $ D.asTensor $ [1,2,3::Int]
        putStrLn $ "t: " <> show t
        False `shouldBe` True
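
for reference, the same check as a standalone program (a minimal sketch; it assumes only the qualified Torch.Tensor import used in the repl session further down, everything else is illustrative):

    -- minimal GPU smoke test: build a CPU tensor, then copy it to the default CUDA device.
    -- toCUDA is the call that throws a CppStdException when the driver/GPU setup is broken.
    import qualified Torch.Tensor as D

    main :: IO ()
    main = do
      let t = D.asTensor [1, 2, 3 :: Int]
      putStrLn $ "cpu tensor: " <> show t
      putStrLn $ "cuda tensor: " <> show (D.toCUDA t)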

(or even just running D.toCUDA $ D.asTensor $ [1,2,3::Int] in a repl)
errors with:

gpu:                            FAIL
    uncaught exception: CppException
    CppStdException "Exception: CUDA error: CUDA driver version is insufficient for CUDA runtime version (getDevice at ../../c10/cuda/impl/CUDAGuardImpl.h:37)
    frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7f5bfbf6392a in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libc10.so)
    frame #1: <unknown function> + 0x12824 (0x7f5baf65d824 in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libc10_cuda.so)
    frame #2: at::native::to(at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) + 0x5c2 (0x7f5bb42d7492 in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libtorch.so)
    frame #3: <unknown function> + 0x4432896 (0x7f5bb466f896 in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libtorch.so)
    frame #4: <unknown function> + 0x611c785 (0x7f5bb6359785 in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libtorch.so)
    frame #5: <unknown function> + 0x449b00f (0x7f5bb46d800f in /nix/store/b36p6qkvj4d9hgkhr2353hdgqvcvh49g-libtorch-cu101-1.4/lib/libtorch.so)
    frame #6: dist/build/synthesis-test-suite/synthesis-test-suite() [0xbdbb6c]
    frame #7: dist/build/synthesis-test-suite/synthesis-test-suite() [0xb420f6]
    frame #8: dist/build/synthesis-test-suite/synthesis-test-suite() [0xbd7166]
    ; type: c10::Error"
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

edit: same error on GCE. turns out this was because I hadn't added the GPU to the instance yet!

$ cabal v1-repl synthesizer
> import qualified Torch.Tensor                  as D
> D.toCUDA . D.asTensor $ [1,2,3::Int]
*** Exception: CppStdException "Exception: CUDA error: CUDA driver version is insufficient for CUDA runtime version (getDevice at ../../c10/cuda/impl/CUDAGuardImpl.h:37)\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7f08ef01e92a in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libc10.so)\nframe #1: <unknown function> + 0x12824 (0x7f08a2c2f824 in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libc10_cuda.so)\nframe #2: at::native::to(at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) + 0x5c2 (0x7f08a7392492 in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libtorch.so)\nframe #3: <unknown function> + 0x4432896 (0x7f08a772a896 in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libtorch.so)\nframe #4: <unknown function> + 0x611c785 (0x7f08a9414785 in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libtorch.so)\nframe #5: <unknown function> + 0x449b00f (0x7f08a779300f in /nix/store/ndhzgviamlsxd5zwv1cy83nh7ak8dbl6-libtorch-cu101-1.4/lib/libtorch.so)\nframe #6: std::result_of<c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat> >(c10::OperatorHandle const&, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) const::{lambda(c10::DispatchTable const&)#1} (c10::DispatchTable const&)>::type c10::LeftRight<c10::DispatchTable>::read<c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat> >(c10::OperatorHandle const&, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) const::{lambda(c10::DispatchTable const&)#1}>(c10::Dispatcher::callUnboxedOnly<at::Tensor, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat> >(c10::OperatorHandle const&, at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) const::{lambda(c10::DispatchTable const&)#1}&&) const + 0x670 (0x7f08f03a2188 in /nix/store/mimlx4xlc03ljdhmvvhg5qp3f7c5mg9x-libtorch-ffi-1.4.0.0/lib/ghc-8.8.3/x86_64-linux-ghc-8.8.3/libHSlibtorch-ffi-1.4.0.0-8gX5fG7bhe06G0DWaoPWfH-ghc8.8.3.so)\nframe #7: inline_c_Torch_Internal_Unmanaged_Type_Tensor_52 + 0x447 (0x7f08f02e302b in /nix/store/mimlx4xlc03ljdhmvvhg5qp3f7c5mg9x-libtorch-ffi-1.4.0.0/lib/ghc-8.8.3/x86_64-linux-ghc-8.8.3/libHSlibtorch-ffi-1.4.0.0-8gX5fG7bhe06G0DWaoPWfH-ghc8.8.3.so)\nframe #8: <unknown function> + 0x1093d39 (0x7f08f02cdd39 in /nix/store/mimlx4xlc03ljdhmvvhg5qp3f7c5mg9x-libtorch-ffi-1.4.0.0/lib/ghc-8.8.3/x86_64-linux-ghc-8.8.3/libHSlibtorch-ffi-1.4.0.0-8gX5fG7bhe06G0DWaoPWfH-ghc8.8.3.so)\n; type: c10::Error"
^D
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

@KiaraGrouwstra
Owner Author

if the cu101-1.4 means CUDA 10.1, my 10.2 seems higher?
searching around, this could actually be about the nVidia driver version instead.

@KiaraGrouwstra
Owner Author

KiaraGrouwstra commented Apr 18, 2020

HaskTorch source mentions version 10.1.243 somewhere, but I think 10.1 is the important part.

I now asked in Slack:

I'm on Arch Linux, and just tried using GPU with HaskTorch. This errored on a version mismatch, with HaskTorch expecting CUDA 10.1, while Arch currently distributes 10.2, whereas nVidia itself mostly distributes for more enterprise-y distributions. Does anyone have experience with this? Alternatively, could it be viable to allow HaskTorch to use CUDA 10.2?

Junji Hashimoto:

Basically, you need to install not cuda 10.1 but nvidia-driver. (You don't have to install cuda 10.1.) If prebuild libtorch supports your GPU, I think it works.
libtorch-1.4 requires compute capability >= 3.5.

That compute capability could be it.
Naturally, my GeForce GTX 860M has compute capability either 3.0 or 5.0(**), since per nVidia:

(**) The GeForce GTX860 and GTX870 come in two versions depending on the SKU, please check with your OEM to determine which one is in your system.

And unlike on Windows, there doesn't seem to be an easy CLI way to confirm which one I have. ugh.

I could maybe try LibTorch without HaskTorch as Junji suggested, but compute capability actually sounds like a plausible explanation, seeing as my drivers should be recent enough (?).

Alternatively, I could retry this later on Google Cloud or something.

@KiaraGrouwstra
Owner Author

KiaraGrouwstra commented Apr 26, 2020

GCE's Tesla T4 has compute capability 7.5, so definitely >= libtorch-1.4's required 3.5.

in PyTorch CUDA works fine too:

$ pip install torch==1.4.0 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Requirement already satisfied: torch==1.4.0 in /opt/conda/lib/python3.7/site-packages (1.4.0)
$ python
>>> import torch
>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> t = torch.as_tensor(a)
>>> t.cuda()
tensor([1, 2, 3], device='cuda:0')

@KiaraGrouwstra
Owner Author

KiaraGrouwstra commented Apr 26, 2020

on Slack Junji mentioned exposing GPU stuff through Nix using nixGL.
checking the driver version the way they suggest fails though:

$ glxinfo | grep NVIDIA
-bash: glxinfo: command not found

nvidia-smi however reveals my local driver version is 440.64.
On GCE, on first boot the VM reported it was installing nVidia drivers of version 418.87.01.
However, ./nvidiaInstall.py 418.87.01 nixGLNvidia produces a URL that 404s.
Different versions are available, though seemingly not this exact one; the surrounding available versions are 418.74 and 418.88, of which the latter went through.

prepending the nix-shell command with nixGLNvidia then makes it work, e.g. nixGLNvidia nix-shell --arg cudaVersion 10.

@KiaraGrouwstra
Owner Author

KiaraGrouwstra commented Apr 26, 2020

will this same strategy work for me locally?!

edit: yes!

@KiaraGrouwstra
Owner Author

progress switching my code to support CUDA: cuda, device.

@KiaraGrouwstra
Owner Author

run-time issues on the device branch:

all LSTM calls using CUDA suddenly emit the following warning:

Warning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (_cudnn_impl at ../../aten/src/ATen/native/cudnn/RNN.cpp:1266)

that flatten_parameters() sounds like a mutating kind of method that would look quite different in HaskTorch. hm.
the warning seems mentioned at e.g. 1, 2, 3, 4.
one of those implied there may be no real underlying issue, but the next issue, if related, might imply otherwise.

after some R3NN call (scores printed), the program inevitably ends up crashing with this:

fish: “cabal v1-run synthesizer -- --e…” terminated by signal SIGSEGV (Address boundary error)

this didn't happen when everything was just on CPU.
the weird thing is, the first action after that print statement should be printing the same thing again from NSPS train, i.e. outside the R3NN function, so it's a mystery to me exactly which op could be going wrong there.

basically, GPU still seems buggy?

@KiaraGrouwstra
Owner Author

just retried now. seems to be working! though only about 10% faster, against 4x the cost.
implementing parallelization over task functions should fix that, but I don't properly understand that part yet, so I'm leaving it out of scope; a rough sketch of what it might look like is below.
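
for the record, a hypothetical sketch (not code from this repo) of what parallelizing over task functions could look like on the Haskell side, using mapConcurrently_ from the async package; runTask and the task names are placeholders, and a single GPU would still serialize whatever kernels it receives, so the actual speedup is unclear:

    import Control.Concurrent.Async (mapConcurrently_)

    -- placeholder for the per-task synthesis work; the real signature would differ
    runTask :: String -> IO ()
    runTask name = putStrLn $ "synthesizing for task fn: " <> name

    main :: IO ()
    main = mapConcurrently_ runTask ["taskA", "taskB", "taskC"]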
