RuntimeError: CUDA error: invalid device ordinal #3
Comments
Trying this with model = galai.load_model("base"), it looks like there is a device map that expects 8 GPUs, if I'm seeing this right:
{'decoder.embed_tokens': 0,
'decoder.embed_positions': 0,
'decoder.layer_norm': 0,
'decoder.layers.0': 0,
'decoder.layers.1': 0,
'decoder.layers.2': 0,
'decoder.layers.3': 1,
'decoder.layers.4': 1,
'decoder.layers.5': 1,
'decoder.layers.6': 2,
'decoder.layers.7': 2,
'decoder.layers.8': 2,
'decoder.layers.9': 3,
'decoder.layers.10': 3,
'decoder.layers.11': 3,
'decoder.layers.12': 4,
'decoder.layers.13': 4,
'decoder.layers.14': 4,
'decoder.layers.15': 5,
'decoder.layers.16': 5,
'decoder.layers.17': 5,
'decoder.layers.18': 6,
'decoder.layers.19': 6,
'decoder.layers.20': 6,
'decoder.layers.21': 7,
'decoder.layers.22': 7,
'decoder.layers.23': 7}
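To make the observation above concrete, here is a minimal sketch that checks a device map against the number of GPUs actually present. The helper names `required_gpu_count` and `invalid_ordinals` are hypothetical and are not part of galai or accelerate; the map is rebuilt from the one quoted in this comment.

```python
def required_gpu_count(device_map):
    """Return how many GPUs a device map assumes (highest ordinal + 1)."""
    return max(device_map.values()) + 1


def invalid_ordinals(device_map, available_gpus):
    """Return the sorted device ordinals in the map that the machine lacks."""
    return sorted({d for d in device_map.values() if d >= available_gpus})


# Rebuild the map quoted above: three decoder layers per device.
device_map = {
    "decoder.embed_tokens": 0,
    "decoder.embed_positions": 0,
    "decoder.layer_norm": 0,
}
device_map.update({f"decoder.layers.{i}": i // 3 for i in range(24)})

print(required_gpu_count(device_map))                  # 8
print(invalid_ordinals(device_map, available_gpus=1))  # [1, 2, 3, 4, 5, 6, 7]
```

On a single-GPU machine only ordinal 0 exists, so any tensor mapped to devices 1 through 7 would plausibly trigger the "invalid device ordinal" error reported in this issue.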
If you have fewer than the default number of GPUs (8), you have to specify how many when you load the model. Try:
Thanks @dcruiz01 that worked out like a charm.
Confirmed. Had same error and num_gpus = 1 resolved it.
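For readers landing here, a minimal sketch of the fix confirmed in the replies above. It assumes galai is installed and that `load_model` accepts `num_gpus` as a keyword argument, which is the name used in this thread; the wrapper function `load_base_on_one_gpu` is hypothetical and exists only for illustration.

```python
def load_base_on_one_gpu():
    """Load the "base" model while telling galai that only one GPU is present."""
    # Imported inside the function so this file can be read or imported
    # on a machine that does not have galai installed.
    import galai

    # num_gpus=1 is the setting reported in this thread to resolve the error.
    return galai.load_model("base", num_gpus=1)
```

Calling `model = load_base_on_one_gpu()` downloads the checkpoint if needed, so it is left out of the block itself.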
Please mention that in your documentation / readme.
A model size between base and standard would be nice. I can't quite fit standard on my RTX 3090, I think.
Do you offer 8-bit versions/compatibility, like BLOOM?
I see, dtype='float16' does the job, sorry. Please mention it in the readme. Many folks will want to try this on a local GPU as well.
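A rough back-of-the-envelope sketch of why float16 helps here. It assumes the "standard" model has about 6.7 billion parameters (the size listed in the galai readme, not stated in this thread) and counts weights only, so activations and framework overhead come on top. The helper `weight_memory_gib` is hypothetical.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype):
    """Estimate the memory taken by model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumption: parameter count of the "standard" model, taken from the galai readme.
STANDARD_PARAMS = 6.7e9

for dtype in ("float32", "float16", "int8"):
    print(dtype, round(weight_memory_gib(STANDARD_PARAMS, dtype), 1), "GiB")
```

Under these assumptions the float32 weights come to roughly 25 GiB, just over the 24 GB of an RTX 3090, while float16 halves that to about 12.5 GiB, which is consistent with the experience described above.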
Hmm, 8-bit would still be handy for playing with larger models. Is that possible?
Num of GPUs defaults to None.
Who has a default number of 8 GPUs?
People who work at Meta AI, probably XD
Why isn't this written on the main page?
galai 1.1.0 uses all available GPUs by default, which should fix the issue. One can still manually specify the number of GPUs using
When I load the model, I get this error:
Traceback (most recent call last):
File "", line 1, in
File "test/env/lib/python3.9/site-packages/galai/__init__.py", line 39, in load_model
model._load_checkpoint(checkpoint_path=get_checkpoint_path(name))
File "test/env/lib/python3.9/site-packages/galai/model.py", line 63, in _load_checkpoint
load_checkpoint_and_dispatch(
File "test/env/lib/python3.9/site-packages/accelerate/big_modeling.py", line 366, in load_checkpoint_and_dispatch
load_checkpoint_in_model(
File "test/env/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 701, in load_checkpoint_in_model
set_module_tensor_to_device(model, param_name, param_device, value=param)
File "test/env/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 124, in set_module_tensor_to_device
new_value = value.to(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.