System Info
Hardware: Amazon Linux EC2 instance
8 NVIDIA A10G GPUs (23 GB each)
Reproduction
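The original snippet is not reproduced here, so the following is only a minimal sketch of the kind of load that triggers the behavior, assuming Transformers' AutoModelForCausalLM with a 4-bit BitsAndBytesConfig and device_map="auto"; the model id is a placeholder, not the one from the report:

```python
# Minimal sketch (assumptions: 4-bit BitsAndBytesConfig, device_map="auto",
# placeholder model id; the report's exact snippet is not shown here).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"  # placeholder for a model too large for one GPU

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                      # should shard across all visible GPUs
    quantization_config=quantization_config,
)

# Inspect where the layers actually ended up; here most of them land on the last GPU.
print(model.hf_device_map)
```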
However, if I load without the quantization_config, there is no issue at all:
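For comparison, a sketch of the same load without the quantization config (same placeholder model id as above), which spreads the layers across the GPUs as expected:

```python
# Same load without quantization_config; with this variant the layers are
# distributed across the GPUs as expected.
import torch
from transformers import AutoModelForCausalLM

model_id = "mistralai/Mixtral-8x7B-v0.1"  # placeholder, as above

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

print(model.hf_device_map)  # layers end up spread over cuda:0 ... cuda:7
```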
Expected behavior
The model is (mostly) being loaded onto the last GPU, whereas I would expect it to be spread across the different GPUs. Moreover, infer_auto_device_map does not seem to be working.

I have experienced a very similar issue with different hardware.

I think I've isolated part of the issue. When I exclude one GPU, the model is split across the GPUs: export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6

I don't know whether the issue is version-specific or whether it only happens for setups with more than 7 GPUs. Interestingly enough, 8 GPUs worked fine for Mistral-7B.
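A hedged sketch, not part of the original report, of computing the device map explicitly with accelerate and capping per-GPU memory, which makes it easier to see whether infer_auto_device_map is actually spreading the layers; the model id and memory limits are illustrative:

```python
# Sketch: build the device map explicitly instead of relying on device_map="auto".
# Assumptions: 8 GPUs, ~20 GiB usable each, bf16 weights, placeholder model id.
import torch
from accelerate import init_empty_weights, infer_auto_device_map
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "mistralai/Mixtral-8x7B-v0.1"  # placeholder, as above

# Instantiate the model on the meta device so no real memory is allocated.
config = AutoConfig.from_pretrained(model_id)
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config)

device_map = infer_auto_device_map(
    empty_model,
    max_memory={i: "20GiB" for i in range(8)},          # leave headroom on each A10G
    no_split_module_classes=empty_model._no_split_modules,
    dtype=torch.bfloat16,
)
print(device_map)  # check whether the layers are spread over all 8 GPUs

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device_map,
    torch_dtype=torch.bfloat16,
)
```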