I am running GenCast with ecmwf-lab/ai-models-gencast on an NVIDIA A100 80GB GPU and an AMD EPYC 7763 CPU with 256 GB of RAM.
I got the GenCast assets with download-assets. I set up the environment using the recommended way to install the GPU requirements:
pip install -r requirements-gpu.txt -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
My run command is:
ai-models --input cds --date 20230101 --time 0000 --assets /PATH/TO/ASSETS/ gencast --num-ensemble-members 1 --lead-time 12
This should make a single 12-hour prediction step with one ensemble member. However, I get a segmentation fault:
2025-07-04 13:10:09,129 INFO Building model: 0.5 second.
2025-07-04 13:10:10,182 INFO Converting GRIB to xarray: 1 second.
2025-07-04 13:10:10,322 INFO Reindexing: 0.1 second.
2025-07-04 13:10:10,324 INFO Creating input data: 1 second.
2025-07-04 13:10:10,339 INFO Replacing constants: 15 milliseconds.
2025-07-04 13:10:10,590 INFO Extracting input targets: 0.2 second.
2025-07-04 13:10:10,590 INFO Creating input data (total): 1 second.
2025-07-04 13:10:10,898 INFO Model initialisation: 7 seconds
2025-07-04 13:10:10,898 INFO Starting inference for 1 steps (12h).
2025-07-04 13:10:10,941 INFO Samples slice(0, 1, None) out of 1
/global/cfs/cdirs/m4416/ai-models/lib/python3.10/site-packages/scipy/sparse/_index.py:210: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil and dok are more efficient.
self._set_arrayXarray(i, j, x)
/global/cfs/cdirs/m4416/ai-models/lib/python3.10/site-packages/scipy/sparse/_index.py:210: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil and dok are more efficient.
self._set_arrayXarray(i, j, x)
2025-07-04 13:14:25,707 INFO mask_block_size: 10273.
Segmentation fault
I run into exactly the same issue when I use gencast-1.0:
ai-models --input cds --date 20230101 --time 0000 --assets /PATH/TO/ASSETS/ gencast-1.0 --num-ensemble-members 1 --lead-time 12
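One thing I can try in order to narrow this down is Python's standard-library faulthandler module, which prints the Python-level stack of every thread when the process receives a fatal signal such as SIGSEGV. The snippet below is only a minimal sketch of that idea; the helper name enable_crash_traceback is hypothetical and is not part of ai-models or ai-models-gencast.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump the Python stack of all threads on a fatal signal (e.g. SIGSEGV).

    Hypothetical helper for illustration; not part of ai-models.
    `stream` must be a real file with a file descriptor and must stay
    open for as long as the handler is enabled.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

The same handler can be switched on without touching any code by setting the environment variable PYTHONFAULTHANDLER=1 before the ai-models command above. This only shows which Python call was active at the time of the crash; it does not give a native (C/CUDA) backtrace.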
Could you provide some insight on how to debug this error? Thanks very much for making this package available.