Segmentation Fault with GenCast #78

@ankurmahesh

Description

I am running GenCast with ecmwf-lab/ai-models-gencast on an NVIDIA A100 80GB GPU and an AMD EPYC 7763 CPU with 256 GB of memory.

I fetched the GenCast assets with download-assets. I installed the environment using the recommended command for the GPU requirements: pip install -r requirements-gpu.txt -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

My run command is

ai-models --input cds --date 20230101 --time 0000 --assets /PATH/TO/ASSETS/ gencast --num-ensemble-members 1 --lead-time 12

This should make a single-step (12 h) prediction with one ensemble member. However, I get a segmentation fault:

2025-07-04 13:10:09,129 INFO Building model: 0.5 second.
2025-07-04 13:10:10,182 INFO Converting GRIB to xarray: 1 second.
2025-07-04 13:10:10,322 INFO Reindexing: 0.1 second.
2025-07-04 13:10:10,324 INFO Creating input data: 1 second.
2025-07-04 13:10:10,339 INFO Replacing constants: 15 milliseconds.
2025-07-04 13:10:10,590 INFO Extracting input targets: 0.2 second.
2025-07-04 13:10:10,590 INFO Creating input data (total): 1 second.
2025-07-04 13:10:10,898 INFO Model initialisation: 7 seconds
2025-07-04 13:10:10,898 INFO Starting inference for 1 steps (12h).
2025-07-04 13:10:10,941 INFO Samples slice(0, 1, None) out of 1
/global/cfs/cdirs/m4416/ai-models/lib/python3.10/site-packages/scipy/sparse/_index.py:210: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil and dok are more efficient.
  self._set_arrayXarray(i, j, x)
/global/cfs/cdirs/m4416/ai-models/lib/python3.10/site-packages/scipy/sparse/_index.py:210: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil and dok are more efficient.
  self._set_arrayXarray(i, j, x)
2025-07-04 13:14:25,707 INFO mask_block_size: 10273.
Segmentation fault

I run into the exact same issue when I use gencast-1.0:

ai-models --input cds --date 20230101 --time 0000 --assets /PATH/TO/ASSETS/ gencast-1.0 --num-ensemble-members 1 --lead-time 12

Could you provide some insight on how to debug this error? Thanks very much for making this package available.
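
One generic way to narrow down where a crash like this happens, offered as a sketch and not a confirmed fix, is to rerun the same command with Python's built-in faulthandler enabled through the PYTHONFAULTHANDLER environment variable, so the interpreter prints the Python traceback to stderr when the segmentation fault occurs. The helper names run_with_faulthandler and describe_exit below are hypothetical, and the sketch assumes the ai-models entry point is an ordinary Python process that inherits the environment.

```python
import os
import signal
import subprocess
import sys

# The failing command from this report; /PATH/TO/ASSETS/ is left as a placeholder.
AI_MODELS_CMD = [
    "ai-models", "--input", "cds", "--date", "20230101", "--time", "0000",
    "--assets", "/PATH/TO/ASSETS/", "gencast",
    "--num-ensemble-members", "1", "--lead-time", "12",
]


def run_with_faulthandler(cmd):
    """Run cmd with PYTHONFAULTHANDLER=1 set in the child's environment.

    Any Python interpreter started by cmd then enables faulthandler at
    startup and dumps its Python traceback to stderr on a fatal signal.
    """
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


def describe_exit(returncode):
    """Turn a subprocess return code into a short description (POSIX)."""
    if returncode < 0:
        # On POSIX, a negative return code means "terminated by signal -N".
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


# Example usage (not executed here because it needs the assets and a GPU):
#   result = run_with_faulthandler(AI_MODELS_CMD)
#   print(describe_exit(result.returncode))
#   print(result.stderr[-4000:])  # tail of stderr, where the traceback lands
```

The traceback only shows the Python frames that were active at the time of the crash. If the fault originates inside native code, that still identifies which Python call triggered it.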
