
Conversation

@erikvansebille
Member

This PR builds on #2094, following @VeckoTheGecko's suggestion at Parcels-code/parcels-benchmarks#1 (comment) to keep the ParticleData in both an xarray.Dataset structure (ParticleSet._ds) and a dict-of-numpy-arrays (ParticleSet._data).

This new branch has the same performance as #2094 (see Parcels-code/parcels-benchmarks#1 (comment)), but the advantage is that users can also access the data as an xarray.Dataset.

This will be useful, e.g., in the __repr__ (to be implemented) and in the new implementation of ParticleFile.

  • Chose the correct base branch (particledata_as_dict)
  • Added tests
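
To illustrate the dual layout described above, here is a minimal standalone sketch in plain numpy/xarray (not the actual ParticleSet code; the variable and dimension names are only illustrative), assuming numpy-backed variables:

import numpy as np
import xarray as xr

# A tiny stand-in for ParticleSet._ds
lon = np.linspace(0, 1, 10, dtype=np.float32)
ds = xr.Dataset({"lon": ("trajectory", lon)}, coords={"trajectory": np.arange(10)})

# A stand-in for ParticleSet._data: .values on a numpy-backed variable returns the
# underlying array without copying, so the dict and the Dataset share memory and
# in-place edits through either view are visible in both.
data = {"lon": ds["lon"].values}
data["lon"][0] = 99.0
assert float(ds["lon"][0]) == 99.0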

@VeckoTheGecko
Contributor

I think a difficulty with this is that once a new array is created (e.g., from the deletion of particles), the two data structures will drift apart. I don't think this is currently working as intended.

Here's a test:

def test_pset_remove_indices(fieldset):
    npart = 10
    lon_start = np.linspace(0, 1, npart)
    lat_start = np.linspace(1, 0, npart)
    pset = ParticleSet(fieldset, lon=np.linspace(0, 1, npart), lat=np.linspace(1, 0, npart))
    assert len(pset._ds.lon) == len(pset._data["lon"]) == npart

    pset.remove_indices([0])
    assert len(pset._ds.lon) == len(pset._data["lon"]) == npart - 1
________________________________________________________________ test_pset_remove_indices _________________________________________________________________

fieldset = <parcels.fieldset.FieldSet object at 0x3249d7f20>

    def test_pset_remove_indices(fieldset):
        npart = 10
        lon_start = np.linspace(0, 1, npart)
        lat_start = np.linspace(1, 0, npart)
        pset = ParticleSet(fieldset, lon=np.linspace(0, 1, npart), lat=np.linspace(1, 0, npart))
        assert len(pset._ds.lon) == len(pset._data["lon"]) == npart
    
        pset.remove_indices([0])
>       assert len(pset._ds.lon) == len(pset._data["lon"]) == npart - 1
E       AssertionError: assert 10 == 9
E        +  where 10 = len(<xarray.DataArray 'lon' (trajectory: 10)> Size: 40B\narray([0.        , 0.11111111, 0.22222222, 0.33333334, 0.44444445,..., 0.8888889 , 1.        ],\n      dtype=float32)\nCoordinates:\n  * trajectory  (trajectory) int64 80B 0 1 2 3 4 5 6 7 8 9)
E        +    where <xarray.DataArray 'lon' (trajectory: 10)> Size: 40B\narray([0.        , 0.11111111, 0.22222222, 0.33333334, 0.44444445,..., 0.8888889 , 1.        ],\n      dtype=float32)\nCoordinates:\n  * trajectory  (trajectory) int64 80B 0 1 2 3 4 5 6 7 8 9 = <xarray.Dataset> Size: 640B\nDimensions:         (trajectory: 10, ngrid: 1)\nCoordinates:\n  * trajectory      (trajector... datetime64[ns] 80B NaT NaT NaT ... NaT NaT NaT\nAttributes:\n    ngrid:    1\n    ptype:    ParticleType(pclass=Particle).lon
E        +      where <xarray.Dataset> Size: 640B\nDimensions:         (trajectory: 10, ngrid: 1)\nCoordinates:\n  * trajectory      (trajector... datetime64[ns] 80B NaT NaT NaT ... NaT NaT NaT\nAttributes:\n    ngrid:    1\n    ptype:    ParticleType(pclass=Particle) = <[KeyError('pclass') raised in repr()] ParticleSet object at 0x3249d4080>._ds
E        +  and   9 = len(array([0.11111111, 0.22222222, 0.33333334, 0.44444445, 0.5555556 ,\n       0.6666667 , 0.7777778 , 0.8888889 , 1.        ], dtype=float32))

tests/v4/test_particleset_execute.py:142: AssertionError
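
For context, the underlying cause can be reproduced in plain xarray/numpy (a standalone sketch, not Parcels code): selecting a subset of a Dataset returns a new Dataset with new arrays, so a dict that still points at the old buffers stops tracking the data.

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"lon": ("trajectory", np.linspace(0, 1, 10, dtype=np.float32))},
    coords={"trajectory": np.arange(10)},
)
data = {"lon": ds["lon"].values}  # references the Dataset's current buffers

ds = ds.isel(trajectory=slice(1, None))  # "remove" particle 0 -> a brand-new Dataset
print(len(ds["lon"]), len(data["lon"]))  # 9 10: the two structures have drifted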

@erikvansebille
Member Author

Ah, good catch! I hadn't realised that deleting a particle would indeed create a new dataset, and so would adding two ParticleSets.

We could change the code to also update the dict-of-numpy-arrays whenever we change the dataset; or is that too error-prone?
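
As a rough illustration of that option, here is a hypothetical sketch (the class, _sync_data and the remove_indices body are assumptions for illustration, not the actual Parcels implementation) in which the dict is rebuilt every time the dataset is replaced:

import numpy as np
import xarray as xr

class ParticleSetSketch:
    def __init__(self, ds: xr.Dataset):
        self._ds = ds
        self._sync_data()

    def _sync_data(self) -> None:
        # Rebuild the dict-of-numpy-arrays from the current dataset so the two views never drift.
        self._data = {name: self._ds[name].values for name in self._ds.data_vars}

    def remove_indices(self, indices) -> None:
        keep = np.setdiff1d(np.arange(self._ds.sizes["trajectory"]), indices)
        self._ds = self._ds.isel(trajectory=keep)  # isel returns a new Dataset...
        self._sync_data()  # ...so the dict has to be rebuilt right after every replacement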

Adding @VeckoTheGecko's failing test showing that the dict-of-numpy-arrays no longer tracks the xarray dataset after a deletion
@erikvansebille
Member Author

I just added the test above to the unit test suite, so that CI fails and we don't accidentally merge this PR.

@erikvansebille
Member Author

Closing this PR, as the combination of a dict-of-numpy-arrays and an xarray Dataset does not seem robust enough (see #2097 (comment)).

@github-project-automation github-project-automation bot moved this from Backlog to Done in Parcels development Jul 22, 2025