[Question] Efficiently selecting nearest time data per group in Xarray #10233
-
Subject: Efficiently selecting nearest time data per group (sv) in Xarray

Hi Xarray community,

I'm working with GNSS data where I need to calculate satellite positions based on ephemeris data. I have two main Xarray Datasets:
Goal: For each
Current (Slow) Approach: I'm currently using nested loops, which is inefficient for my dataset size (potentially thousands of time steps and multiple satellites):

# ranges has coordinates sv, time
# nav_data has coordinates sv, time
# result is pre-allocated with coordinates matching ranges
for satellite in ranges.sv.values:
    # Pre-filter nav_data for the current satellite
    nav_data_sat = nav_data.sel(sv=satellite).dropna(dim='time', how='all')
    # Iterate through the time coordinates relevant for the calculation
    for dt in ranges.time.values:
        # Find the single ephemeris entry for 'satellite' closest in time to 'dt'
        # This assumes we need a result for every combination, adjust if ranges is sparse
        ephemeris = nav_data_sat.sel(time=dt, method='nearest')
        # Perform calculation using the selected ephemeris and dt
        # x, y, z, ... = satellite_position_velocity_clock_correction(ephemeris, dt)
        # Store results for this specific (dt, satellite) pair
        # result['x'].loc[dt, satellite] = x
        # ... etc ...

entire module here

Challenge & Attempts: I need a vectorized Xarray solution to replace these loops. I've tried:
Question: What is the idiomatic Xarray way to efficiently perform this grouped nearest-neighbor lookup? Specifically, how can I select data from

Thanks for any guidance or suggestions!
Replies: 4 comments 2 replies
-
Interesting problem! Can you write a minimal example with synthetic data that we could test out please?
-
Here you go.

import numpy as np
import xarray as xr


def f(ephemeris, x):
    # Linear stand-in for the real satellite computation
    a = ephemeris.a.item()
    b = ephemeris.b.item()
    return a * x + b


# Function to compute values based on nearest time
def compute_nearest_time_values(ephemeris, observations, x):
    """
    Computes f(x) = a * x + b for each (id, time) pair in observations,
    using the nearest (id, time) pair from ephemeris for coefficients a and b.
    """
    result = xr.Dataset(
        {
            "value": (("id", "time"), np.empty((len(observations.id), len(observations.time))))
        },
        coords={"id": observations.id, "time": observations.time},
    )
    for identifier in observations.id:
        # Ephemeris entries for this id, without the all-NaN time steps
        ephemeris_id = ephemeris.sel(id=identifier).dropna(dim="time", how="all")
        for time in observations.time:
            # Find nearest entry in the ephemeris
            nearest = ephemeris_id.sel(time=time, method="nearest")
            # Compute the value
            value = f(nearest, x)
            # Store the result
            result["value"].loc[identifier, time] = value
    return result


# Dummy ids
i0, i1 = "id0", "id1"

# Dummy ephemeris
e0, e1, e2, e3 = "2025-01-01T00:00", "2025-01-01T06:00", "2025-01-01T12:00", "2025-01-01T18:00"
ephemeris = xr.Dataset(
    {
        "a": (("id", "time"), [[1, np.nan, 3, 4], [10, 20, np.nan, 40]]),
        "b": (("id", "time"), [[5, np.nan, 3, 2], [50, 40, np.nan, 20]]),
    },
    coords={"id": [i0, i1], "time": np.array([e0, e1, e2, e3], dtype="datetime64")},
)

# Dummy observation
o0, o1, o2 = "2025-01-01T02:30", "2025-01-01T06:15", "2025-01-01T12:15"
observations = xr.Dataset(
    coords={"id": [i0, i1], "time": np.array([o0, o1, o2], dtype="datetime64")},
)

x = 10
result = compute_nearest_time_values(ephemeris, observations, x)
assert float(result.sel(id=i0, time=o0).value) == 1 * x + 5
assert float(result.sel(id=i0, time=o1).value) == 3 * x + 3  # 06:00 does not exist, 12:00 is nearest
assert float(result.sel(id=i0, time=o2).value) == 3 * x + 3
assert float(result.sel(id=i1, time=o0).value) == 10 * x + 50
assert float(result.sel(id=i1, time=o1).value) == 20 * x + 40
assert float(result.sel(id=i1, time=o2).value) == 40 * x + 20  # 12:00 does not exist, 18:00 is nearest
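A possible first step for the example above, as a rough sketch rather than a full answer: `sel(..., method="nearest")` accepts an array of labels, so the inner loop over observation times can be dropped while the loop over ids stays. The helper name `compute_per_id` and the variable `obs_time` are made up for this illustration, and the formula is applied directly to the selected arrays instead of going through `f`.

```python
import numpy as np
import xarray as xr

eph_time = np.array(
    ["2025-01-01T00:00", "2025-01-01T06:00", "2025-01-01T12:00", "2025-01-01T18:00"],
    dtype="datetime64[ns]",
)
obs_time = np.array(
    ["2025-01-01T02:30", "2025-01-01T06:15", "2025-01-01T12:15"],
    dtype="datetime64[ns]",
)
ephemeris = xr.Dataset(
    {
        "a": (("id", "time"), [[1, np.nan, 3, 4], [10, 20, np.nan, 40]]),
        "b": (("id", "time"), [[5, np.nan, 3, 2], [50, 40, np.nan, 20]]),
    },
    coords={"id": ["id0", "id1"], "time": eph_time},
)


def compute_per_id(ephemeris, obs_time, x):
    """Hypothetical helper: one nearest-time selection per id, no time loop."""
    parts = []
    for identifier in ephemeris.id.values:
        # Same pre-filtering as in the loop version
        eph_id = ephemeris.sel(id=identifier).dropna(dim="time", how="all")
        # Select all observation times at once
        nearest = eph_id.sel(time=obs_time, method="nearest")
        # The result is labelled with ephemeris times; relabel with observation times
        nearest = nearest.assign_coords(time=obs_time)
        parts.append(nearest.a * x + nearest.b)
    # Each part carries a scalar "id" coordinate, which becomes the new dimension
    return xr.concat(parts, dim="id")


value = compute_per_id(ephemeris, obs_time, 10)
print(value.transpose("id", "time").values)
```

This still loops over ids in Python, so it mainly helps when there are many time steps and comparatively few ids.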
-
One has to do the following; it is reasonably faster, but not perfect:
- UniqueGrouper and BinGrouper
- squeeze the id coordinate
- select using pad or nearest
- max or min to get a dense matrix.
-
I think the main issue is that xarray does not support multi-select with different methods. One would need
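One way to emulate an exact match on `id` combined with a nearest match on `time`, sketched here with a dense time-gap matrix and `argmin` rather than the groupers named in this thread. The function name `nearest_valid_per_id` and the dimension name `eph_time` are made up for this illustration, and the validity mask assumes the two variables `a` and `b` from the synthetic example.

```python
import numpy as np
import xarray as xr

eph_time = np.array(
    ["2025-01-01T00:00", "2025-01-01T06:00", "2025-01-01T12:00", "2025-01-01T18:00"],
    dtype="datetime64[ns]",
)
obs_time = np.array(
    ["2025-01-01T02:30", "2025-01-01T06:15", "2025-01-01T12:15"],
    dtype="datetime64[ns]",
)
ephemeris = xr.Dataset(
    {
        "a": (("id", "time"), [[1, np.nan, 3, 4], [10, 20, np.nan, 40]]),
        "b": (("id", "time"), [[5, np.nan, 3, 2], [50, 40, np.nan, 20]]),
    },
    coords={"id": ["id0", "id1"], "time": eph_time},
)


def nearest_valid_per_id(ephemeris, obs_time):
    """Hypothetical helper: nearest non-missing ephemeris entry per (id, time)."""
    # Rename so ephemeris times and observation times are separate dimensions
    eph = ephemeris.rename({"time": "eph_time"})
    # Entries with at least one value present (mirrors dropna(how="all"))
    valid = eph.a.notnull() | eph.b.notnull()
    obs = xr.DataArray(obs_time, dims="time", coords={"time": obs_time})
    # Absolute gap in seconds between every observation and ephemeris time
    gap = abs(obs - eph["eph_time"]) / np.timedelta64(1, "s")
    # Missing entries must never win the argmin
    gap = gap.where(valid, np.inf)
    # Position of the nearest valid ephemeris entry for each (time, id) pair
    idx = gap.argmin(dim="eph_time")
    # Pointwise selection: idx shares the "id" dimension with the dataset
    return eph.isel(eph_time=idx)


x = 10
nearest = nearest_valid_per_id(ephemeris, obs_time)
value = nearest.a * x + nearest.b
print(value.transpose("id", "time").values)
```

The intermediate `gap` array has one entry per (observation time, ephemeris time, id) combination, so for long time series it may need chunking or a per-id loop instead.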