Description
Describe the new feature you'd like
Hi, I saw a comment on the ossci Zulip by @mkitti in which he raised the idea of creating a simplified version of TensorStore's transactions specifically for shards:
```python
# Proposed API (idea from the Zulip thread; not an existing zarr-python API)
with shard.write_context() as shard_write:
    for chunk in shard_write.inner_chunks():
        shard_write[chunk] = calculation(chunk)
```

I've found TensorStore's transactions, combined with iterating over a TensorStore's chunk_layout, useful for memory-bounded workloads, for example when downsampling datasets.
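To make the proposed `write_context()` semantics concrete, here is a minimal, hypothetical sketch in plain Python. `ShardWriteContext`, the dict-backed store, and the shard key are all illustrative assumptions, not zarr-python APIs. The idea is transaction-like: writes to inner chunks are buffered in memory and committed to the store in a single operation when the `with` block exits cleanly, and discarded if it raises. For simplicity it assumes every write overwrites the whole shard (no merging with existing data).

```python
from itertools import product


class ShardWriteContext:
    """Hypothetical sketch of a transaction-style shard writer.

    Not an existing zarr-python API. Inner-chunk writes are buffered and
    committed to the backing store only if the with-block succeeds.
    """

    def __init__(self, store, shard_key, n_inner_chunks):
        self._store = store              # any dict-like mapping (assumption)
        self._shard_key = shard_key      # illustrative key for this shard
        self._n_inner = n_inner_chunks   # inner chunks per axis, e.g. (2, 2)
        self._pending = {}

    def inner_chunks(self):
        # Grid coordinates of every inner chunk in the shard, in C order.
        return product(*(range(n) for n in self._n_inner))

    def __setitem__(self, chunk, value):
        # Stage the write; nothing touches the store yet.
        self._pending[chunk] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: write the whole shard in one store operation.
            self._store[self._shard_key] = dict(self._pending)
        self._pending.clear()
        return False  # never swallow exceptions


# Usage mirroring the proposal; calculation() is replaced by a toy sum.
store = {}
with ShardWriteContext(store, "shard_0_0", (2, 2)) as shard_write:
    for chunk in shard_write.inner_chunks():
        shard_write[chunk] = sum(chunk)
```

Buffering a whole shard in memory before a single write is what keeps the number of store operations low; the memory bound is then one shard's worth of inner chunks.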
The following plot shows downsampling a fish dataset of shape TCZYX (24, 2, 42, ~3.8K, ~13.1K) while varying the "batch size" used with ts.Transaction.
Here's some sample code I've used for this in TensorStore:
Here, the batch size is the number of chunks written at once in a single transaction.
```python
import tensorstore as ts

# Assumes source_ts and target_ts are open TensorStores and that
# downsample_factors and method are defined elsewhere.
downsampled = ts.downsample(
    source_ts, downsample_factors=downsample_factors, method=method
)
# Write one write-chunk-thick slab along axis 0 per transaction.
step = target_ts.chunk_layout.write_chunk.shape[0]
for start in range(0, downsampled.shape[0], step):
    # The transaction commits when the with-block exits.
    with ts.Transaction() as txn:
        target_with_txn = target_ts.with_transaction(txn)
        downsampled_with_txn = downsampled.with_transaction(txn)
        stop = min(start + step, downsampled.shape[0])
        target_with_txn[start:stop].write(downsampled_with_txn[start:stop]).result()
```

As an aside, I find this notation, with all of the .result() calls and the manual indexing and range arithmetic, repulsive, but oh well. If there's a nice way to handle that under the hood within the context manager, that'd be very cool.
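Part of the index and range bookkeeping can be hidden in a small generator. The sketch below, `batched_slabs`, is a hypothetical helper (not part of TensorStore or zarr-python) that yields one `slice` per batch of `batch` write chunks along a single axis, clipping the last slice to the array length. With `batch=1` it reproduces the `start`/`stop` arithmetic in the loop above; larger values correspond to writing more chunks per transaction.

```python
from typing import Iterator


def batched_slabs(length: int, chunk: int, batch: int = 1) -> Iterator[slice]:
    """Yield slices along one axis, each covering `batch` chunks.

    Hypothetical helper. The final slice is clipped to `length`, replacing
    the manual min(start + step, length) bookkeeping.
    """
    if chunk <= 0 or batch <= 0:
        raise ValueError("chunk and batch must be positive")
    step = chunk * batch
    for start in range(0, length, step):
        yield slice(start, min(start + step, length))


# Possible use with the TensorStore loop above (not executed here):
# for sl in batched_slabs(downsampled.shape[0], step, batch=4):
#     with ts.Transaction() as txn:
#         target_ts.with_transaction(txn)[sl].write(
#             downsampled.with_transaction(txn)[sl]
#         ).result()
```

Passing the `slice` objects straight into indexing keeps the loop body free of explicit `start`/`stop` variables; the `.result()` call remains, since waiting on the write future is still necessary before the transaction commits.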
I see there is a "regular" chunk grid, but I have no idea how to use it, whether it's iterable like ts.chunk_layout.write_chunk, or whether it's related to this concept at all.
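For a regular grid, every chunk (except possibly those at the array's upper edges) has the same shape, so enumerating chunks reduces to index arithmetic. The sketch below, `iter_chunk_slices`, is a hypothetical pure-Python helper, not a zarr-python API. It assumes you already know the array shape and the chunk (or shard) shape, for example from an array's `.shape` and `.chunks` attributes. Passing a shard shape yields shard-aligned regions; passing an inner chunk shape yields chunk-aligned ones.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def iter_chunk_slices(
    shape: Sequence[int], chunk_shape: Sequence[int]
) -> Iterator[Tuple[slice, ...]]:
    """Yield one tuple of slices per chunk of a regular chunk grid.

    Hypothetical helper. Edge chunks are clipped to the array bounds;
    chunks are visited in C (row-major) order of their grid indices.
    """
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same rank")
    # Number of chunks per axis: ceiling division in integer arithmetic.
    grid = [-(-s // c) for s, c in zip(shape, chunk_shape)]
    for idx in product(*(range(n) for n in grid)):
        yield tuple(
            slice(i * c, min((i + 1) * c, s))
            for i, c, s in zip(idx, chunk_shape, shape)
        )
```

Each yielded tuple can be used directly as an index into an array-like object, so a memory-bounded loop only ever materializes one chunk or shard region at a time.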
Let me know your thoughts. Maybe there's already something in the zarr-python codebase that lets me do this manually. Tyty!
