Iterative Shard Writing ala TensorStore Transactions / TensorStore chunk_layout.write_chunk #3604

@srivarra

Describe the new feature you'd like

Hi, I saw a comment on the ossci Zulip by @mkitti, where he raised the idea of creating a simplified version of TensorStore's transactions specifically for shards:

with shard.write_context() as shard_write:
    for chunk in shard_write.inner_chunks():
        shard_write[chunk] = calculation(chunk)
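
To make the idea a bit more concrete, here is a rough sketch of how such a context manager could behave: stage each inner chunk in memory, then hand the whole shard to the store in a single call when the block exits cleanly. Everything in it is hypothetical and not part of zarr-python or TensorStore: the `ShardWriteBuffer` class, the `write_context` function, the `chunks_per_shard` parameter, and the `flush` callback that stands in for the actual shard write.

```python
from contextlib import contextmanager
from itertools import product


class ShardWriteBuffer:
    """Hypothetical in-memory staging area for one shard's inner chunks."""

    def __init__(self, chunks_per_shard):
        self.chunks_per_shard = tuple(chunks_per_shard)
        self.pending = {}  # inner-chunk index -> data staged for writing

    def inner_chunks(self):
        # Yield every inner-chunk index in row-major order: (0, 0), (0, 1), ...
        return product(*(range(n) for n in self.chunks_per_shard))

    def __setitem__(self, chunk_index, data):
        # Nothing touches storage here; the data is only staged.
        self.pending[tuple(chunk_index)] = data


@contextmanager
def write_context(flush, chunks_per_shard):
    """Stage inner chunks, then pass them to `flush` once on clean exit."""
    buffer = ShardWriteBuffer(chunks_per_shard)
    yield buffer
    # Reached only if the with-body did not raise, so a failed calculation
    # never results in a partially written shard.
    flush(buffer.pending)


# Usage: `written.append` stands in for "encode and write the shard".
written = []
with write_context(written.append, chunks_per_shard=(2, 2)) as shard_write:
    for chunk in shard_write.inner_chunks():
        shard_write[chunk] = sum(chunk)  # placeholder for calculation(chunk)
```

The point of the single `flush` call is that peak memory is bounded by one shard, and the shard is written once rather than being rewritten for every inner chunk.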

I've found TensorStore's transactions, combined with iterating over the chunk_layout, useful for memory-bounded workloads such as downsampling datasets.

The plot below shows downsampling a fish dataset of shape TCZYX (24, 2, 42, ~3.8K, ~13.1K) while varying the "batch size" used with ts.Transaction. Here, the batch is the number of chunks written at once in a single transaction.

Here's some sample code I've used for this in TensorStore:

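import tensorstore as ts

# source_ts and target_ts are assumed to be already-open TensorStore arrays.
# ts.downsample returns a lazy view; data is only read when it is written out.
downsampled = ts.downsample(
    source_ts, downsample_factors=downsample_factors, method=method
)

# Advance along the first axis by one write chunk at a time.
step = target_ts.chunk_layout.write_chunk.shape[0]

for start in range(0, downsampled.shape[0], step):
    # One transaction per batch; it commits when the with-block exits.
    with ts.Transaction() as txn:
        target_with_txn = target_ts.with_transaction(txn)
        downsampled_with_txn = downsampled.with_transaction(txn)
        # Clip the last batch to the end of the array.
        stop = min(start + step, downsampled.shape[0])
        target_with_txn[start:stop].write(downsampled_with_txn[start:stop]).result()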

As an aside, I find this notation, with all of the result() calls, indexing, and ranging, to be repulsive, but oh well. If there's a nice way to handle that under the hood within the context manager, that'd be very cool.

[Image: plot of downsampling with varying ts.Transaction batch sizes]

I see there is this "regular grid" for chunk grids, but I have no idea how to use it, whether it's iterable like ts.chunk_layout.write_chunk, or whether it's even related to this concept.
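
For reference, the kind of iteration I have in mind over a regular grid can be written in plain Python without touching any zarr-python internals. This is only a sketch: `iter_grid_regions` is a made-up helper name, and it assumes the array shape and the shard (or chunk) shape are available as plain tuples.

```python
from itertools import product


def iter_grid_regions(shape, region_shape):
    """Yield one tuple of slices per cell of a regular grid over `shape`.

    Cells on the trailing edge are clipped so they never run past the array.
    """
    # Number of grid cells along each axis, rounding up for partial cells.
    counts = [-(-size // step) for size, step in zip(shape, region_shape)]
    for index in product(*(range(n) for n in counts)):
        yield tuple(
            slice(i * step, min((i + 1) * step, size))
            for i, step, size in zip(index, region_shape, shape)
        )


# Usage: walk a (5, 4) array in (2, 4) regions; the last region is clipped.
for region in iter_grid_regions((5, 4), (2, 4)):
    print(region)
```

Passing the shard shape as `region_shape` gives shard-aligned regions, so each write of the form `arr[region] = ...` would cover whole shards, except possibly at the array edge.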

Let me know your thoughts. Maybe there's already something in the zarr-python codebase that would let me do this manually. Thanks!

✨Le Fishe✨

