- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 365
optimize shard writing #3561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
optimize shard writing #3561
Conversation
Writing to sharded arrays was up to 10x slower for largish chunk sizes because the _ShardBuilder object has many calls to np.concatenate. This commit coalesces these into a single concatenate call, and improves write performance by a factor of 10 on the benchmarking script in zarr-developers#3560. Added a new core.Buffer.combine API Resolves zarr-developers#3560 Signed-off-by: Noah D. Brenowitz <[email protected]>
| Benchmarking results After  | 
Signed-off-by: Noah D. Brenowitz <[email protected]>
Signed-off-by: Noah D. Brenowitz <[email protected]>
| thanks so much to tackling this problem! | 
| hmmm. lot's of failing tests. I'm having some trouble grokking the implementation of the sharding codec. | 
| I'll have a look. I don't know this section of the codebase very well, but maybe I can figure something out. And BTW if you see any opportunities to refactor / simplify the logic here, feel free to move in that direction. I think making this code easier to understand would be a big win. | 
| diff --git a/src/zarr/codecs/sharding.py b/src/zarr/codecs/sharding.py
index 0e36f50f..31d68027 100644
--- a/src/zarr/codecs/sharding.py
+++ b/src/zarr/codecs/sharding.py
@@ -227,6 +227,7 @@ class _ShardBuilder(_ShardReader, ShardMutableMapping):
     buffers: list[Buffer]
     index: _ShardIndex
     buf: Buffer
+    _chunk_buffers: dict[tuple[int, ...], Buffer]  # Map chunk_coords to buffers
 
     @classmethod
     def merge_with_morton_order(
@@ -255,13 +256,24 @@ class _ShardBuilder(_ShardReader, ShardMutableMapping):
         obj = cls()
         obj.buf = buffer_prototype.buffer.create_zero_length()
         obj.buffers = []
+        obj._chunk_buffers = {}
         obj.index = _ShardIndex.create_empty(chunks_per_shard)
         return obj
 
+    def __getitem__(self, chunk_coords: tuple[int, ...]) -> Buffer:
+        # Override to use _chunk_buffers instead of self.buf
+        if chunk_coords in self._chunk_buffers:
+            return self._chunk_buffers[chunk_coords]
+        raise KeyError
+
     def __setitem__(self, chunk_coords: tuple[int, ...], value: Buffer) -> None:
         chunk_start = sum(len(buf) for buf in self.buffers)
         chunk_length = len(value)
-        self.buffers.append(value)
+        # Store the buffer for later retrieval
+        self._chunk_buffers[chunk_coords] = value
+        # Only append non-empty buffers to avoid messing up offset calculations
+        if chunk_length > 0:
+            self.buffers.append(value)
         self.index.set_chunk_slice(chunk_coords, slice(chunk_start, chunk_start + chunk_length))
 
     def __delitem__(self, chunk_coords: tuple[int, ...]) -> None:
@@ -280,7 +292,8 @@ class _ShardBuilder(_ShardReader, ShardMutableMapping):
             self.buffers.insert(0, index_bytes)
         else:
             self.buffers.append(index_bytes)
-        self.buf = self.buf.combine(self.buffers)makes this work for me locally | 
| @nbren12 please take a look at https://github.com/d-v-b/zarr-python/blob/chore/sharding-refactor/src/zarr/codecs/sharding.py, it's a refactor of the sharding logic that might be easier to reason about. | 
remove inheritance, hide the index attribute and remove some indirection Signed-off-by: Noah D. Brenowitz <[email protected]>
just use dicts Signed-off-by: Noah D. Brenowitz <[email protected]>
| Thanks @d-v-b. I ended up pursuing an alternative refactor. I removed the various shard builder objects and things seem to be working now. | 
Signed-off-by: Noah D. Brenowitz <[email protected]>
| @d-v-b Hopefully my latest commit fixes the tests. For a future PR, the partial write case could definitely use further optimization. When the index is stored at the end it should be possible to write just the new chunks...rather than the whole shard. | 
| Codecov Report❌ Patch coverage is  
 Additional details and impacted files@@            Coverage Diff             @@
##             main    #3561      +/-   ##
==========================================
+ Coverage   61.77%   61.87%   +0.10%     
==========================================
  Files          85       85              
  Lines       10165    10134      -31     
==========================================
- Hits         6279     6270       -9     
+ Misses       3886     3864      -22     
 🚀 New features to boost your workflow:
 | 
Writing to sharded arrays was up to 10x slower for largish chunk sizes because the _ShardBuilder object has many calls to np.concatenate. This commit coalesces these into a single concatenate call, and improves write performance by a factor of 10 on the benchmarking script in #3560.
Added a new core.Buffer.combine API
Resolves #3560
[Description of PR]
TODO:
docs/user-guide/*.mdchanges/