macvincent

Summary:
Another feature of the new chunking policy described in this [doc](https://fburl.com/gdoc/gkdwwju1) is the ability to split streams that exceed a specified size limit into smaller chunks. This diff implements a `popChunk` method in each `StreamData` class to provide that functionality, so we are no longer forced to encode an extremely large stream as a single chunk.
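
To make the idea concrete, here is a minimal Python sketch of what popping size-limited chunks from a buffered stream could look like. It is not the Nimble implementation: `StreamDataSketch`, `pop_chunk`, `value_size`, and `max_chunk_bytes` are hypothetical names, and the sketch assumes fixed-width values held in a plain list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamDataSketch:
    """Hypothetical stand-in for a StreamData class holding fixed-width values."""

    value_size: int  # bytes per buffered value (assumption: fixed width)
    values: List[int] = field(default_factory=list)

    def pop_chunk(self, max_chunk_bytes: int) -> Optional[List[int]]:
        """Remove and return the next chunk of at most max_chunk_bytes.

        Returns None when nothing is buffered. At least one value is always
        popped, so a limit smaller than one value cannot stall the caller.
        """
        if not self.values:
            return None
        count = max(1, max_chunk_bytes // self.value_size)
        chunk, self.values = self.values[:count], self.values[count:]
        return chunk


# Ten 4-byte values (40 bytes) drained with a 16-byte chunk limit.
stream = StreamDataSketch(value_size=4, values=list(range(10)))
chunks = []
while (chunk := stream.pop_chunk(max_chunk_bytes=16)) is not None:
    chunks.append(chunk)
print([len(c) for c in chunks])  # [4, 4, 2]
```

The caller drains the stream in a loop, so one oversized stream becomes several bounded chunks instead of a single large one.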

Integration will happen in the next diff.

Differential Revision: D81824143

@meta-cla meta-cla bot added the CLA Signed label Sep 11, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D81824143

macvincent added a commit to macvincent/nimble that referenced this pull request Sep 11, 2025
@facebook-github-bot

@macvincent has exported this pull request. If you are a Meta employee, you can view the originating diff in D81824143.

macvincent added a commit to macvincent/nimble that referenced this pull request Oct 9, 2025
Summary:
Pull Request resolved: facebookincubator#248

@macvincent macvincent force-pushed the export-D81824143 branch 2 times, most recently from d6d99f8 to 9a7c755 Compare October 14, 2025 07:13
Summary:

In preparation for our [Nimble chunked encoding](https://fburl.com/gdoc/zjck7lo6) work, we cleaned up the previous contract by removing unused methods and attributes. This should be a no-op, since none of them were used. We also clarified the names of some attributes.

Reviewed By: sdruzkin, helfman

Differential Revision: D81514657
…incubator#240)

Summary:
X-link: facebookexternal/presto-facebook#3412

X-link: facebookincubator/velox#14846


This is an implementation of the new chunking policy described in this [doc](https://fburl.com/gdoc/gkdwwju1). It has two phases:

**Phase 1 - Memory Pressure Management (shouldChunk)**
The policy monitors the total in-memory data size:
* When memory usage exceeds the maximum threshold, it initiates chunking to reduce the memory footprint while data ingestion continues.
* While memory usage remains above the minimum threshold, it continues chunking to reduce memory usage further.

**Phase 2 - Storage Size Optimization (shouldFlush)**
The policy implements compression-aware stripe size prediction:
* When chunking fails to reduce memory usage effectively and memory stays above the maximum threshold, it forces a full stripe flush to guarantee memory relief.
* It calculates the anticipated final compressed stripe size by applying the estimated compression ratio to the unencoded data.
* It triggers a stripe flush when the predicted compressed size reaches the target stripe size threshold.

`shouldChunk` is also now a separate method required by all flush policies. We updated all previous tests and code references.
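
The two phases above can be sketched roughly as follows. This is an illustrative Python model rather than the actual Nimble flush policy: the class name, field names, and method parameters are hypothetical, the policy is assumed to remember whether it is currently chunking, and the predicted stripe size is assumed to be the already-encoded bytes plus the unencoded bytes scaled by the compression ratio.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkFlushPolicySketch:
    """Hypothetical two-phase policy; all names here are illustrative."""

    max_memory_bytes: int
    min_memory_bytes: int
    target_stripe_bytes: int
    compression_ratio: float  # assumed: compressed size / raw size
    chunking: bool = field(default=False, init=False)

    def should_chunk(self, memory_bytes: int) -> bool:
        # Phase 1: start chunking above the maximum threshold and keep
        # chunking until memory falls to the minimum threshold.
        if memory_bytes > self.max_memory_bytes:
            self.chunking = True
        elif memory_bytes <= self.min_memory_bytes:
            self.chunking = False
        return self.chunking

    def should_flush(
        self,
        memory_bytes: int,
        chunking_helped: bool,
        encoded_bytes: int,
        unencoded_raw_bytes: int,
    ) -> bool:
        # Phase 2: force a flush when chunking did not relieve memory.
        if not chunking_helped and memory_bytes > self.max_memory_bytes:
            return True
        # Otherwise predict the compressed stripe size (assumed formula).
        predicted = encoded_bytes + unencoded_raw_bytes * self.compression_ratio
        return predicted >= self.target_stripe_bytes


policy = ChunkFlushPolicySketch(
    max_memory_bytes=100,
    min_memory_bytes=40,
    target_stripe_bytes=200,
    compression_ratio=0.5,
)
decisions = [policy.should_chunk(m) for m in (120, 60, 30, 60)]
print(decisions)  # [True, True, False, False]
```

Using separate start and stop thresholds means that a policy which has begun chunking keeps going until memory has dropped well below the trigger point, rather than toggling on every call.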

Reviewed By: helfman

Differential Revision: D81516697
…ator#242)

Summary:

This should be a no-op, since no chunking flush policy is currently used in production, but we make three changes in this diff:
1. `writeChunk` now returns a boolean to indicate whether any stream was successfully chunked
2. The raw size of the previously encoded stripe data is now stored in the writer context
3. We update and pass down the memory stats needed by the new flush policy contract

TODO: We will be introducing two more VeloxWriter changes in the next diffs in this stack to:
1. Support per-stream chunking instead of always chunking all eligible streams
2. Support breaking down large streams into multiple smaller chunks

Differential Revision: D81545433
…ator#243)

Summary:

This implements a detail of the new chunking policy described in this [doc](https://fburl.com/gdoc/gkdwwju1). Rather than chunking all eligible streams, we chunk individual streams in order of their raw size until memory pressure is relieved. In our unit tests, the maximum number of chunks produced is identical to that of the previous implementation, but results may differ for large files. More experimentation and tuning are needed to determine a threshold value that takes advantage of this.
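
As a rough illustration of per-stream chunking, the Python sketch below picks streams one at a time until memory drops to a threshold. It is not the VeloxWriter code: the function name and parameters are hypothetical, largest-first ordering is an assumption (the description only says streams are ordered by raw size), and chunking a stream is assumed to release exactly its raw size.

```python
from typing import Dict, List


def chunk_by_raw_size(
    raw_sizes: Dict[str, int],
    memory_bytes: int,
    min_memory_bytes: int,
) -> List[str]:
    """Return the streams chosen for chunking, in the order they were chosen.

    Streams are visited largest raw size first (assumption) and selection
    stops as soon as memory is at or below min_memory_bytes.
    """
    chunked: List[str] = []
    ordered = sorted(raw_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if memory_bytes <= min_memory_bytes:
            break  # memory pressure relieved; leave the rest untouched
        memory_bytes -= size  # assumed: chunking frees the stream's raw bytes
        chunked.append(name)
    return chunked


picked = chunk_by_raw_size({"a": 10, "b": 50, "c": 30}, memory_bytes=90, min_memory_bytes=30)
print(picked)  # ['b', 'c']
```

In this example stream `a` is never chunked, which is the difference from chunking every eligible stream: small streams can keep accumulating data once the largest ones have relieved the pressure.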

Reviewed By: helfman

Differential Revision: D81715655
Summary:

Another feature in the new chunking policy described in this [doc](https://fburl.com/gdoc/gkdwwju1) is the ability to split large streams above a specified limit into smaller chunks. In this diff, we implement a `popChunk` method in each `StreamData` class to handle this functionality. With this feature we are not forced to encode extremely large streams into a single chunk.

Integration will happen in the next diff.

Differential Revision: D81824143
Labels

CLA Signed, fb-exported, meta-exported
