Refine indexing pressure accounting in semantic bulk inference filter #129320
Conversation
Previously, we conservatively estimated that inference results would double the document _source size. This could lead to unnecessary circuit breaker exceptions, even when the node had sufficient memory to handle the operation. This PR replaces the rough estimate with the actual size of the _source after the update. Since inference calls use a batch size of 1 MB, we rely on the real circuit breaker to ensure that results fit in memory before applying indexing pressure accounting. Additionally, this PR introduces a new counter in InferenceStats to track the number of rejections caused by indexing pressure from inference results.
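As a rough sketch of the accounting change (hypothetical names and a made-up accounting call, not the actual ShardBulkInferenceActionFilter code), the filter now reserves only the measured growth of the source rather than assuming the source doubles:

// Hypothetical sketch, not the PR's actual code.
// Before: reserve the original source size a second time (assume the source doubles).
// After: reserve only the measured growth once inference results have been applied.
long originalSize = indexRequest.source().length();
long updatedSize = newSource.length();
long extraBytes = Math.max(0, updatedSize - originalSize);
coordinatingPressure.increment(1, extraBytes); // hypothetical accounting call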
Pinging @elastic/search-eng (Team:SearchOrg)
Pinging @elastic/search-relevance (Team:Search - Relevance)
Hi @jimczi, I've created a changelog YAML for you.
…' into indexing_pressure_bulk_inference
Before merging this PR, I would like to examine the assumption that doubling document source size for indexing pressure purposes is incorrect. This was done during development of #125517 because the original source is pooled and the memory for it is released only after bulk request handling is complete. We accounted for this by adding indexing memory pressure for the additional copy of source generated to insert embeddings into. What new information do we have now that allows us to change this approach?
Looks like there was an update to this test suite in #129140 that disabled these tests on the new semantic text format. We should probably fix that...
Oops, good catch, thanks. I added the test back in c85718d
We should apply just this change to 9.1 and 8.19 so that we restore test coverage in those branches
The approach looks good overall, very clever! How often do you figure the source will be array-backed? In other words, is this an optimization you expect we can use most of the time in production?
newSource.array(),
newSource.arrayOffset(),
Can we assume that newSource will always be array-backed?
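For illustration only, a defensive guard for the non-array-backed case might look like the sketch below; hasArray() is the accessor the surrounding diff already relies on, while BytesReference.toBytes as the fallback copy is an assumption, not code from this PR.

// Hedged sketch, not part of the PR: fall back to a fresh array copy when newSource is not array-backed.
BytesReference arrayBacked = newSource.hasArray()
    ? newSource
    : new BytesArray(BytesReference.toBytes(newSource)); // assumed to materialize a copy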
private void appendSourceAndInferenceMetadata(
    XContentBuilder builder,
    BytesReference source,
    XContentType xContentType,
    Map<String, Object> inferenceFieldsMap
) throws IOException {
    builder.startObject();

    // append the original source
    try (
        XContentParser parser = XContentHelper.createParserNotCompressed(XContentParserConfiguration.EMPTY, source, xContentType)
    ) {
        // skip start object
        parser.nextToken();
        while (parser.nextToken() != XContentParser.Token.END_OBJECT) {
            builder.copyCurrentStructure(parser);
        }
    }

    // add the inference metadata field
    builder.field(InferenceMetadataFieldsMapper.NAME);
    try (XContentParser parser = XContentHelper.mapToXContentParser(XContentParserConfiguration.EMPTY, inferenceFieldsMap)) {
        builder.copyCurrentStructure(parser);
    }

    builder.endObject();
}
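For context, a hypothetical call site for this helper might look like the following sketch; the builder setup mirrors the surrounding code, and BytesReference.bytes(builder) is assumed as the way to materialize the result.

// Hedged sketch of a call site, not verbatim from the PR: replay the original source, append the
// _inference_fields metadata, then measure and apply the updated source.
try (XContentBuilder builder = XContentBuilder.builder(indexRequest.getContentType().xContent())) {
    appendSourceAndInferenceMetadata(builder, indexRequest.source(), indexRequest.getContentType(), inferenceFieldsMap);
    BytesReference newSource = BytesReference.bytes(builder); // assumed materialization step
    // ... apply newSource to the index request, reusing the original array when possible (see below)
}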
Would the operation performed in this method temporarily generate a second copy of the source with _inference_fields added? I think it would, but asking to confirm my understanding.
// Apply the updated source to the index request.
if (originalSource.hasArray()) {
    // If the original source is backed by an array, perform in-place update:
    // - Copy as much of the new source as fits into the original array.
    System.arraycopy(
        newSource.array(),
        newSource.arrayOffset(),
        originalSource.array(),
        originalSource.arrayOffset(),
        originalSource.length()
    );

    int remainingSize = newSource.length() - originalSource.length();
    if (remainingSize > 0) {
        // If there are additional bytes, append them as a new BytesArray segment.
        byte[] remainingBytes = new byte[remainingSize];
        System.arraycopy(
            newSource.array(),
            newSource.arrayOffset() + originalSource.length(),
            remainingBytes,
            0,
            remainingSize
        );
        indexRequest.source(
            CompositeBytesReference.of(originalSource, new BytesArray(remainingBytes)),
            indexRequest.getContentType()
        );
    } else {
        // No additional bytes; just adjust the slice length.
        indexRequest.source(originalSource.slice(0, newSource.length()));
    }
} else {
    // If the original source is not array-backed, replace it entirely.
    indexRequest.source(newSource, indexRequest.getContentType());
}
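To make the memory saving concrete, here is a small worked example; the sizes are illustrative, not taken from the PR.

// Illustrative numbers only: with a 10_240-byte original source and a 12_288-byte updated source,
// the in-place path overwrites the existing 10_240-byte array and allocates just a 2_048-byte tail,
// rather than a full second 12_288-byte copy of the source.
int originalLength = 10_240;
int updatedLength = 12_288;
int freshAllocation = updatedLength - originalLength; // 2_048 bytes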
We don't have to do this in this PR, but it would be good to put this logic in a common place (IndexRequest? BytesReference?) so that we can leverage it in other places as well. I was thinking of adding methods like canUpdateInPlace and updateInPlace.
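A hedged sketch of what those helpers might look like, using the reviewer's suggested names (canUpdateInPlace and updateInPlace are proposals; nothing like this exists in the PR). It mirrors the logic above and additionally assumes the updated source is itself array-backed.

// Hedged sketch of the proposed helpers; names and placement are hypothetical.
static boolean canUpdateInPlace(BytesReference original, BytesReference updated) {
    return original.hasArray() && updated.hasArray();
}

static BytesReference updateInPlace(BytesReference original, BytesReference updated) {
    assert canUpdateInPlace(original, updated);
    // Overwrite the part of the updated content that fits into the reused original array.
    int reused = Math.min(original.length(), updated.length());
    System.arraycopy(updated.array(), updated.arrayOffset(), original.array(), original.arrayOffset(), reused);
    int remaining = updated.length() - original.length();
    if (remaining <= 0) {
        // The updated source fits entirely in the original array; just shorten the view.
        return original.slice(0, updated.length());
    }
    // Allocate only the tail that does not fit and stitch it onto the reused prefix.
    byte[] tail = new byte[remaining];
    System.arraycopy(updated.array(), updated.arrayOffset() + original.length(), tail, 0, remaining);
    return CompositeBytesReference.of(original, new BytesArray(tail));
}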
In #125517, we conservatively estimated that inference results would double the document _source size, since the original source is pooled by the bulk action and only released after the bulk request completes.
This PR reduces the memory needed to perform the update by reusing the original source array when possible. This way, we only need to account for the extra inference fields, which reduces the overall indexing pressure.
Additionally, this PR introduces a new counter in InferenceStats to track the number of rejections caused by indexing pressure from inference results.
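A minimal sketch of the rejection counter idea, assuming hypothetical field and method names; the actual InferenceStats change in this PR may track and expose the metric differently.

// Minimal sketch with hypothetical names, not the PR's actual InferenceStats code.
import java.util.concurrent.atomic.LongAdder;

class InferenceStatsSketch {
    private final LongAdder indexingPressureRejections = new LongAdder();

    void onIndexingPressureRejection() {
        indexingPressureRejections.increment();
    }

    long indexingPressureRejectionCount() {
        return indexingPressureRejections.sum();
    }
}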