Skip to content

Conversation

@darjisagar7
Copy link

Description

Adding MergeStats support for Composite Engine

curl --location --request GET 'localhost:9200/index-7/_stats/merge' | jq '.'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1302  100  1302    0     0     88      0  0:00:14  0:00:14 --:--:--   289
{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "merges": {
        "current": 0,
        "current_docs": 0,
        "current_size_in_bytes": 0,
        "total": 1,
        "total_time_in_millis": 21,
        "total_docs": 66,
        "total_size_in_bytes": 11,
        "total_stopped_time_in_millis": 0,
        "total_throttled_time_in_millis": 0,
        "total_auto_throttle_in_bytes": 0,
        "unreferenced_file_cleanups_performed": 0
      }
    },
    "total": {
      "merges": {
        "current": 0,
        "current_docs": 0,
        "current_size_in_bytes": 0,
        "total": 1,
        "total_time_in_millis": 21,
        "total_docs": 66,
        "total_size_in_bytes": 11,
        "total_stopped_time_in_millis": 0,
        "total_throttled_time_in_millis": 0,
        "total_auto_throttle_in_bytes": 0,
        "unreferenced_file_cleanups_performed": 0
      }
    }
  },
  "indices": {
    "index-7": {
      "uuid": "AcXSnbxCSdWWNLK3AwFQ_A",
      "primaries": {
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          "total": 1,
          "total_time_in_millis": 21,
          "total_docs": 66,
          "total_size_in_bytes": 11,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 0,
          "total_auto_throttle_in_bytes": 0,
          "unreferenced_file_cleanups_performed": 0
        }
      },
      "total": {
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          "total": 1,
          "total_time_in_millis": 21,
          "total_docs": 66,
          "total_size_in_bytes": 11,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 0,
          "total_auto_throttle_in_bytes": 0,
          "unreferenced_file_cleanups_performed": 0
        }
      }
    }
  }
}

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 24, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

❌ Gradle check result for a73ce78: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


public synchronized double getIORateLimitMBPerSec() {
// TODO: return the value from the Rust if the doAutoIOThrottle is enabled else INFINITY
return Double.POSITIVE_INFINITY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return -1 if it is not supported?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Lucene this is the number populated, so using the same.

Comment on lines 27 to 39
public long getTotalSizeInBytes() {
long totalSize = 0;
for (CatalogSnapshot.Segment segment : segmentsToMerge) {
for (WriterFileSet writerFileSet : segment.getDFGroupedSearchableFiles().values()) {
totalSize += writerFileSet.getTotalSize();
}
}
return totalSize;
}

public long getTotalNumDocs() {
long totalDocs = 0;
for (CatalogSnapshot.Segment segment : segmentsToMerge) {
for (WriterFileSet writerFileSet : segment.getDFGroupedSearchableFiles().values()) {
totalDocs += writerFileSet.getNumRows();
}
}
return totalDocs;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calculating it everytime, we can calculate this during oneMerge creation and store.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +81 to +83
public long getNumRows() {
return numRows;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to use this num of rows in CompositeMergePolicy as well? Can you update that logic as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 760 to 773
public DocsStats docStats() {
return null;
}

@Override
public SegmentsStats segmentsStats(boolean includeSegmentFileSizes, boolean includeUnloadedSegments) {
return null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check index stats PR as well to see both stats are following same approach?

@github-actions
Copy link
Contributor

❌ Gradle check result for 1c26135: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

public void close() {
try {
RustBridge.closeWriter(filePath);
metadata = RustBridge.closeWriter(filePath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this metadata being used? getMetadata should not be called after the writer is closed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in the flush function in VSRManager class

try (ArrowExport export = currentVSR.exportToArrow()) {
writer.write(export.getArrayAddress(), export.getSchemaAddress());
writer.close();
writerClosed = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The writer is not closed at this point right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is closed.

.writerGeneration(writerGeneration)
.addFile(file.getFileName().toString())
.addFile(filePath.getFileName().toString())
.addNumRows(parquetFileMetadata.numRows())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add the entire metadata? We can reuse it later to store checksums as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParquetFileMetadata is created in plugin and this code is in core.

Comment on lines +336 to +338
// No writer was found, but this is not necessarily an error
// Return null to indicate no metadata available
JObject::null().into_raw()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally if close should never be called for a closed writer. If some flow is doing that today, its a problem. Also, not that metadata is being returned, we should throw an error in this flow. Can take it as a follow up too. I don't want any incorrect conversion happening due to empty/null metadata.


public MergeStats mergeStats() {
final StatsHolder engine = getStatsHolderOrNull();
final StatsHolder engine = (StatsHolder) getCompositeEngineOrNull();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be modified. take the latest change and use getStatsHolderOrNull as it was before.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure done

* closed.
*/
protected CompositeEngine getCompositeEngineOrNull() {
return this.currentCompositeEngineReference.get();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed


@Override
public MergeStats getMergeStats() {
return mergeScheduler.stats();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need details for on going merges in the segment stats api.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2026

❌ Gradle check result for f18a032: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2026

❌ Gradle check result for e807572: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

❌ Gradle check result for f713e09: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

if (!writerClosed.get()) {
try {
metadata = RustBridge.closeWriter(filePath);
writerClosed.set(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use compareAndSet here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this should be done before the native call, and only if compareAndSet returns true.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

❌ Gradle check result for 51343dc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bharath-techie bharath-techie merged commit befa361 into opensearch-project:feature/datafusion Jan 7, 2026
7 of 31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants