Remote store recovery support for Optimized and Lucene Indices #20365
base: feature/datafusion
Signed-off-by: Kamal Nayan <[email protected]>
Signed-off-by: Raghuvansh Raj <[email protected]>
… needs more work on this
…to feature/rep_rec
❌ Gradle check result for d288377: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for ba64eea: FAILURE
❌ Gradle check result for 4ba76b2: FAILURE
❌ Gradle check result for 0d2ced5: FAILURE
❌ Gradle check result for 77ed2a2: FAILURE
…rom directory and doc counts as well
❌ Gradle check result for a9f5058: FAILURE
…check Signed-off-by: Raghuvansh Raj <[email protected]>
❌ Gradle check result for 9de9311: FAILURE
Signed-off-by: Raghuvansh Raj <[email protected]>
❌ Gradle check result for da95ed4: FAILURE
Description
This PR implements the remote store recovery flow for CompositeEngine/DataFusion, enabling indices using non-Lucene data formats (Parquet, Arrow, etc.) to properly recover from remote store after node failures, restarts, and replica promotions.
Key Features
1. Format-Aware Recovery from Remote Store
- … `FileMetadata` format information (e.g., "parquet", "arrow") when syncing segments from remote store
- `syncSegmentsFromRemoteSegmentStore()` uses format-aware `FileMetadata` keys instead of string-based keys
2. CompositeEngine Empty Store Handling
- … `FileNotFoundException` during recovery when local store is empty (common in remote store recovery scenarios)
- … `LocalCheckpointTracker` even when no prior commits exist
3. Engine Reset for Recovery
- … `resetEngineToGlobalCheckpoint()` to properly initialize CompositeEngine with fresh translog BEFORE creating InternalEngine
- … `CatalogSnapshot.userData` before serialization for recovery consistency
4. Checkpoint Tracking
- … `LastRefreshedCheckpointListener` to CompositeEngine for tracking refresh checkpoints (required by RemoteStoreRefreshListener)
- … `lastRefreshedCheckpoint()`, `currentOngoingRefreshCheckpoint()`
5. Lucene Index Recovery Support
- Updated the SyncSegmentsFromRemoteStore API and the recovery flow to handle optimized and non-optimized indices properly during recovery.
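To illustrate the idea behind format-aware keys, here is a minimal sketch, not the code in this PR. `FileKey` and `filesToDownload` are hypothetical names invented for the example; the real `FileMetadata` class and `syncSegmentsFromRemoteSegmentStore()` signature may differ. The assumption is that keying by (format, file name) rather than by file name alone keeps same-named files in different formats distinct when deciding what to fetch from the remote store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative only: FileKey and filesToDownload are hypothetical,
// not the FileMetadata class or sync API changed by this PR.
final class FormatAwareKeySketch {

    /** Hypothetical key pairing a data format (e.g. "parquet") with a file name. */
    static final class FileKey {
        final String dataFormat;
        final String fileName;

        FileKey(String dataFormat, String fileName) {
            this.dataFormat = Objects.requireNonNull(dataFormat);
            this.fileName = Objects.requireNonNull(fileName);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FileKey)) return false;
            FileKey other = (FileKey) o;
            return dataFormat.equals(other.dataFormat) && fileName.equals(other.fileName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(dataFormat, fileName);
        }

        @Override
        public String toString() {
            return dataFormat + ":" + fileName;
        }
    }

    /**
     * Returns the remote files that are absent locally or whose length differs.
     * With an empty local map (the empty-store case) every remote file is returned.
     */
    static List<FileKey> filesToDownload(Map<FileKey, Long> remote, Map<FileKey, Long> local) {
        List<FileKey> missing = new ArrayList<>();
        for (Map.Entry<FileKey, Long> entry : remote.entrySet()) {
            Long localLength = local.get(entry.getKey());
            if (localLength == null || !localLength.equals(entry.getValue())) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<FileKey, Long> remote = new LinkedHashMap<>();
        remote.put(new FileKey("lucene", "data_0"), 120L);
        remote.put(new FileKey("parquet", "data_0"), 4096L); // same name, different format

        Map<FileKey, Long> local = new LinkedHashMap<>();
        local.put(new FileKey("lucene", "data_0"), 120L);

        // Only the parquet file is missing locally.
        System.out.println(filesToDownload(remote, local)); // prints [parquet:data_0]
    }
}
```

With a plain `Map<String, Long>` keyed by file name, the two `data_0` entries above would collapse into one, which is the kind of ambiguity a format-aware key avoids in this sketch.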
Test Coverage
Added comprehensive integration test suite `DataFusionRemoteStoreRecoveryTests` covering: …
… `AlreadySetException` error while creating `SearchContext`.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.