Default enablement of Threshold-Based Availability Strategy with Per-Partition Automatic Failover. #45267

Open: wants to merge 13 commits into base: main

jeet1995
Member

@jeet1995 jeet1995 commented May 9, 2025

Description

As title.

When Per-Partition Automatic Failover (PPAF) is enabled, the Threshold-Based Availability Strategy is now enabled by default, so reads and queries are hedged to the second preferred region onwards.

The default hedging thresholds are 1s before reaching out to the second preferred region, then a further 500ms before reaching out to the third preferred region, and so on.
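
To make the default timing concrete, here is a minimal sketch of the hedging schedule, assuming the 1s threshold applies to the second preferred region and each later region adds one 500ms step. The class `HedgingSchedule` and method `hedgeDelay` are hypothetical names for illustration and are not part of the SDK.

```java
import java.time.Duration;

// Hypothetical illustration of the default hedging schedule described above.
// Not SDK code: it only models "1s threshold, then 500ms per additional region".
public final class HedgingSchedule {
    // Defaults quoted in the PR description.
    static final Duration THRESHOLD = Duration.ofSeconds(1);
    static final Duration THRESHOLD_STEP = Duration.ofMillis(500);

    // regionIndex is zero-based: 0 = first preferred region, 1 = second, and so on.
    static Duration hedgeDelay(int regionIndex) {
        if (regionIndex < 0) {
            throw new IllegalArgumentException("regionIndex must be >= 0");
        }
        if (regionIndex == 0) {
            // The first preferred region receives the request immediately.
            return Duration.ZERO;
        }
        // Assumption: delays are cumulative, i.e. threshold + (n - 1) * step.
        return THRESHOLD.plus(THRESHOLD_STEP.multipliedBy(regionIndex - 1L));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            System.out.println("preferred region #" + (i + 1)
                + " is tried after " + hedgeDelay(i).toMillis() + " ms");
        }
    }
}
```

Under this reading, the third preferred region is contacted 1.5s after the original request if no earlier response has arrived.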

A customer can opt out of this through the following system property:

System.setProperty("COSMOS.IS_READ_AVAILABILITY_STRATEGY_ENABLED_WITH_PPAF", "false");

Alternatively, set the COSMOS_IS_READ_AVAILABILITY_STRATEGY_ENABLED_WITH_PPAF environment variable to false.
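
The two opt-out switches can be modelled with a small resolver. This is only a sketch: the PR description does not state which switch wins when both are set, so the precedence used here (system property first, then environment variable, then a default of enabled) is an assumption, and `AvailabilityStrategyOptOut` is a hypothetical class rather than the SDK's own `Configs` logic.

```java
// Hypothetical model of the opt-out lookup; not the SDK's actual implementation.
public final class AvailabilityStrategyOptOut {
    static final String PROPERTY_NAME = "COSMOS.IS_READ_AVAILABILITY_STRATEGY_ENABLED_WITH_PPAF";
    static final String ENV_NAME = "COSMOS_IS_READ_AVAILABILITY_STRATEGY_ENABLED_WITH_PPAF";

    // Pure function so the precedence can be checked without touching the real environment.
    // Assumed order: system property, then environment variable, then default (enabled).
    static boolean resolve(String systemPropertyValue, String envValue) {
        if (systemPropertyValue != null && !systemPropertyValue.isEmpty()) {
            return Boolean.parseBoolean(systemPropertyValue);
        }
        if (envValue != null && !envValue.isEmpty()) {
            return Boolean.parseBoolean(envValue);
        }
        return true; // enabled by default, per the PR description
    }

    // Convenience wrapper that reads the live JVM property and process environment.
    static boolean isEnabled() {
        return resolve(System.getProperty(PROPERTY_NAME), System.getenv(ENV_NAME));
    }

    public static void main(String[] args) {
        System.setProperty(PROPERTY_NAME, "false");
        System.out.println("enabled = " + isEnabled());
    }
}
```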

A customer can override the above defaults by configuring CosmosEndToEndOperationLatencyPolicyConfig at the request options level or on CosmosClientBuilder.
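
The original snippet from the PR description is not preserved here. As a stand-in, the following is a minimal sketch of what such an override might look like with the azure-cosmos Java SDK. The 3s end-to-end timeout, the 500ms threshold, and the 100ms step are illustrative values, not recommendations from the PR, and the wrapper class `AvailabilityStrategyOverride` is hypothetical.

```java
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosEndToEndOperationLatencyPolicyConfig;
import com.azure.cosmos.CosmosEndToEndOperationLatencyPolicyConfigBuilder;
import com.azure.cosmos.ThresholdBasedAvailabilityStrategy;
import com.azure.cosmos.models.CosmosItemRequestOptions;

import java.time.Duration;

// Hypothetical helper class; requires the azure-cosmos dependency on the classpath.
public final class AvailabilityStrategyOverride {

    // Builds a policy with a custom hedging threshold and step (illustrative values).
    static CosmosEndToEndOperationLatencyPolicyConfig customPolicy() {
        return new CosmosEndToEndOperationLatencyPolicyConfigBuilder(Duration.ofSeconds(3))
            .availabilityStrategy(
                new ThresholdBasedAvailabilityStrategy(Duration.ofMillis(500), Duration.ofMillis(100)))
            .build();
    }

    // Option 1: apply the policy to a single request.
    static CosmosItemRequestOptions perRequestOptions() {
        CosmosItemRequestOptions options = new CosmosItemRequestOptions();
        options.setCosmosEndToEndOperationLatencyPolicyConfig(customPolicy());
        return options;
    }

    // Option 2: apply the policy to every operation issued by the client.
    static CosmosClientBuilder clientWideBuilder(String endpoint, String key) {
        return new CosmosClientBuilder()
            .endpoint(endpoint)
            .key(key)
            .endToEndOperationLatencyPolicyConfig(customPolicy());
    }
}
```

A request-level policy is the narrower choice when only a few latency-sensitive reads need different hedging behavior.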

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@Copilot Copilot AI review requested due to automatic review settings May 9, 2025 01:08
@jeet1995 jeet1995 requested review from kirankumarkolli and a team as code owners May 9, 2025 01:08
@github-actions github-actions bot added the Cosmos label May 9, 2025
@jeet1995 jeet1995 changed the title from "Enable availability strategy with ppaf" to "Default enablement of Threshold-Based Availability Strategy with Per-Partition Automatic Failover." May 9, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This pull request enables a threshold‐based availability strategy for reads when Per-Partition Automatic Failover (PPAF) is enabled. Key changes include adding new duration constants and a helper method in Utils, updating RxDocumentClientImpl to propagate the end-to-end policy configuration, and introducing new configuration properties and test validations for the strategy.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Utils.java | Added duration constants and a min utility function to support latency configuration. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java | Updated multiple methods to pass an end-to-end policy configuration and added a new helper to apply the latency policy. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Configs.java | Introduced new config properties and a method to check if the read availability strategy is enabled with PPAF. |
| sdk/cosmos/azure-cosmos/docs/TimeoutAndRetriesConfig.md | Documented the new threshold strategy defaults for PPAF. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Updated the changelog to include the new availability strategy feature. |
| sdk/cosmos/azure-cosmos-tests/ | Updated tests to validate the new strategy behavior under fault injection scenarios. |

Comments suppressed due to low confidence (1)

sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java:227

  • The delay has been increased from 10,000 milliseconds to 10,000 seconds and the repetition count changed from 1 to 10,000, which may lead to unintended long delays. Please verify that these values are intended.
.delay(Duration.ofSeconds(10000))

@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

@jeet1995
Member Author

jeet1995 commented May 9, 2025

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM - Thanks!

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM

@jeet1995
Member Author

jeet1995 commented May 9, 2025

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Member Author

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Member Author

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
