6 changes: 6 additions & 0 deletions source/transactions-convenient-api/tests/README.md
@@ -41,8 +41,14 @@ If possible, drivers should implement these tests without requiring the test run
the retry timeout. This might be done by internally modifying the timeout value used by `withTransaction` with some
private API or using a mock timer.

### Retry Backoff is Enforced

Drivers should test that retries within `withTransaction` do not occur immediately. Ideally, set BACKOFF_INITIAL to
500ms and configure a failpoint that forces one retry. Ensure that the operation took more than 500ms to succeed.
Member commented:
This test would not work as described because jitter is non-deterministic.

An alternative test would be to use a failpoint to fail the transaction X times and then assert the overall time is larger than some threshold.

Author replied:
Yeah, I realized that when implementing it. I set the failpoint to fail 3 times, and that seems to work consistently (without jitter, failing 3 times would still let the transaction succeed within 500ms).

Author replied:
Update: it was consistent locally but very flaky on Atlas. I modified the `withTransaction` code to record the backoff values, which makes this easier to test, and the test is now more stable.
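
To illustrate the approach discussed in this thread, here is a minimal sketch of a retry loop whose jitter source and sleep function are injected, so a test can pin the jitter and record the backoff values instead of measuring wall-clock time. Every name in it (`retryWithBackoff`, `RetryHooks`, the `_MS` constants) is hypothetical and not part of any driver API, and it assumes `retry` is 0 for the first retry, as in the spec text's definition.

```typescript
// Hypothetical test helper; not a driver API.
const BACKOFF_INITIAL_MS = 1;
const BACKOFF_MAX_MS = 500;

interface RetryHooks {
  random: () => number;        // jitter source, expected to return values in [0, 1)
  sleep: (ms: number) => void; // injected so a test can record delays instead of waiting
}

function retryWithBackoff<T>(attempt: () => T, maxRetries: number, hooks: RetryHooks): T {
  for (let retry = 0; ; retry++) {
    try {
      return attempt();
    } catch (err) {
      if (retry >= maxRetries) throw err;
      // Assumption: retry is 0 for the first retry, so the first cap is BACKOFF_INITIAL_MS.
      const cap = Math.min(BACKOFF_INITIAL_MS * 1.25 ** retry, BACKOFF_MAX_MS);
      hooks.sleep(hooks.random() * cap);
    }
  }
}

// Stand-in for a failpoint: fail the first three attempts, then succeed.
let calls = 0;
const recorded: number[] = [];
const result = retryWithBackoff(
  () => {
    calls += 1;
    if (calls <= 3) throw new Error("simulated TransientTransactionError");
    return "committed";
  },
  10,
  { random: () => 0.5, sleep: (ms) => recorded.push(ms) },
);
```

With the jitter pinned at 0.5, the recorded delays are fully determined by the formula, so the assertion no longer depends on timing. A jitter source that returns 0 would record three zero-length sleeps, which is why a pure elapsed-time lower bound can fail.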


## Changelog

- 2025-10-17: Added Backoff test.
- 2024-09-06: Migrated from reStructuredText to Markdown.
- 2024-02-08: Converted legacy tests to unified format.
- 2021-04-29: Remove text about write concern timeouts from prose test.
@@ -99,7 +99,8 @@ has not been exceeded, the driver MUST retry a transaction that fails with an er
"TransientTransactionError" label. Since retrying the entire transaction will entail invoking the callback again,
drivers MUST document that the callback may be invoked multiple times (i.e. one additional time per retry attempt) and
MUST document the risk of side effects from using a non-idempotent callback. If the retry timeout has been exceeded,
- drivers MUST NOT retry the transaction and allow `withTransaction` to propagate the error to its caller.
+ drivers MUST NOT retry the transaction and allow `withTransaction` to propagate the error to its caller. When retrying,
+ drivers MUST implement an exponential backoff with jitter following the algorithm described below.

If an error bearing neither the UnknownTransactionCommitResult nor the TransientTransactionError label is encountered at
any point, the driver MUST NOT retry and MUST allow `withTransaction` to propagate the error to its caller.
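
The retry rule in the paragraphs above can be sketched as a single predicate. This is only an illustration: the function name and signature are hypothetical, and it covers just the decision to restart the whole transaction, not commit retries.

```typescript
// Hypothetical helper; only the label string comes from the spec text.
const TRANSIENT_LABEL = "TransientTransactionError";

function shouldRetryTransaction(
  errorLabels: readonly string[],
  elapsedMs: number,
  retryTimeoutMs: number,
): boolean {
  // Only a transient transaction error restarts the whole transaction,
  // and only while the retry timeout has not been exceeded.
  const isTransient = errorLabels.indexOf(TRANSIENT_LABEL) !== -1;
  return isTransient && elapsedMs < retryTimeoutMs;
}
```

An error that carries no matching label, or one that arrives after the timeout, yields `false`, in which case the error is propagated to the caller of `withTransaction`.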
@@ -128,11 +129,21 @@ This method should perform the following sequence of actions:
6. If the callback reported an error:
1. If the ClientSession is in the "starting transaction" or "transaction in progress" state, invoke
[abortTransaction](../transactions/transactions.md#aborttransaction) on the session.

2. If the callback's error includes a "TransientTransactionError" label and the elapsed time of `withTransaction` is
- less than 120 seconds, jump back to step two.
+ less than 120 seconds, sleep for `jitter * min(BACKOFF_INITIAL * (1.25**retry), BACKOFF_MAX)` where:
+
+ 1. jitter is a random float in the range \[0, 1)
+ 2. retry is one less than the number of times Step 2 has been executed since Step 1 was executed
+ 3. BACKOFF_INITIAL is 1ms
+ 4. BACKOFF_MAX is 500ms
+
+ Then, jump back to step two.

3. If the callback's error includes a "UnknownTransactionCommitResult" label, the callback must have manually
committed a transaction, propagate the callback's error to the caller of `withTransaction` and return
immediately.

4. Otherwise, propagate the callback's error to the caller of `withTransaction` and return immediately.
7. If the ClientSession is in the "no transaction", "transaction aborted", or "transaction committed" state, assume the
callback intentionally aborted or committed the transaction and return immediately.
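
The backoff delay from step 6.2 can be sketched as two small functions. This is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, the constants are in milliseconds, and `retry` follows the definition in the list (0 for the first retry).

```typescript
const BACKOFF_INITIAL = 1; // ms
const BACKOFF_MAX = 500;   // ms
const BACKOFF_GROWTH = 1.25;

// Upper bound on the sleep before a given retry (retry = 0 for the first retry).
function backoffCapMs(retry: number): number {
  return Math.min(BACKOFF_INITIAL * BACKOFF_GROWTH ** retry, BACKOFF_MAX);
}

// Actual sleep: the cap scaled by a jitter value in [0, 1).
// The jitter parameter is exposed (an assumption) so callers can pin it in tests.
function backoffDelayMs(retry: number, jitter: number = Math.random()): number {
  return jitter * backoffCapMs(retry);
}
```

With these constants the cap starts at 1ms, grows by a factor of 1.25 per retry, and first reaches BACKOFF_MAX at `retry = 28`. Because jitter can be arbitrarily close to 0, any individual delay can be close to zero regardless of the cap.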
@@ -154,11 +165,18 @@ This method should perform the following sequence of actions:
This method can be expressed by the following pseudo-code:

```typescript
var BACKOFF_INITIAL = 1; // 1ms initial backoff
var BACKOFF_MAX = 500; // 500ms max backoff

withTransaction(callback, options) {
    // Note: drivers SHOULD use a monotonic clock to determine elapsed time
    var startTime = Date.now(); // milliseconds since Unix epoch
    var retry = 0;

    retryTransaction: while (true) {
        // Back off before every attempt except the first
        if (retry > 0) {
            sleep(Math.random() * Math.min(BACKOFF_INITIAL * (1.25 ** retry), BACKOFF_MAX));
        }
        retry += 1;
        this.startTransaction(options); // may throw on error

        try {
            // ...
```

@@ -324,8 +342,8 @@ exceed the user's original intention for `maxTimeMS`.
The callback may be executed any number of times. Drivers are free to encourage their users to design idempotent
callbacks.

- A previous design had no limits for retrying commits or entire transactions. The callback is always able indicate that
- `withTransaction` should return to its caller (without future retry attempts) by aborting the transaction directly;
+ A previous design had no limits for retrying commits or entire transactions. The callback is always able to indicate
+ that `withTransaction` should return to its caller (without future retry attempts) by aborting the transaction directly;
however, that puts the onus on avoiding very long (or infinite) retry loops on the application. We expect the most
common cause of retry loops will be due to TransientTransactionErrors caused by write conflicts, as those can occur
regularly in a healthy application, as opposed to UnknownTransactionCommitResult, which would typically be caused by an
@@ -357,6 +375,8 @@ provides an implementation of a technique already described in the MongoDB 4.0 d

## Changelog

- 2025-10-17: withTransaction applies exponential backoff when retrying.

- 2024-09-06: Migrated from reStructuredText to Markdown.

- 2023-11-22: Document error handling inside the callback.