Draft
Changes from 6 commits
@@ -284,6 +284,15 @@ Endpoint. The pool has the following properties:
- **Rate-limited:** A Pool MUST limit the number of [Connections](#connection) being
[established](#establishing-a-connection-internal-implementation) concurrently via the **maxConnecting**
[pool option](#connection-pool-options).
- **Backoff-capable:** A Pool MUST be able to enter backoff mode. A Pool automatically enters backoff mode when a
  connection checkout fails under conditions that indicate server overload. While the Pool is in backoff, it exhibits
  the following behaviors:
  - **maxConnecting** is set to 1.
  - The Pool waits for the backoff duration before making another connection attempt.
  - A successful heartbeat does NOT change the state of the Pool.
  - A failed heartbeat clears the Pool.
  - A subsequent failed connection attempt increments the backoff attempt number.
  - A successful connection returns the Pool to the "ready" state.
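
The backoff behaviors listed above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the specified implementation: the class `SketchPool`, its method names, and the default of 2 for the configured **maxConnecting** are hypothetical, and resetting the attempt counter on success or on clear is an assumption this diff does not state. It also assumes that clearing a pool marks it "paused", as in the existing CMAP `clear()` behavior.

```typescript
// Illustrative sketch only: class, method, and field names are hypothetical.
type PoolState = "paused" | "ready" | "backoff" | "closed";

class SketchPool {
  state: PoolState = "ready";
  backoffAttempt = 0;
  private configuredMaxConnecting: number;

  constructor(configuredMaxConnecting: number = 2) {
    this.configuredMaxConnecting = configuredMaxConnecting;
  }

  // While in backoff, maxConnecting is forced to 1.
  get maxConnecting(): number {
    return this.state === "backoff" ? 1 : this.configuredMaxConnecting;
  }

  // A connection attempt failed under conditions that indicate server overload:
  // enter backoff, or increment the attempt number if already in backoff.
  onOverloadFailure(): void {
    if (this.state === "closed") return;
    this.state = "backoff";
    this.backoffAttempt += 1;
  }

  // A successful connection returns a pool in backoff to "ready".
  onConnectionSuccess(): void {
    if (this.state !== "backoff") return;
    this.state = "ready";
    this.backoffAttempt = 0; // Assumption: the counter resets on success.
  }

  // A successful heartbeat readies a paused pool but leaves a pool in backoff alone.
  onHeartbeatSuccess(): void {
    if (this.state === "paused") this.state = "ready";
  }

  // A failed heartbeat clears the pool; clearing is assumed to mark it "paused".
  onHeartbeatFailure(): void {
    if (this.state === "closed") return;
    this.state = "paused";
    this.backoffAttempt = 0; // Assumption: clearing also resets the counter.
  }
}
```

In this sketch the wait for the backoff duration is deliberately left out; only the state transitions and the effective **maxConnecting** are modeled.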
Member


Could we add a description of the exponential backoff + jitter for the backoff duration?

Member Author


Should we define the backoff and jitter policy in one place and link to it? If so, should I add it in this PR and where?

Member


I think it's small and simple enough that it should be defined here alongside where it will be used.
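
The exponential backoff with jitter raised in this thread is not defined anywhere in the diff. As a rough illustration of what such a definition might look like, here is a sketch of the common "full jitter" variant. Everything in it is an assumption: the function name `backoffDurationMS`, the 100 ms base, the 10 000 ms cap, the doubling factor, and the choice of full jitter are placeholders, not values proposed by this PR.

```typescript
// Hypothetical policy values, chosen only for illustration.
interface BackoffPolicy {
  baseMS: number; // ceiling for the first backoff attempt
  maxMS: number;  // upper bound on the ceiling for any attempt
}

const ASSUMED_POLICY: BackoffPolicy = { baseMS: 100, maxMS: 10000 };

// Returns a backoff duration in milliseconds for a 1-based attempt number.
// The ceiling doubles with each attempt up to maxMS, and the returned value
// is drawn uniformly from [0, ceiling) ("full jitter").
// `random` is injectable so the function can be tested deterministically.
function backoffDurationMS(
  attempt: number,
  policy: BackoffPolicy = ASSUMED_POLICY,
  random: () => number = Math.random
): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  const ceiling = Math.min(policy.maxMS, policy.baseMS * 2 ** (attempt - 1));
  return random() * ceiling;
}
```

Because of the jitter, two pools at the same attempt number will generally report different durations, which is why `PoolBackoffEvent` carries the attempt number alongside the duration.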


```typescript
interface ConnectionPool {
@@ -314,12 +323,17 @@ interface ConnectionPool {
* - "ready": The healthy state of the pool. It can service checkOut requests and create
* connections in the background. The pool can be set to this state via the
* ready() method.
*
* - "backoff": The pool is in the backoff state. maxConnecting is set to 1, and the backoff period
* must elapse before another connection is attempted. Each subsequent failed connection
* attempt increases the backoff duration. The pool can be set to this state via the
* backoff() method.
*
* - "closed": The pool is destroyed. No more Connections may ever be checked out nor any
* created in the background. The pool can be set to this state via the close()
* method. The pool cannot transition to any other state after being closed.
*/
- state: "paused" | "ready" | "closed";
+ state: "paused" | "ready" | "backoff" | "closed";

// Any of the following connection counts may be computed rather than
// actually stored on the pool.
@@ -360,6 +374,11 @@ interface ConnectionPool {
*/
clear(interruptInUseConnections: Optional<Boolean>): void;

/**
* Enter backoff mode or increase backoff amount if already in backoff mode. Mark the pool as "backoff".
*/
backoff(): void;

/**
* Mark the pool as "ready", allowing checkOuts to resume and connections to be created in the background.
* A pool can only transition from "paused" to "ready". A "closed" pool
@@ -829,6 +848,34 @@ interface PoolClearedEvent {
interruptInUseConnections: Optional<Boolean>;
}

/**
 * Emitted when a Connection Pool is in backoff
 */
interface PoolBackoffEvent {
  /**
   * The ServerAddress of the Endpoint the pool is attempting to connect to.
   */
  address: string;

  /**
   * The backoff attempt number.
   *
   * The incrementing backoff attempt number. This is included because
   * the backoff duration is non-deterministic due to jitter.
   */
  attempt: int64;

  /**
   * The duration for which the pool will not allow new connection establishments.
   *
   * A driver MAY choose the type idiomatic to the driver.
   * If the type chosen does not convey units, e.g., `int64`,
   * then the driver MAY include units in the name, e.g., `durationMS`.
   */
  duration: Duration;
}


/**
* Emitted when a Connection Pool is closed
*/
```

@@ -1074,6 +1121,21 @@ placeholders as appropriate:

> Connection pool for {{serverHost}}:{{serverPort}} cleared for serviceId {{serviceId}}

#### Pool Backoff Message

In addition to the common fields defined above, this message MUST contain the following key-value pairs:

| Key        | Suggested Type     | Value                                                  |
| ---------- | ------------------ | ------------------------------------------------------ |
| message    | String             | "Connection pool in backoff"                           |
| attempt    | Int                | The backoff attempt number.                            |
| durationMS | Int32/Int64/Double | `PoolBackoffEvent.duration` converted to milliseconds. |

The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in
placeholders as appropriate:

> Connection pool for {{serverHost}}:{{serverPort}} in backoff. Attempt: {{attempt}}. Duration: {{durationMS}} ms

#### Pool Closed Message

In addition to the common fields defined above, this message MUST contain the following key-value pairs:
@@ -1375,6 +1437,8 @@ to close and remove from its pool a [Connection](#connection) which has unread e

## Changelog

- 2025-XX-YY: Introduce "backoff" state.

- 2025-01-22: Clarify durationMS in logs may be Int32/Int64/Double.

- 2024-11-27: Relaxed the WaitQueue fairness requirement.
@@ -74,6 +74,7 @@ Valid Unit Test Operations are the following:
- `interruptInUseConnections`: Determines whether "in use" connections should be also interrupted
- `pool.close()`: call `close` on Pool
- `pool.ready()`: call `ready` on Pool
- `pool.backoff()`: call `backoff` on Pool

## Integration Test Format

@@ -1056,6 +1056,10 @@ def handleError(error):
    if isNotWritablePrimary(error):
        check failing server
    elif isNetworkError(error) or (not error.completedHandshake and (isNetworkTimeout(error) or isAuthError(error))):
        # Ignore network errors and network timeout errors during TLS handshake or "hello" messages.
        # These will be handled by the pool backoff.
        if error.occurredDuringHello or error.occurredDuringTLSHandshake:
            return
        if type != LoadBalanced:
            # Mark the server Unknown
            unknown = new ServerDescription(type=Unknown, error=error)
@@ -1169,8 +1173,9 @@ TopologyType is ReplicaSetWithPrimary: referring to the table above we see the s
[checkIfHasPrimary](#checkifhasprimary). The result is the TopologyType changes to ReplicaSetNoPrimary. See the test
scenario called "Network error writing to primary".

- The client MUST close all idle sockets in its connection pool for the server: if one socket is bad, it is likely that
- all are.
+ Clients MUST NOT clear the connection pool when a connection's TLS handshake or MongoDB handshake fails with a network
+ error or timeout. If the network error or timeout occurs during TCP connection establishment, DNS lookup, or the
+ authentication step, then the client MUST close all idle sockets in its connection pool for the server.

Clients MUST NOT request an immediate check of the server; since application sockets are used frequently, a network
error likely means the server has just become unavailable, so an immediate refresh is likely to get a network error,
@@ -2027,6 +2032,8 @@ oversaw the specification process.
- 2025-01-22: Add error messages when a new primary is elected or a primary with a stale electionId or setVersion is
discovered.

- 2025-XX-YY: Add support for pool backoff state.

______________________________________________________________________

[^1]: "localThresholdMS" was called "secondaryAcceptableLatencyMS" in the Read Preferences Spec, before it was superseded
@@ -482,7 +482,8 @@ When a monitor completes a successful check against a server, it MUST mark the c
"ready", and doing so MUST be synchronized with the update to the topology (e.g. by marking the pool as ready in
onServerDescriptionChanged). This is required to ensure a server does not get selected while its pool is still paused.
See the [Connection Pool](../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md#connection-pool)
- definition in the CMAP specification for more details on marking the pool as "ready".
+ definition in the CMAP specification for more details on marking the pool as "ready". If the server's pool is in the
+ "backoff" state, the monitor MUST NOT mark the connection pool as "ready".

### Error handling

@@ -971,6 +972,8 @@ outdated or inaccurate.

## Changelog

- 2025-XX-YY: Add support for pool "backoff" state.

- 2024-05-02: Migrated from reStructuredText to Markdown.

- 2020-02-20: Extracted server monitoring from SDAM into this new spec.