Implement exponential backoff and do not update MCD, MS on same error #1030

@elankath

How to categorize this issue?

/area control-plane
/kind bug
/priority 1

What happened:

We had a live scalability issue in which, due to invalid credentials, the etcd database filled up.
The machine-controller-manager kept updating MachineDeployments and MachineSets. The MachineDeployment status contains an entry for each Machine and its lastError.

(issues-canary/issues/7190 internally)

What you expected to happen:

  • machine-controller-manager should adhere to controller best practices such as exponential backoff on repeated errors and skipping no-op status updates when the status has not changed.

How to reproduce it (as minimally and precisely as possible):

  • Use the virtual MCM provider with a local api-server and etcd to simulate a credential failure for a large number of machines (> 1000).
  • Check the size of the etcd db.

Anything else we need to know?:

Metadata

Labels

  • area/control-plane: Control plane related
  • effort/2m: Effort for issue is around 2 months
  • kind/bug: Bug
  • priority/2: Priority (lower number equals higher priority)
