
Conversation

@ronaldngounou (Member) commented Oct 11, 2025:

Per the performance optimizations described in the post below, the recommended etcd storage size limit has been raised to 100GB instead of 8GB.
https://www.cncf.io/blog/2019/05/09/performance-optimization-of-etcd-in-web-scale-data-scenario/

Contributes to issue #588
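
For context, the limit under discussion is configured with etcd's `--quota-backend-bytes` flag. A minimal sketch of raising it above the 2GB default (the node name, data dir, and 16GB value are illustrative, not taken from this PR):

```sh
# Start etcd with a 16GB backend quota instead of the 2GB default.
# --quota-backend-bytes takes a byte count; 16GB = 16 * 1024^3 bytes.
etcd --name node1 \
  --data-dir /var/lib/etcd \
  --quota-backend-bytes=$((16 * 1024 * 1024 * 1024))
```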

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ronaldngounou
Once this PR has been reviewed and has the lgtm label, please assign ivanvc for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ronaldngounou force-pushed the issue588-lift_etcd_GB_limit branch 2 times, most recently from ba37b27 to c93d626 on October 11, 2025 at 09:40
@ronaldngounou force-pushed the issue588-lift_etcd_GB_limit branch from c93d626 to 49407a6 on October 11, 2025 at 18:36
@ronaldngounou (Member Author) commented:

Lint issues fixed:

content/en/docs/v3.4/faq.md:29:291 MD059/descriptive-link-text: Link text should be descriptive [Context: "[here]"] (https://github.com/DavidAnson/markdownlint/blob/main/doc/md059.md)

@jberkus (Contributor) commented Oct 18, 2025:

If you're doing this refactoring, I'd like to make it clear to users that the 100GB is a recommended maximum size, and not a hard limit. This would mean different text in a couple of places. I don't know what the actual hard limit is; probably need to look at the boltDB code.

@ronaldngounou (Member Author) commented:

Could you please suggest a wording that we should have in the meantime?

Review context, showing the blog post text before and after:

Before:
also means more memory usage**. Just I mentioned in the beginning of this post, the suggested max value is 8GB. Of course,
If your VM has big memory (e.g. 64GB), it's OK to set a value > 8GB.

After:
also means more memory usage**. Just I mentioned in the beginning of this post, the suggested max value is 100GB. Of course,
If your VM has big memory (e.g. 64GB), it's OK to set a value > 100GB.
A contributor commented:

Suggested change
If your VM has big memory (e.g. 64GB), it's OK to set a value > 100GB.
If your VM has big memory (e.g. 128GB), it's OK to set a value > 100GB.

A contributor commented:

A smaller VM with 64GB RAM may be fine with an 8GB database, but if your DB is 100GB and your VM has only 64GB RAM, operations can slow down drastically. To fit the context of this doc, I think you should use a larger VM RAM figure, such as 128GB.

Review context, showing the Memory section before and after:

## Memory

Before:
etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 8GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.

After:
etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 100GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.
A contributor commented:

Suggested change
etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 100GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.
etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 8GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly. 100GB is a suggested maximum size for normal environments and etcd warns at startup if the configured value exceeds it.

@wendy-ha18 (Contributor) commented Nov 18, 2025:

Within the context of this doc, the wording "etcd has a relatively small memory ... Typically 8GB is enough ... 100GB is a suggested maximum size for normal environments and etcd warns at startup if the configured value exceeds it" makes more sense to me.

A contributor commented:

Do we actually have a warning at 100GB? I don't have a machine I can test that on.
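
One way to check without large hardware (a sketch, under the assumption that the quota is enforced only as a limit, so no real 100GB of storage is needed): start etcd with an oversized quota and scan its startup logs. The data dir and grep pattern below are illustrative:

```sh
# Start etcd with a quota above 100GB and look for a warning in the logs.
# The quota is just a limit; the backend file grows on demand, so this test
# does not require 100GB of actual disk.
etcd --data-dir /tmp/etcd-quota-test \
  --quota-backend-bytes=$((110 * 1024 * 1024 * 1024)) 2>&1 \
  | grep -i quota
```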

Review context, showing the Storage size limit section before and after:

## Storage size limit

Before:
The default storage size limit is 2GB, configurable with `--quota-backend-bytes` flag; supports up to 8GB.

After:
The default storage size limit is 2GB, configurable with `--quota-backend-bytes` flag; supports up to 100GB.
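
Whichever wording lands, operators can compare the actual database size against the configured quota with `etcdctl`; a sketch (the endpoint URL is illustrative):

```sh
# Show DB size, leader status, and other member stats for an endpoint.
etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out=table

# If the quota is exceeded, etcd raises a NOSPACE alarm; list it, and clear
# it after freeing space (compaction/defragmentation):
etcdctl alarm list
etcdctl alarm disarm
```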
@wendy-ha18 (Contributor) commented Nov 18, 2025:

etcd v3.1.0 through v3.3.0 were released from 2017 to before May 2019. I believe the optimizations that let boltDB scale beyond the 8GB limit only apply to versions released after 2019. That means, IMHO, you only need to update the docs for v3.5.0 to v3.7.0.

We don't need to update older versions.

A contributor commented:

I'd restrict that further, and only update 3.5 and up.

Review context, showing the Memory section before and after:

## Memory

Before:
etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 8GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.

After:
etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 100GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.
A contributor commented:

Let's make this a limit, not a recommendation:

Suggested change
etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 100GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.
etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 8GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly, up to a recommended maximum of 100GB.

@jberkus (Contributor) commented Nov 19, 2025:

For content/en/blog/2023/how_to_debug_large_db_size_issue.md, let's take it out of this PR and open a separate effort to convert the blog post into an Operations doc.

A contributor commented:

This points to the need to really remove the storage size limit from this doc. It doesn't belong in the dev guide, it belongs in the operations guide.

@ToSuperGod commented:

May I ask: once etcd is storing 50GB of data, do compaction and defragmentation affect the cluster? And how long do large-scale insert/query operations take after those operations complete?
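
For reference, compaction and defragmentation are the operations in question; a sketch of running and timing them with `etcdctl` (the revision number and endpoint are placeholders):

```sh
# Compact the key-value history up to a given revision (placeholder value).
rev=123456
time etcdctl --endpoints=http://127.0.0.1:2379 compaction "$rev"

# Defragmentation rewrites the DB file and blocks the member while it runs,
# so on a large (e.g. 50GB) database expect it to take a while; run it on
# one member at a time.
time etcdctl --endpoints=http://127.0.0.1:2379 defrag
```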
