[BUG] OOM when clickhouse is slow and a lot of insert queries are sent #428

Open
bzed opened this issue Apr 23, 2024 · 4 comments
@bzed

bzed commented Apr 23, 2024

Describe the bug
We regularly see the following issue:

  • ClickHouse is slower than expected due to high load (not nice, and there are things to optimize, but this can happen)
  • chproxy receives lots of inserts from applications without being able to forward them in time. It happily accepts hundreds of connections in parallel...
  • chproxy gets OOM killed
  • chproxy is restarted, putting even more load on the ClickHouse server due to new connections...

To Reproduce

  • make clickhouse slow
  • put chproxy under load with lots of big parallel inserts.

Expected behavior
No OOM. Better memory handling. Cancel connections or make them wait before running out of memory.

Environment information
chproxy v1.26.2
clickhouse 24.3.2.23

@vfoucault
Contributor

hi there, this is a tricky issue. I don't really see a positive outcome here other than using a rate limiter for your inserter and handling the back pressure at the data producer level.

Another option would be to add way more memory to your chproxy, to bypass chproxy for data insertion, or to make ClickHouse faster 😅

There is no miracle fix here.

@bzed
Author

bzed commented Apr 23, 2024

Yes, indeed a tricky issue. Rate limiting in front of chproxy is in place now (and much stricter). But still, imho a program running into OOM is a bug :) Adding more resources will just move the point where the OOM happens. To solve this bug, I think completely different memory management would be needed, but yes, it's not trivial, as not all connections need the same amount of memory.
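
One way to picture memory management that accounts for connections needing different amounts of memory is a byte budget: each request reserves its declared body size and is rejected when the budget is exhausted, instead of being accepted until the process is killed. The sketch below is an assumption about how this could look, not chproxy's code. `ByteBudget` and `LimitBodies` are hypothetical names, and rejecting requests without a known `Content-Length` is a simplification.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// ByteBudget tracks how many request-body bytes are currently reserved.
// Hypothetical type for illustration only.
type ByteBudget struct {
	mu    sync.Mutex
	used  int64
	limit int64
}

func NewByteBudget(limit int64) *ByteBudget { return &ByteBudget{limit: limit} }

// TryAcquire reserves n bytes and reports whether they fit in the budget.
// Negative n (unknown length) is always refused.
func (b *ByteBudget) TryAcquire(n int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n < 0 || n > b.limit-b.used {
		return false
	}
	b.used += n
	return true
}

// Release returns n previously reserved bytes to the budget.
func (b *ByteBudget) Release(n int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.used -= n
}

// LimitBodies answers 503 when a request's declared body size does not
// fit in the remaining budget; otherwise it holds the reservation until
// the wrapped handler returns.
func LimitBodies(b *ByteBudget, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n := r.ContentLength // -1 when the length is unknown (e.g. chunked)
		if !b.TryAcquire(n) {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "proxy memory budget exhausted", http.StatusServiceUnavailable)
			return
		}
		defer b.Release(n)
		next.ServeHTTP(w, r)
	})
}

func main() {
	b := NewByteBudget(100)
	fmt.Println(b.TryAcquire(60)) // true
	fmt.Println(b.TryAcquire(60)) // false: only 40 bytes left
	b.Release(60)
	fmt.Println(b.TryAcquire(60)) // true
}
```

A waiting variant (block until bytes are released, with a timeout) would match the "let them wait" option; rejecting is shown here because it is simpler to reason about.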

@mga-chka
Collaborator

mga-chka commented Apr 23, 2024

Unfortunately, we (Contentsquare) don't use chproxy to insert data. This feature was built by the previous maintainers (Vertamedia) and we don't maintain it anymore.
If it were happening on select queries, we might do something (but from what I remember, the query results are either streamed or put in temporary files to avoid an OOM in such a situation). But since it's about insert queries, feel free to make a PR to fix the issue. As Vianney said, it will be tricky to solve, and you should use a rate limiter to make sure it can't happen, for example by using the max_concurrent_queries parameter.

mga-chka added the bug label Apr 23, 2024
mga-chka changed the title from "[BUG] OOM when clickhouse is slow" to "[BUG] OOM when clickhouse is slow and a lot of insert queries are sent" Apr 23, 2024

@Frank030366

@mga-chka - I've experienced the same issue the author described: chproxy gets OOM killed under heavy INSERT load with large batches. I ran some tests and can shed some light on the nature of this bug: it seems the issue was introduced by changes in the 1.22.0 release, because 1.21.0 is stable in our environment while 1.22 is OOM killed ~10-20 seconds after the workload starts. At least two changes probably introduced this bug: #299 and #296. To test this, I built a custom binary from the 1.22 sources with those changes reverted, and it is stable under our load. The original 1.22 binary and the latest release binary are both OOM killed.

One possible root cause: it may not be efficient to load every incoming request body into memory for a possible retry, because bodies can be very large for INSERT-like workloads.
