FEAT: Add IDLE_TIMEOUT config param, and update doc. #91

Merged

Conversation

makeittotop

We have a setup where HAProxy fronts the Loki/Mimir servers that cortex-tenant connects and sends data to. The HAProxy servers set the client idle timeout parameter (timeout client) to 15s, so any idle connection from any client, including cortex-tenant, is closed after 15 seconds. Details: https://www.haproxy.com/documentation/haproxy-configuration-manual/latest/#3.10-timeout%20client

This is visible in the cortex-tenant logs:

time="2025-07-08T20:31:51Z" level=error msg="proc: src=10.185.9.101:50522 write tcp4 10.185.4.103:33906->10.178.64.210:443: write: broken pipe"
time="2025-07-08T20:31:51Z" level=error msg="proc: src=10.185.7.19:45332 write tcp4 10.185.4.103:33556->10.178.64.210:443: write: broken pipe"
time="2025-07-08T20:31:51Z" level=error msg="proc: src=10.185.7.19:45332 the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection"

This PR adds a new idle timeout parameter to the config, which lets the operator tune the idle timeout setting in the server to match that of the upstream proxy.

The change has been tested locally.

Dear @blind-oracle (maintainers), this is my first commit to this PR. Please let me know if there are any questions/comments.

@blind-oracle
Owner

blind-oracle commented Jul 14, 2025

@makeittotop Thanks, looks good!

Though I think this problem should be solved differently: cortex-tenant is a kind of proxy, so its clients shouldn't have to rely on assumptions about its backends. The timeout windows might just not match perfectly.

Maybe instead we should retry the request in fasthttp using something like https://pkg.go.dev/github.com/valyala/fasthttp#RetryIfErrFunc

@makeittotop
Author

makeittotop commented Jul 16, 2025

Ah, I didn't realize we are losing metrics every time a connection is broken due to being idle. Am I right in assuming this, @blind-oracle?

(We have encountered this very issue with the Grafana Alloy agents spread throughout our infra, which also have to go through HAProxy, but they always retry implicitly.)

I am also keen to look into integrating fasthttp's retry-on-error approach.

@blind-oracle blind-oracle merged commit 15cf127 into blind-oracle:main Jul 16, 2025
2 checks passed
@blind-oracle
Owner

Which metrics? In cortex-tenant?

@makeittotop
Author

Thanks, @blind-oracle. Would you be open to me looking into implementing retries for failed requests, as discussed above? https://pkg.go.dev/github.com/valyala/fasthttp#RetryIfErrFunc

@blind-oracle
Owner

Sure, let's try that

@makeittotop makeittotop deleted the feat-add-idle-timeout-param branch July 18, 2025 02:04