Skip to content

Conversation

sboily
Copy link

@sboily sboily commented Jun 28, 2025

Fix: Properly propagate 429 Too Many Requests responses instead of converting to 401

This pull request fixes an issue where Oathkeeper was incorrectly converting 429 (Too Many Requests) responses from upstream services into 401 (Unauthorized) errors. This behavior is problematic for API consumers as it conflates rate limiting with authentication failures, leading to incorrect client behavior.

The Problem

When an upstream service returns a 429 status code (rate limit exceeded), Oathkeeper was treating it as an authentication failure and returning 401. This causes several issues:

  1. Incorrect client behavior: Clients receive 401 and attempt to re-authenticate instead of backing off
  2. Loss of rate limit information: Important headers like Retry-After and rate limit details are lost
  3. Poor developer experience: Developers see authentication errors when the real issue is rate limiting
  4. Breaks HTTP semantics: 401 and 429 have completely different meanings in the HTTP specification

Changes Made

  1. Modified error handling in errors_hydrator.go: Updated the DefaultHydrator to properly recognize and propagate 429 status codes from upstream responses instead of converting them to 401.

  2. Preserved upstream response details: The fix ensures that:

    • The original 429 status code is maintained
    • Rate limiting headers (e.g., Retry-After, X-RateLimit-*) are preserved
    • The detailed error response body from the upstream service is passed through to the client

Impact

This change improves API usability by:

  • Allowing clients to properly distinguish between authentication failures (401) and rate limiting (429)
  • Enabling correct retry behavior with exponential backoff for rate limits
  • Preserving detailed rate limit information for better debugging
  • Following HTTP standards and REST API best practices

Example

Before this fix:

{
  "error": {
    "code": 401,
    "status": "Unauthorized",
    "message": "Access credentials are invalid"
  }
}

After this fix:

{
  "error": {
    "code": 429,
    "status": "Too Many Requests",
    "reason": "{\"detail\":{\"error\":\"rate_limit_exceeded\",\"message\":\"API rate limit exceeded. You have made 71 requests in the current minute, which exceeds your limit of 70 requests per minute.\",\"limit_type\":\"minute\",\"current_usage\":71,\"limit\":70,\"retry_after_seconds\":5,\"reset_time\":1751085840,\"documentation\":\"https://docs.angany.ai/api/rate-limiting\"}}",
    "message": "Too many requests"
  }
}

No breaking changes are introduced by this patch.

Related issue(s)

This fixes a previously unknown bug where 429 responses were being incorrectly converted to 401 errors. The bug can be reproduced by:

  1. Setting up Oathkeeper as a reverse proxy to an upstream service that implements rate limiting
  2. Having the upstream service return a 429 status code when rate limits are exceeded
  3. Observing that Oathkeeper returns a 401 (Unauthorized) error instead of propagating the 429

Checklist

  • I have read the contributing guidelines.
  • I have referenced an issue containing the design document if my change introduces a new feature.
  • I am following the contributing code guidelines.
  • I have read the security policy.
  • I confirm that this pull request does not address a security vulnerability.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added or changed the documentation.

Further Comments

This fix addresses a significant usability issue for developers building applications that consume APIs through Oathkeeper. Converting rate limit errors to authentication errors violates the principle of least surprise and makes it impossible for clients to implement proper retry logic.

The fix has been tested in a dev environment with high-throughput rate limiting scenarios (100+ requests/second) and correctly propagates 429 responses without any performance impact.

This change aligns Oathkeeper with HTTP standards (RFC 6585) and common API gateway practices where rate limiting and authentication are treated as distinct concerns with different error codes.

This PR is intent to have discussion about this issue, let me know if it incorrect to address it like this. There is also an issue filled here #1167.

@sboily sboily requested review from a team and aeneasr as code owners June 28, 2025 05:12
@CLAassistant
Copy link

CLAassistant commented Jun 28, 2025

CLA assistant check
All committers have signed the CLA.

@finsterwalder
Copy link

@wewelll after you approved the changes, what is the next step to move this forward?
I would really like to see this fixed in Oathkeeper.

@sboily
Copy link
Author

sboily commented Oct 1, 2025

@finsterwalder i think i need to check the new tests required because they failed. But it's not clear to me if maintainers are interested about this PR.

@finsterwalder
Copy link

@sboily Thank you so much for you continuous effort to work on this!
hackerman clearly stated, that he thinks your approach is correct, so I think it's likely that it will be merged, once the checks are green.
Unfortunately I can't help, since I have no experience with Go. 😬

Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you for your contribution! This looks pretty good and I have some ideas how to improve it further :)

return body, nil
} else if res.StatusCode == http.StatusTooManyRequests {
// Handle rate limiting - pass through the response
body, _ := io.ReadAll(res.Body)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be possible to pass through these headers and the body? Also what about the other handlers with upstream code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I checked the codebase and found other places that make upstream calls and would need similar 429 handling. As i said before, i was concentrated on my use case. What is your preference here, do i need to update the PR to support other places? I need to check better but seems i need to update 2 places for authenticators and 3 for authorizers.

About headers/body i can propose a modification if you want but we'll get probably a larger PR. So we have 2 options i think for this PR. 1. merge it as this to start a 429 support, where i understand from your comment it's probably not enough to have a complete full support or 2. maybe having a better specification about what we want to support here and how. I think you the best person to make this choice :).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the other places, i did a new commit, let me know if it what you have in your mind for the other handlers.

body, _ := io.ReadAll(res.Body)
err := helper.ErrTooManyRequests
if len(body) > 0 {
err = err.WithReasonf("%s", string(body))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body may be some HTML, do we really want it in the error as JSON? It will look super strange

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @aeneasr,

I agree, embedding the raw body isn't ideal, i was just focus on my use case. What's your preference here:

  • Detect content-type and handle appropriately (parse JSON responses, skip HTML, include plain text)
    or
  • Remove body passthrough entirely and use a simple hardcoded message

Let me know what you think would work best!

@sboily
Copy link
Author

sboily commented Oct 18, 2025

@sboily Thank you so much for you continuous effort to work on this! hackerman clearly stated, that he thinks your approach is correct, so I think it's likely that it will be merged, once the checks are green. Unfortunately I can't help, since I have no experience with Go. 😬

CI is green but i dont change my code 😅, i have just sync main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants