🚀 feat: add Load-Shedding Middleware for Request Timeout Management #3264

Open: wants to merge 10 commits into main

@ErfanMomeniii ErfanMomeniii commented Dec 28, 2024

Description

This pull request introduces a new Load-Shedding Middleware. This middleware enforces a configurable timeout on request processing to effectively manage server load. If a request exceeds the specified timeout, a custom load-shedding handler is invoked to gracefully handle the overloaded state.

Purpose of the Change:

  • Improve server resilience under high-load scenarios.
  • Ensure that long-running or unresponsive requests do not degrade overall system performance.
  • Provide developers with a flexible solution for request timeouts, including options for exclusions.

Changes introduced

This pull request adds the following features and updates:

  • Load-Shedding Middleware (loadshedding.New):

    • Allows enforcement of request timeouts with a custom handler for requests exceeding the timeout.
    • Supports route exclusions using a configurable predicate function.
  • Unit Tests:

    • Test_LoadSheddingExcluded: Verifies that excluded routes bypass the middleware.
    • Test_LoadSheddingTimeout: Tests behavior when requests exceed the timeout.
    • Test_LoadSheddingSuccessfulRequest: Confirms successful processing of requests completed within the timeout.
  • Documentation Update: Detailed usage examples for the middleware are added.

  • Benchmarks: Basic performance benchmarks have been conducted to ensure minimal overhead.

  • Changelog/What's New: This middleware addition will be documented in the changelog for the next release.

  • API Longevity: Designed with consistency in mind to align with Fiber's existing middleware and ensure backward compatibility.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Enhancement (improvement to existing features and functionality)
  • Documentation update (changes to documentation)
  • Performance improvement (non-breaking change which improves efficiency)
  • Code consistency (non-breaking change which improves code reliability and robustness)

Checklist

  • Followed the inspiration of the Express.js framework for new functionalities, making them similar in usage.
  • Conducted a self-review of the code and provided comments for complex or critical parts.
  • Updated the documentation in the /docs/ directory for Fiber's documentation.
  • Added or updated unit tests to validate the effectiveness of the changes or new features.
  • Ensured that new and existing unit tests pass locally with the changes.
  • Verified that any new dependencies are essential and have been agreed upon by the maintainers/community.
  • Aimed for optimal performance with minimal allocations in the new code.
  • Provided benchmarks for the new code to analyze and improve upon.

Commit formatting

Commits follow the recommended format with appropriate emojis for better identification:

  • 🚀 feat: add load-shedding middleware for request timeout management
  • ✅ test: add unit tests for load-shedding middleware

@ErfanMomeniii ErfanMomeniii requested a review from a team as a code owner December 28, 2024 20:38
@ErfanMomeniii ErfanMomeniii requested review from gaby, sixcolors, ReneWerner87 and efectn and removed request for a team December 28, 2024 20:38

welcome bot commented Dec 28, 2024

Thanks for opening this pull request! 🎉 Please check out our contributing guidelines. If you need help or want to chat with us, join us on Discord https://gofiber.io/discord

Contributor

coderabbitai bot commented Dec 28, 2024

Walkthrough

The pull request introduces a new load-shedding middleware for the Fiber web framework in loadshedding.go. This middleware provides a mechanism to manage server load by enforcing a timeout on request processing. It allows developers to configure a custom timeout duration, specify a load-shedding handler for timed-out requests, and optionally define route exclusions. The implementation uses goroutines and context to monitor request processing time, ensuring that long-running requests can be gracefully handled without blocking the server.
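The goroutine-plus-context pattern described in this walkthrough can be sketched with the standard library alone. This is a minimal net/http analogue under stated assumptions, not the PR's Fiber code: `withTimeout`, `buffered`, `statusFor`, and the `/health` exclusion are hypothetical names chosen for illustration.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// buffered collects a response in memory so that only one goroutine ever
// writes to the real http.ResponseWriter.
type buffered struct {
	header http.Header
	status int
	body   bytes.Buffer
}

func (b *buffered) Header() http.Header         { return b.header }
func (b *buffered) WriteHeader(code int)        { b.status = code }
func (b *buffered) Write(p []byte) (int, error) { return b.body.Write(p) }

// withTimeout is a hypothetical stand-in for the middleware, not the PR's code.
func withTimeout(timeout time.Duration, shed, next http.Handler, exclude func(*http.Request) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if exclude != nil && exclude(r) {
			next.ServeHTTP(w, r) // excluded routes bypass the timeout entirely
			return
		}
		ctx, cancel := context.WithTimeout(r.Context(), timeout)
		defer cancel()

		buf := &buffered{header: http.Header{}, status: http.StatusOK}
		req := r.WithContext(ctx)      // downstream code can observe the deadline
		done := make(chan struct{}, 1) // capacity 1: the send never blocks
		go func() {
			next.ServeHTTP(buf, req)
			done <- struct{}{}
		}()

		select {
		case <-done: // handler finished in time: replay its buffered response
			for k, v := range buf.header {
				w.Header()[k] = v
			}
			w.WriteHeader(buf.status)
			_, _ = w.Write(buf.body.Bytes())
		case <-ctx.Done(): // deadline passed first: shed the request
			shed.ServeHTTP(w, r)
		}
	})
}

// statusFor runs one request through the sketch and returns the status code.
func statusFor(handlerMillis, timeoutMillis int, path string) int {
	next := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(time.Duration(handlerMillis) * time.Millisecond)
		fmt.Fprint(w, "ok")
	})
	shed := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "Service Overloaded", http.StatusServiceUnavailable)
	})
	exclude := func(r *http.Request) bool { return r.URL.Path == "/health" }

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	withTimeout(time.Duration(timeoutMillis)*time.Millisecond, shed, next, exclude).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(0, 500, "/"))        // fast handler
	fmt.Println(statusFor(300, 20, "/"))       // slow handler, shed on timeout
	fmt.Println(statusFor(100, 20, "/health")) // excluded route
}
```

Buffering the handler's output is a design choice of this sketch: it keeps the timed-out handler from writing to the client while the shed handler is responding.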

Changes

| File | Change Summary |
| --- | --- |
| middleware/loadshedding/loadshedding.go | Added new middleware handler for load-shedding with configurable timeout, custom handler, and route exclusion. |
| middleware/loadshedding/loadshedding_test.go | Added comprehensive unit tests covering excluded routes, timeout scenarios, and successful request handling. |

Possibly related PRs

Suggested labels

v3

Suggested reviewers

  • gaby
  • sixcolors
  • ReneWerner87
  • efectn

Poem

🐰 Load Shedding Bunny Hops In
With timeouts swift and handlers keen
Requests race against the clock's spin
Some hop through, some must lean
A middleware dance, clean and bright! 🕒


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f64a7b and 6d7fbc2.

📒 Files selected for processing (1)
  • middleware/loadshedding/loadshedding_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • middleware/loadshedding/loadshedding_test.go


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
middleware/loadshedding/loadshedding.go (1)

10-12: Refine the comment phrasing and spacing for clarity.

The function’s comment lines could be slightly restated and spaced to enhance readability. Also, consider rephrasing “creates a middleware handler enforces” to maintain proper English grammar.

-// New creates a middleware handler enforces a timeout on request processing to manage server load.
-// If a request exceeds the specified timeout, a custom load-shedding handler is executed.
+// New creates a middleware handler that enforces a timeout on request processing to help manage server load.
+// If a request exceeds the specified timeout, a custom load-shedding handler is invoked.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 775e0a7 and 4af52fb.

📒 Files selected for processing (2)
  • middleware/loadshedding/loadshedding.go (1 hunks)
  • middleware/loadshedding/loadshedding_test.go (1 hunks)
🔇 Additional comments (9)
middleware/loadshedding/loadshedding.go (5)

13-17: Check the return value flow for excluded requests.

The exclusion check is handled correctly, ensuring that excluded routes bypass load shedding. Just confirm all necessary route patterns are accounted for in your exclude function.


19-21: Appropriate use of context for the timeout.

Creating a context with the specified timeout is done well here. This approach cleanly provides a timeout boundary for processing the request.


23-25: Double-check concurrency safety for the channel usage.

The buffered channel with capacity 1 is a good approach to avoid potential goroutine leaks. Also, ensure that no other paths close or reuse this channel.
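Why capacity 1 matters can be shown in isolation. The sketch below is illustrative only (`senderFinishes` is a hypothetical helper): it lets the receiver give up first and then checks whether the worker goroutine can still complete its send and exit.

```go
package main

import (
	"fmt"
	"time"
)

// senderFinishes reports whether a worker goroutine can complete its send
// after the receiver has already stopped listening.
func senderFinishes(capacity int) bool {
	done := make(chan error, capacity)
	exited := make(chan struct{})

	go func() {
		defer close(exited)
		time.Sleep(20 * time.Millisecond) // simulated slow handler
		done <- nil                       // parks forever if capacity is 0 and nobody receives
	}()

	// The "middleware" times out first and stops listening on done.
	select {
	case <-done:
	case <-time.After(5 * time.Millisecond):
	}

	// Give the worker a generous window to exit on its own.
	select {
	case <-exited:
		return true
	case <-time.After(200 * time.Millisecond):
		return false // still blocked on the send: a leaked goroutine
	}
}

func main() {
	fmt.Println("buffered (cap 1):", senderFinishes(1))
	fmt.Println("unbuffered (cap 0):", senderFinishes(0))
}
```

With an unbuffered channel the worker stays parked on the send for the life of the process, which is the leak the review comment refers to.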


26-29: Potential concurrency concern with fiber.Ctx usage.

Fiber’s Ctx implementation is generally not concurrency-safe. Calling c.Next() in a separate goroutine might pose a risk if the fiber.Ctx gets accessed by multiple goroutines simultaneously. Although this pattern often works in practice, be sure to confirm with Fiber documentation or tests that no concurrency safety issues arise.


31-38: Clear and effective use of select for timeout vs. request completion.

The logic to invoke loadSheddingHandler on timeout vs. returning the normal handler response is correct and straightforward.

middleware/loadshedding/loadshedding_test.go (4)

13-21: Adequate clarity in helper handlers implementation.

The division into discrete handlers (success, timeout, load-shedding) enhances readability. The 2-second sleep for timeout is ingenious for simulating a long-running request.


31-50: Comprehensive test for excluded routes.

Verifies that the middleware works as intended when routes are excluded. This is an essential scenario to ensure safe bypass logic.


51-73: Excellent timeout test approach.

The test methodically confirms that a request exceeding the 1-second cap triggers load-shedding. This thorough check ensures the middleware’s main functionality.


75-92: Valid test for successful requests within the timeout.

This covers the happy path, ensuring the request completes successfully within the allotted time. Including this case rounds out the coverage.

@gaby
Member

gaby commented Dec 28, 2024

@ErfanMomeniii You should be able to replicate the CI using the provided Makefile.

Run this: make test

https://github.com/gofiber/fiber#-development

@gaby
Member

gaby commented Dec 28, 2024

I believe that bug 1 is the cause of the tests failing. We should also add improvement 1 from the review below.

This review was generated using OpenAI o1-pro:


Overall Impression

The middleware is straightforward and mostly does what it promises:

  • It creates a context.WithTimeout around the request.
  • Spawns a goroutine to run c.Next().
  • If the request is not finished within the specified timeout, it calls the loadSheddingHandler.

Its behavior is reasonably clear, and the accompanying tests show various scenarios (excluded routes, timeouts, and successful requests).

However, there are a few issues and potential improvements worth pointing out.


Bugs / Potential Issues

  1. Goroutine Resource Leak
    When you call

    go func() {
        done <- c.Next()
    }()

    you rely on the context’s cancellation to interrupt c.Next(), but in some cases, a handler might not properly handle the canceled context. If the Fiber handler does not actually stop execution upon context cancellation, that goroutine could continue doing work until completion. Be sure that all handlers in your stack properly respect ctx.Done() to avoid “zombie” goroutines.

  2. Partial Responses / Streaming Concerns
    If a handler is streaming data back to the client, the scenario where the context is canceled mid-stream isn’t explicitly handled. You might end up sending partial data before the load-shedding logic triggers. Many setups buffer responses, but if your application does any streaming, you need to ensure the context cancellation is heeded at the streaming level as well.

  3. Tight Coupling of Timeout and Load Shedding
    Currently, this is purely a request-timeout mechanism. The loadSheddingHandler name might suggest more comprehensive load shedding (like concurrency thresholds or resource-based triggers). Consider renaming to something like timeoutHandler if you don’t plan to expand its functionality, or incorporate other load metrics if you do.
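For the first point in this list, a handler only stops early if its own work checks the context. The following is a small sketch of cooperative cancellation using plain `context`, without Fiber; `slowWork` and `stepsCompleted` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowWork simulates a long-running handler body that checks ctx between steps
// and returns how many steps it finished.
func slowWork(ctx context.Context, steps int, stepDelay time.Duration) (int, error) {
	for i := 0; i < steps; i++ {
		select {
		case <-ctx.Done():
			return i, ctx.Err() // stop promptly instead of running to completion
		case <-time.After(stepDelay):
		}
	}
	return steps, nil
}

// stepsCompleted runs slowWork under a deadline and reports the number of
// finished steps and whether the deadline cut the work short.
func stepsCompleted(steps, stepMillis, timeoutMillis int) (int, bool) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()
	n, err := slowWork(ctx, steps, time.Duration(stepMillis)*time.Millisecond)
	return n, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	n, timedOut := stepsCompleted(100, 10, 35)
	fmt.Println("steps:", n, "timed out:", timedOut)
}
```

A handler written as a single blocking call with no such check would keep running after the middleware has already responded.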


Potential Improvements

  1. Explicit Cancellation Support
    To ensure that c.Next() and deeper handlers truly respect the timeout, consider passing the new context down. For example:

    c.SetContext(ctx)
    err := c.Next()

    so that any c.Context() call within subsequent handlers references the timed-out context. This helps them detect and act on cancellations more gracefully.

  2. Configurable Timeouts
    You might want per-route or dynamic timeouts. A function signature like:

    func(c fiber.Ctx) time.Duration

    to compute the timeout dynamically could be helpful, allowing you to tailor timeouts based on route complexity, user tiers, or other metrics.

  3. Allow for Additional Fallbacks
    Instead of a single loadSheddingHandler, some might want more nuanced fallback options, such as returning partial/cached data. The current design only allows one fallback function. You could consider a pattern that routes different fallback behaviors depending on the endpoint or error condition.

  4. Concurrency-Based or Adaptive Load Shedding
    If the goal is to handle heavy load more generally, consider tracking concurrency (e.g., number of ongoing requests) and shedding requests when thresholds are exceeded. A time-based approach alone may not suffice under extreme load.

  5. Use a Named Goroutine
    For debugging, consider naming the goroutine (or logging the request path) to track which request is timing out or getting stuck. This can help in diagnosing issues more quickly.
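Improvement 4 above (concurrency-based shedding) could look roughly like the following. This is a sketch using net/http and a channel as a counting semaphore, not anything in the PR; `limitConcurrency` and `shedCount` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// limitConcurrency admits at most max in-flight requests and sheds the rest
// immediately with a 503 instead of queueing them.
func limitConcurrency(max int, next http.Handler) http.Handler {
	slots := make(chan struct{}, max)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case slots <- struct{}{}: // a slot is free: take it
			defer func() { <-slots }()
			next.ServeHTTP(w, r)
		default: // all slots busy: shed
			http.Error(w, "Service Overloaded", http.StatusServiceUnavailable)
		}
	})
}

// shedCount fills the limiter with blocked requests, sends the remainder, and
// returns how many of them were shed.
func shedCount(limit, requests int) int {
	entered := make(chan struct{})
	release := make(chan struct{})
	next := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		entered <- struct{}{} // signal that a slot is now occupied
		<-release             // hold the slot until the experiment ends
	})
	h := limitConcurrency(limit, next)

	var wg sync.WaitGroup
	shed := 0
	for i := 0; i < requests; i++ {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodGet, "/", nil)
		if i < limit {
			wg.Add(1)
			go func() {
				defer wg.Done()
				h.ServeHTTP(rec, req)
			}()
			<-entered // wait until this request holds a slot
			continue
		}
		h.ServeHTTP(rec, req) // every slot is held, so this returns at once
		if rec.Code == http.StatusServiceUnavailable {
			shed++
		}
	}
	close(release)
	wg.Wait()
	return shed
}

func main() {
	fmt.Println("shed:", shedCount(2, 5))
}
```

Unlike a timeout, this rejects excess work before it starts, so an overloaded server does not spend time on requests it will abandon anyway.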


Missing Features / Wishlist

  • Graceful Shutdown Hooks
    If you’re stopping the server, you might want the load-shedding middleware to know about the shutdown event. This way, it can immediately return a “service unavailable” rather than waiting for timeouts or continuing to accept requests that may never complete.
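One way to picture the shutdown hook idea is a flag that the middleware consults before doing anything else. The sketch below assumes a hypothetical `gate` type built on net/http and `sync/atomic` (Go 1.19+); it is not an existing Fiber API.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// gate is a hypothetical shutdown-aware wrapper, not part of the PR.
type gate struct {
	draining atomic.Bool // set to true once shutdown has begun
}

// wrap rejects new requests with 503 while the server is draining.
func (g *gate) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if g.draining.Load() {
			http.Error(w, "Service Unavailable", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusDuring returns the status a request receives for a given drain state.
func statusDuring(draining bool) int {
	g := &gate{}
	g.draining.Store(draining)
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	g.wrap(ok).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	fmt.Println("serving:", statusDuring(false))
	fmt.Println("draining:", statusDuring(true))
}
```

In a real server the flag would be flipped by whatever code initiates shutdown, before it starts waiting for in-flight requests.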

@ErfanMomeniii
Author

Hi @gaby, it should be fixed now.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
middleware/loadshedding/loadshedding.go (2)

10-11: Clarify the documentation header.

The comment does not explicitly mention that the timeout is enforced on a per-request basis. Consider clarifying that this middleware applies a deadline to each incoming request.

-// New creates a middleware handler enforces a timeout on request processing to manage server load.
-// If a request exceeds the specified timeout, a custom load-shedding handler is executed.
+// New creates a middleware handler that enforces a per-request timeout to manage server load. 
+// If a request exceeds the specified timeout, the custom load-shedding handler is executed.

26-36: Finalize or remove the done channel logic.

Because c.Next() blocks until completion, the err := c.Next() line has already captured any error before the goroutine sends it into done. Consequently, the channel-based approach offers little concurrency benefit. If you wish to enforce the timeout dynamically, consider running the entire operation in a separate goroutine and listening on the channel in real time. Otherwise, remove the channel to simplify.

middleware/loadshedding/loadshedding_test.go (3)

1-2: Use a consistent naming scheme for the test package.

Currently, the package is named loadshedding_test. If your convention is to place middleware tests in the same package, rename the package to loadshedding. Otherwise, ensure the _test suffix is applied consistently across your test packages for clarity.


23-25: Extend the load-shedding handler for partial data.

The Service Overloaded message is concise and predictable. For advanced scenarios (e.g., streaming or partial request completions), consider including more diagnostic information, such as expected wait times or retry intervals, to help consumers handle overload situations.
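As a sketch of the retry-interval suggestion, a shed handler can set the standard `Retry-After` header alongside the 503. The example uses net/http rather than Fiber, and `overloaded` and `probe` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// overloaded returns a hypothetical shed handler that advertises a retry delay
// in seconds through the Retry-After response header.
func overloaded(retryAfterSeconds int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Retry-After", strconv.Itoa(retryAfterSeconds))
		http.Error(w, "Service Overloaded", http.StatusServiceUnavailable)
	})
}

// probe sends one request to the shed handler and returns the status code and
// the Retry-After value a client would see.
func probe(retryAfterSeconds int) (int, string) {
	rec := httptest.NewRecorder()
	overloaded(retryAfterSeconds).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code, rec.Header().Get("Retry-After")
}

func main() {
	code, retry := probe(5)
	fmt.Println(code, retry)
}
```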


51-77: Double-check concurrency test coverage.

The Test_LoadSheddingTimeout ensures that a 2-second simulated delay triggers a 503 with a 1-second middleware timeout. However, it relies on Fiber’s internal mechanism for context cancellation. Confirm that all critical concurrency edge cases (e.g., partial content, streaming) are covered. If additional concurrency scenarios exist, create separate tests to avoid conflating multiple concerns in a single test.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8703bb and 9f64a7b.

📒 Files selected for processing (2)
  • middleware/loadshedding/loadshedding.go (1 hunks)
  • middleware/loadshedding/loadshedding_test.go (1 hunks)
🧰 Additional context used
🪛 golangci-lint (1.62.2)
middleware/loadshedding/loadshedding_test.go

18-18: func timeoutHandler is unused

(unused)

🔇 Additional comments (5)
middleware/loadshedding/loadshedding.go (3)

14-17: Check the exclusion function contract.

When exclude is set, the middleware bypasses timeout enforcement without returning any annotation or log for the excluded path. If debugging load shedding is needed, consider adding optional logging or instrumentation to confirm that certain routes are excluded as intended.


37-45: Note potential partial response issues.

If data has already been written to the response before the context expires, clients might receive partial results without a clear error or an HTTP status code update. Ensure that the load-shedding handler or subsequent logic can convey a proper 503 or alternative response if partial data streaming is possible.
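The standard library's `http.TimeoutHandler` addresses the same partial-response concern by buffering the wrapped handler's output and discarding it on timeout. The sketch below only demonstrates that behaviour with net/http; it says nothing about how Fiber handles the case, and `respond` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// respond runs a two-stage handler behind http.TimeoutHandler and returns the
// status code and body that the client would see.
func respond(pauseMillis, timeoutMillis int) (int, string) {
	staged := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "partial ") // written before the pause
		time.Sleep(time.Duration(pauseMillis) * time.Millisecond)
		fmt.Fprint(w, "rest") // written after the pause
	})
	h := http.TimeoutHandler(staged, time.Duration(timeoutMillis)*time.Millisecond, "Service Overloaded")

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := respond(300, 20)
	fmt.Printf("slow: %d %q\n", code, body)
	code, body = respond(0, 500)
	fmt.Printf("fast: %d %q\n", code, body)
}
```

The trade-off is memory and latency: nothing reaches the client until the handler finishes, so this approach does not suit streaming responses.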


19-25: ⚠️ Potential issue

Revisit context usage for concurrency and cancellation.

c.Next() is called synchronously, which means the middleware will not reach the select statement until after the handler completes (successfully or otherwise). Ensure that downstream handlers actually respect the context’s cancellation, or this approach will not explicitly interrupt a long-running or blocking operation. Fiber might handle context internally, but confirm that partial responses or streaming are properly aborted when the context is canceled.

Suggestion:

  • Spawn c.Next() in a separate goroutine, and use a select to cancel if the context times out before completion.
  • Confirm that your handlers check c.Context().Err() or similar to exit early.
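The ordering problem described in this comment can be reproduced without Fiber. In the sketch below (hypothetical helper `returnsByDeadline`), the same `select` either follows a synchronous call, in which case the timeout branch cannot fire early, or runs alongside a goroutine, in which case it can.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// returnsByDeadline reports whether the caller regains control well before the
// simulated handler finishes, depending on how the handler is invoked.
func returnsByDeadline(async bool) bool {
	const work = 150 * time.Millisecond
	const timeout = 20 * time.Millisecond

	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	done := make(chan struct{}, 1)
	handler := func() {
		time.Sleep(work) // stands in for a blocking c.Next()
		done <- struct{}{}
	}

	start := time.Now()
	if async {
		go handler() // select below runs while the handler is still working
	} else {
		handler() // select below is reached only after the handler returns
	}

	select {
	case <-done:
	case <-ctx.Done():
	}
	return time.Since(start) < work/2
}

func main() {
	fmt.Println("synchronous call:", returnsByDeadline(false))
	fmt.Println("goroutine call:", returnsByDeadline(true))
}
```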
middleware/loadshedding/loadshedding_test.go (2)

18-21: Confirm necessity of timeoutHandler.

Static analysis warns that timeoutHandler may be unused. Verify whether it is still required, or remove it to keep the code tidy.


79-96: Consider adding negative scenarios.

While Test_LoadSheddingSuccessfulRequest verifies normal operation within the timeout, consider including scenarios for boundary conditions—e.g., requests that finish exactly at the timeout boundary or near it—to ensure robust behavior in borderline cases.

@gaby gaby added this to the v3 milestone Dec 29, 2024

codecov bot commented Dec 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.31%. Comparing base (845a7f8) to head (6d7fbc2).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3264      +/-   ##
==========================================
+ Coverage   84.23%   84.31%   +0.08%     
==========================================
  Files         116      117       +1     
  Lines       11519    11546      +27     
==========================================
+ Hits         9703     9735      +32     
+ Misses       1391     1387       -4     
+ Partials      425      424       -1     
Flag Coverage Δ
unittests 84.31% <100.00%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ErfanMomeniii
Author

Hi again @gaby, I’ve resolved the lint CI issue as well, so I believe everything is ready for merging now.
Thanks

@gaby
Member

gaby commented Dec 30, 2024

@ErfanMomeniii Code and Tests look fine to me. Three things missing:
