Skip to content

Conversation

@melsonic
Copy link

@melsonic melsonic commented Jun 12, 2025

cosmo.mp4

Motivation and Context

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

/fixes #1886

Summary by CodeRabbit

  • New Features

    • Added hot reload capability for MCP operations, enabling automatic detection and application of changes to operation files without restarting the router.
    • Introduced configuration options to enable or disable hot reload and set the reload interval via YAML, environment variables, and JSON schema.
    • Enhanced file watching to support monitoring entire directories for operation changes.
  • Bug Fixes

    • Prevented duplicate or unintended overwriting of storage provider configurations when MCP is enabled.
  • Tests

    • Added integration tests to verify hot reload behavior, operation updates, and to ensure no resource leaks occur during hot reload cycles.
  • Documentation

    • Updated configuration examples and schema to reflect new hot reload options.

@endigma
Copy link
Member

endigma commented Jun 17, 2025

Hi @melsonic, make sure to mark this ready for review when you think it's done.

For it to be merged, we will need:

  1. A good implementation, of high code quality with good documentation
  2. New or updated tests in router or router-tests proving the new functionality works and does not disturb other functions, unfortunately, a screen capture does not work for this, as with every new feature we need a way to prevent regressions or breakage in the future
  3. A corresponding PR to https://github.com/wundergraph/cosmo-docs for any new configuration options

For this specifically, we'd need it to be off by default, and use the polling based file watcher only, not inotify. We are aware of inotify and that it is theoretically a better solution for cases like this, but in practice it falls short in many environments so the router instead uses the simpler polling approach.

Thanks!

@melsonic
Copy link
Author

Hi @melsonic, make sure to mark this ready for review when you think it's done.

For it to be merged, we will need:

  1. A good implementation, of high code quality with good documentation
  2. New or updated tests in router or router-tests proving the new functionality works and does not disturb other functions, unfortunately, a screen capture does not work for this, as with every new feature we need a way to prevent regressions or breakage in the future
  3. A corresponding PR to https://github.com/wundergraph/cosmo-docs for any new configuration options

For this specifically, we'd need it to be off by default, and use the polling based file watcher only, not inotify. We are aware of inotify and that it is theoretically a better solution for cases like this, but in practice it falls short in many environments so the router instead uses the simpler polling approach.

Thanks!

Hi @endigma, As of now, I’ve implemented this to be applied by default. I’ll add a new config flag to allow changing that behavior.
Regarding the polling watcher behavior: currently, we can only apply a watcher to an existing file. For newly added or deleted files (if not using inotify), I’ll need to implement a new watcher functionality. Here’s the approach I’m considering:

  • Create a goroutine that periodically polls the operations directory (e.g., at interval T).
  • Trigger a reload if (lastModified - time.Now()) < T.

Let me know your thoughts!

@endigma
Copy link
Member

endigma commented Jun 18, 2025

I think it should be possible to perhaps extend the polling based watcher to walk globs or directories, and use a map to keep track of seen files

New unseen files or updates to seen files could trigger a reload

@melsonic
Copy link
Author

Hi @endigma, I've refactored the code to use a polling-based watcher for detecting operation file addition/deletion changes. It's nearly ready. I have one question though.

Should we add a separate HotReloadPollInterval config option for MCP operations (like we have for the main config), or reuse the existing PollInterval?

@endigma
Copy link
Member

endigma commented Jun 19, 2025

Enabling the watcher, setting its polling interval should be specific, yes

Let me know when you're fully done with it, as per the general merge requirements I posted above as well as you being happy with it, then I'll see about getting it into review internally!

@melsonic
Copy link
Author

Hi @endigma, this PR is ready for review.
This PR includes changes to support Hot reloading of MCP Operations and tests to verify the functionality.
I have also created another PR to cosmo-docs.
However I would like to bring up a point here related to redis-cluster. It somehow gets into an unhealthy state(verified by a wundergraph member on discord) and hence couldn't verify the functionality of tests related to redis. Apart from that I ran all the tests manually and those were passing .

@endigma endigma self-requested a review June 21, 2025 16:12
@endigma
Copy link
Member

endigma commented Jun 21, 2025

Hi @melsonic, thanks for letting me know! I'll see about getting this into the review queue for next week.

@melsonic
Copy link
Author

Hi @melsonic, thanks for letting me know! I'll see about getting this into the review queue for next week.

Sure, Thanks @endigma!

Copy link
Member

@endigma endigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll need a few more passes as well, but I have a few initial comments

@melsonic melsonic requested a review from endigma June 25, 2025 18:17
@melsonic
Copy link
Author

Hi @endigma, please review when you have time.

@coderabbitai
Copy link

coderabbitai bot commented Jul 4, 2025

"""

Walkthrough

The changes introduce hot reloading support for MCP operation files in the router. This includes new configuration options, updates to the MCP server and operation loader to support directory watching with reload intervals, and tests to verify hot reload behavior. The schema, configuration files, and internal logic are updated accordingly.

Changes

Files/Paths Change Summary
router-tests/mcp_hot_reload_test.go, router-tests/lifecycle/mcp_hot_reload_shutdown_test.go Added integration tests for MCP hot reload functionality, including goroutine leak checks.
router-tests/testenv/testenv.go Updated test environment setup to conditionally append MCP storage provider if not already set.
router/core/graph_server.go, router/core/router.go Integrated hot reload logic into router and graph server, including async reload on file changes.
router/pkg/config/config.go, router/pkg/config/config.schema.json,
router/pkg/config/fixtures/full.yaml, router/pkg/config/testdata/config_defaults.json,
router/pkg/config/testdata/config_full.json Added new configuration fields and schema for MCP hot reload (enabled, interval).
router/pkg/mcpserver/operation_manager.go, router/pkg/mcpserver/server.go Extended MCP server and operation manager to support hot reload, new options, and reload channel.
router/pkg/schemaloader/loader.go Enhanced operation loader to support hot reloading via directory watcher and reload channel.
router/pkg/watcher/watcher.go Extended watcher to support directory-level file watching with filtering and change detection.
router/pkg/watcher/watcher_test.go Added tests for directory watching, validation of options, and directory listing with filtering.

Assessment against linked issues

Objective (Issue #) Addressed Explanation
Support hot reloading of MCP operation files so changes are auto-loaded without restart (#1886)
Provide configuration to enable/disable hot reload and set reload interval (#1886)
Ensure router reloads MCP operations on file add, update, or delete (#1886)
Add automated tests to verify hot reload functionality and no goroutine leaks (#1886)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes were found. All code changes are directly related to supporting and testing hot reloading of MCP operation files as described in the linked issue.
"""


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6ba838 and a4f512e.

📒 Files selected for processing (8)
  • router-tests/testenv/testenv.go (1 hunks)
  • router/core/graph_server.go (1 hunks)
  • router/core/router.go (1 hunks)
  • router/pkg/config/config.go (2 hunks)
  • router/pkg/config/config.schema.json (1 hunks)
  • router/pkg/config/fixtures/full.yaml (1 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • router/core/router.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • router/pkg/config/testdata/config_full.json
  • router/pkg/config/fixtures/full.yaml
  • router/pkg/config/testdata/config_defaults.json
  • router/pkg/config/config.go
  • router-tests/testenv/testenv.go
  • router/pkg/config/config.schema.json
🔇 Additional comments (1)
router/core/graph_server.go (1)

1213-1215: LGTM!

The updated reload method signature properly passes the context and client schema, aligning with the broader MCP hot reload implementation.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
router/core/router.go (1)

851-852: Previous feedback addressed – option condensed into single constructor

mcpserver.WithHotReload(r.mcp.HotReloadConfig.Enabled, r.mcp.HotReloadConfig.Interval) implements the “one option (bool, time.Duration)” pattern requested earlier – thanks for the clean-up.
No functional or stylistic concerns.

🧹 Nitpick comments (1)
router/pkg/watcher/watcher.go (1)

84-99: Consider consistent error handling between initial load and runtime

During initial directory listing, errors cause the watcher to fail (line 87), but during runtime file listing, errors are only logged (line 121). This inconsistency might lead to different behavior between startup and runtime.

Consider making the error handling consistent:

 dirFilePaths, err := listDirFilePaths()
 if err != nil {
-    ll.Error("failed to list directory files", zap.Error(err))
-    return err
+    ll.Error("failed to list directory files during initialization", zap.Error(err))
+    // Continue with empty list to allow graceful degradation
+    dirFilePaths = []string{}
 }

Or alternatively, make both cases return errors for consistency.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e67757 and 84912f3.

📒 Files selected for processing (13)
  • router-tests/mcp_hot_reload_test.go (1 hunks)
  • router-tests/testenv/testenv.go (1 hunks)
  • router/core/graph_server.go (1 hunks)
  • router/core/router.go (1 hunks)
  • router/pkg/config/config.go (2 hunks)
  • router/pkg/config/config.schema.json (1 hunks)
  • router/pkg/config/fixtures/full.yaml (1 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
  • router/pkg/mcpserver/operation_manager.go (2 hunks)
  • router/pkg/mcpserver/server.go (7 hunks)
  • router/pkg/schemaloader/loader.go (3 hunks)
  • router/pkg/watcher/watcher.go (5 hunks)
🧰 Additional context used
🧠 Learnings (4)
router/pkg/config/testdata/config_full.json (1)
Learnt from: SkArchon
PR: wundergraph/cosmo#1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.376Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.
router/pkg/config/testdata/config_defaults.json (2)
Learnt from: SkArchon
PR: wundergraph/cosmo#1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.376Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.
Learnt from: endigma
PR: wundergraph/cosmo#2009
File: router/pkg/config/config.go:0-0
Timestamp: 2025-07-03T10:33:25.755Z
Learning: The CardinalityLimit field in the Metrics struct (router/pkg/config/config.go) is validated at the JSON schema level in config.schema.json with a minimum value constraint of 1, preventing zero or negative values without requiring runtime validation.
router/pkg/config/config.schema.json (1)

undefined

<retrieved_learning>
Learnt from: SkArchon
PR: #1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.376Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.
</retrieved_learning>

router/pkg/schemaloader/loader.go (1)
Learnt from: SkArchon
PR: wundergraph/cosmo#1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.376Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.
🧬 Code Graph Analysis (3)
router-tests/testenv/testenv.go (2)
router/core/router.go (1)
  • WithStorageProviders (1984-1988)
router/pkg/config/config.go (2)
  • StorageProviders (721-726)
  • FileSystemStorageProvider (753-756)
router/core/router.go (1)
router/pkg/mcpserver/server.go (1)
  • WithHotReload (287-292)
router/pkg/mcpserver/operation_manager.go (1)
router/pkg/schemaloader/loader.go (1)
  • NewOperationLoader (42-47)
🔇 Additional comments (19)
router-tests/testenv/testenv.go (1)

1434-1446: LGTM: Conditional storage provider setup prevents configuration conflicts.

The conditional check ensures that a default filesystem storage provider is only added when no MCP storage provider ID is already configured. This prevents overwriting existing provider configurations and avoids duplicate provider registrations, which is essential for the new hot reload testing scenarios.

The logic correctly groups the storage provider setup and provider ID assignment within the same conditional block, maintaining consistency.

router/pkg/config/fixtures/full.yaml (1)

52-54: LGTM: Hot reload configuration follows best practices.

The new hot_reload_config section provides sensible defaults:

  • enabled: false ensures backward compatibility and follows the agreed approach of having hot reload disabled by default
  • interval: '10s' strikes a good balance between responsiveness and resource efficiency for the polling-based file watcher

The configuration structure is well-organized under the MCP section and properly formatted.

router/pkg/config/testdata/config_full.json (1)

160-164: Hot-reload configuration wired correctly – no issues spotted

The new HotReloadConfig block follows existing duration conventions (nanoseconds) and keeps the previous RouterURL field syntactically valid by adding the comma.
Nothing further to flag here.

router/pkg/config/testdata/config_defaults.json (1)

125-129: Defaults updated consistently

The default configuration now exposes the same HotReloadConfig shape as the full fixture. The interval (10 s) is a sensible conservative default.
LGTM.

router/pkg/config/config.go (2)

916-916: LGTM: Clean configuration field addition.

The new HotReloadConfig field is well-integrated into the MCPConfiguration struct, following established patterns for YAML serialization and environment variable naming.


928-932: LGTM: Well-designed configuration struct with sensible defaults.

The MCPOperationsHotReloadConfig struct has thoughtful design choices:

  • Enabled: false by default ensures safe feature rollout
  • 10s interval provides reasonable balance between responsiveness and system load
  • Clear field names and consistent environment variable naming
router/pkg/mcpserver/operation_manager.go (1)

35-41: LGTM!

The method signature update properly propagates the hot reload parameters to the underlying loader.

router/pkg/schemaloader/loader.go (2)

142-162: Consider returning watcher creation errors

When the watcher creation fails, the error is only logged. This means hot reload won't work but the initial load succeeds. While this might be intentional for graceful degradation, it could lead to confusion when hot reload silently fails to work.

Consider whether watcher creation errors should fail the entire operation:

 if err != nil {
     cancel()
     l.Logger.Error("Could not create watcher", zap.Error(err))
+    return nil, fmt.Errorf("failed to create watcher for hot reload: %w", err)
 }

Alternatively, if graceful degradation is desired, consider adding a warning-level log to make it more visible.


163-171: Well-implemented error handling

The goroutine properly distinguishes between expected context cancellation and unexpected errors, with appropriate logging for each case.

router-tests/mcp_hot_reload_test.go (2)

18-206: Excellent test coverage!

The tests comprehensively cover the hot reload functionality:

  • Proper use of temporary directories for test isolation
  • Good coverage of add/update/remove scenarios
  • Appropriate use of EventuallyWithT for async operations
  • Well-structured test assertions

208-282: Good goroutine leak detection!

The test properly verifies that all goroutines are cleaned up during MCP shutdown, which addresses the concern raised in previous reviews about goroutine cleanup.

router/pkg/watcher/watcher.go (1)

139-158: Well-implemented change detection logic

The implementation correctly handles:

  • New file detection by tracking seen files
  • Update detection via modification time comparison
  • Deletion detection by comparing file counts and cleaning up the seen files map
router/pkg/mcpserver/server.go (7)

71-74: Well-structured hot reload configuration options.

The new configuration fields are properly documented and follow the existing code style. The field names are clear and descriptive.


95-97: Good integration of hot reload fields into the server struct.

The new fields are appropriately placed and follow the existing naming conventions. The reloadOperationsChan channel enables asynchronous communication for reload events.


223-225: Proper initialization of hot reload fields.

The initialization correctly sets the hot reload configuration from options and creates the reload channel. The channel is properly initialized as an unbuffered channel.


286-292: Well-implemented option function for hot reload configuration.

The WithHotReload function follows the established pattern for option functions and correctly sets both the hot reload flag and interval.


342-342: Context parameter addition improves cancellation support.

Adding the context parameter to the Reload method signature enables proper cancellation and timeout handling for reload operations.


745-747: Public method exposes reload channel for external coordination.

The ReloadOperationsChannel method provides a clean interface for external components to access the reload channel, enabling proper coordination of hot reload events.


351-351: LoadOperationsFromDirectory context & parameters are correctly forwarded

  • router/pkg/mcpserver/server.go:351 calls
    s.operationsManager.LoadOperationsFromDirectory(ctx, s.operationsDir, s.reloadOperationsChan, s.hotReload, s.hotReloadInterval)
  • router/pkg/mcpserver/operation_manager.go:35 signature
    func (om *OperationsManager) LoadOperationsFromDirectory(ctx context.Context, operationsDir string, reloadOperationsChan chan bool, hotReload bool, hotReloadInterval time.Duration) error
  • router/pkg/schemaloader/loader.go signature matches and also accepts the same parameters

No changes needed here—context and hot-reload settings are properly propagated.

Comment on lines +1088 to +1090
if mErr := s.mcpServer.Reload(ctx, executor.ClientSchema); mErr != nil {
return
}
Copy link

@coderabbitai coderabbitai bot Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle reload errors without terminating the goroutine

If Reload returns an error, the goroutine exits and no further hot reloads will be processed. This could leave the system unable to recover from transient reload failures.

Consider logging the error and continuing to listen for reload signals:

 case <-s.mcpServer.ReloadOperationsChannel():
     s.logger.Log(zap.InfoLevel, "Reloading mcp server!")
     if mErr := s.mcpServer.Reload(ctx, executor.ClientSchema); mErr != nil {
-        return
+        s.logger.Error("Failed to reload MCP server", zap.Error(mErr))
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if mErr := s.mcpServer.Reload(ctx, executor.ClientSchema); mErr != nil {
return
}
case <-s.mcpServer.ReloadOperationsChannel():
s.logger.Log(zap.InfoLevel, "Reloading mcp server!")
if mErr := s.mcpServer.Reload(ctx, executor.ClientSchema); mErr != nil {
s.logger.Error("Failed to reload MCP server", zap.Error(mErr))
}
🤖 Prompt for AI Agents
In router/core/graph_server.go around lines 1088 to 1090, the current code
returns immediately if s.mcpServer.Reload returns an error, causing the
goroutine to exit and stop processing further reloads. Modify the code to log
the error instead of returning, allowing the goroutine to continue running and
handle subsequent reload signals without termination.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @endigma, Please let me know if it is the expected behaviour to further retry mcpServer reload operations even after reload failure.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@melsonic
Copy link
Author

melsonic commented Jul 4, 2025

Hi @endigma, I have updated the cosmo watcher to watch CRUD operations in a directory with a file filter. In this scenario, the filefilter would help us to tell the watcher to only watch graphql files.
Please review my approach & let me know if you have any suggestions or updates.
I will also have to add the tests for this added functionality, I'll add them once you are okay with my changes.

Thank you!

@melsonic
Copy link
Author

Hi @endigma, please review this MR once you have time.
Thank you!

@endigma
Copy link
Member

endigma commented Jul 10, 2025

Hi @melsonic , you can just re-request when its ready, can you resolve the conflict? I'll put this back in the queue for review.

@endigma endigma self-requested a review July 10, 2025 15:42
Copy link
Member

@endigma endigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, some high level review:

  • You should add unit tests for the new watcher behaviour in watcher_test.go
  • We don't need to support Directory and Paths operating modes at the time same, we can use Directory as a way to build a list of paths or use a provided one, and we can rebuild the list of tracked files on every loop for the directory mode.

t.Run("List Updated User Operations On Addition and Removal", func(t *testing.T) {

operationsDir := t.TempDir()
storageProviderId := "mcp_hot_reload_test_id"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storageProviderId should be ID and it seems the same across all subtests so can we move it out to test scope?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can.


initialToolsCount := len(resp.Tools)

filePath := operationsDir + "/main.graphql"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using filepath.Join is best for this sort of thing, and can it be given a more descriptive name? This applies to all subtests.

Comment on lines 89 to 113
assert.EventuallyWithT(t, func(t *assert.CollectT) {

resp, err = xEnv.MCPClient.ListTools(xEnv.Context, toolsRequest)
assert.NoError(t, err)
assert.Len(t, resp.Tools, initialToolsCount)

// verity getEmployeeNotes operation tool is properly removed
require.NotContains(t, resp.Tools, mcp.Tool{
Name: "execute_operation_get_employee_notes",
Description: "Executes the GraphQL operation 'getEmployeeNotes' of type query.",
InputSchema: mcp.ToolInputSchema{
Type: "object",
Properties: map[string]interface{}{"id": map[string]interface{}{"type": "integer"}},
Required: []string{"id"},
},
RawInputSchema: json.RawMessage(nil),
Annotations: mcp.ToolAnnotation{
Title: "Execute operation getEmployeeNotes",
ReadOnlyHint: mcp.ToBoolPtr(true),
IdempotentHint: mcp.ToBoolPtr(true),
OpenWorldHint: mcp.ToBoolPtr(true),
},
})

}, 15*time.Second, 250*time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using assert.Eventually... functions, we recently introduced an option to the watcher to directly provide a non-timing based tick source. You can check watcher_test.go for how to use it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll check and update the test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of mcp hot reload, we don't have direct access to the watcher config options, the watcher is defined in the loader.go file. So, in this particular scenario it is not possible to use non-timing based tick source.

Comment on lines 48 to 71
listDirFilePaths := func() ([]string, error) {
var files []string
if options.Directory.DirPath != "" {
err := filepath.WalkDir(options.Directory.DirPath, func(path string, d fs.DirEntry, err error) error {
if err != nil {
return err
}
// Skip directories
if d.IsDir() {
return nil
}
// Skip if filter rejects the file
if options.Directory.Filter != nil && options.Directory.Filter(path) {
return nil
}
files = append(files, path)
return nil
})
if err != nil {
return []string{}, fmt.Errorf("error walking directory %s: %w", options.Directory.DirPath, err)
}
}
return files, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be extracted and tested separately? Also, "filter" semantics usually return true when something should remain in the list, not be removed from it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll create a util function outside New with unit test.

Comment on lines 29 to 39
if len(options.Paths) == 0 {
return nil, errors.New("path must be provided")
if len(options.Paths) == 0 && options.Directory.DirPath == "" {
return nil, errors.New("paths or directory must be provided")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should error if both Paths and Directory are provided, and I don't believe filter is necessary at the moment. This will allow you to deduplicate all the loops below and integrate Directories as just a source for a list of paths instead of separate handling.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying the required design. It would definitely prevent duplicate runs from out function.

}

prevModTimes := make(map[string]time.Time)
seenDirFilePaths := make(map[string]struct{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What purpose does this serve? You maintain it as an index of the currently known file paths but don't use it for anything that prevModTimes can't be used for from what I can tell. We can already detect deleted files by looping over the prevModTimes map and comparing it with os.Stat result of the expected file. We already handle this by resetting the time so should it be created again it will be detected as an update.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into it.

@melsonic melsonic requested a review from jensneuse as a code owner July 15, 2025 18:32
@melsonic
Copy link
Author

Hi @endigma, I have updated the PR incorporating your comments. However, I do have these following questions.

  1. Should we throw an error in case both options.Paths & options.Directory is empty?
  2. Since we are only allowing either Paths or Directory, in case the newly defined function ListDirFilePaths returns 0 file paths, should we also throw an error from our watcher function?

}
}

if options.Directory.DirPath != "" {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reloading the updated directory files after going through the previously listed files. The will allow us to remove deleted files from prevModTimes in one loop. However there is a catch, in case of a new file addition, instead of 2 ticks, it will consume 3 ticks to run the provided Callback func.

On the other hand, in case we list the updated directory files before going through options.Paths, we'll have to run another loop on prevModTimes to remove keys for files, that are deleted from the directory.

Currently I have implemented the first approach, let me know if the team prefers the 2nd option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally choose option 2, looping over a map and updating it is not terribly slow and hot reloading is nowhere near a hot path. More intuitive or consistent reload behavior is more important than raw speed of a single tick handler

@melsonic melsonic requested a review from endigma July 15, 2025 18:51
@endigma
Copy link
Member

endigma commented Jul 15, 2025

Hi @melsonic, I think having 0 paths to check is fine, provided we defensively prevent anything bad from happening in this case. This would allow gracefully handling a multi-tick wide total directory swap or something like a human speed drag-out -> drag-in operation.

Generally with watcher code we prefer resilience over all, e.g not failing out when we hit an invalid state and instead just falling back to a previous working state until some iteration is usable. In this case I believe having mcp enabled but with 0 operations should not be a failure case, but I'm not sure at the moment

@endigma
Copy link
Member

endigma commented Jul 15, 2025

And yes, watcher constructor should fail if the arguments themselves are both empty

@melsonic
Copy link
Author

melsonic commented Jul 17, 2025

Hi @endigma, I am ready with my changes, please review it once you have time.

@melsonic
Copy link
Author

Hi @endigma, Please review.

P.S: If you are busy with some other high priority task, please take your time. Commenting just in case you missed my previous review notification.

@endigma
Copy link
Member

endigma commented Jul 25, 2025

Hi @melsonic, this is low priority internally at the moment, but rest assured it's in the queue

@melsonic
Copy link
Author

No problem @endigma, just wanted to know if it's in the queue or not.

@github-actions
Copy link

github-actions bot commented Aug 9, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 9, 2025
@endigma
Copy link
Member

endigma commented Aug 9, 2025

Unstale

@github-actions github-actions bot removed the Stale label Aug 10, 2025
@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 25, 2025
@melsonic
Copy link
Author

unstale

@github-actions github-actions bot removed the Stale label Aug 26, 2025
@endigma
Copy link
Member

endigma commented Sep 3, 2025

Hi @melsonic, we've decided internally to adopt your work as the basis for our own fix for the original issue, as such, we'll be closing this PR for now. Thanks for your contribution!

@endigma endigma closed this Sep 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support hot reloading of MCP operation files

2 participants