feat: Support hot reloading of MCP operation files #1959

melsonic · 2025-06-12T19:28:16Z

cosmo.mp4

Motivation and Context

Checklist

I have discussed my proposed changes in an issue and have received approval to proceed.
I have followed the coding standards of the project.
Tests or benchmarks have been added or updated.
Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
I have read the Contributors Guide.

/fixes #1886

Summary by CodeRabbit

New Features
- Added hot reload capability for MCP operations, enabling automatic detection and application of changes to operation files without restarting the router.
- Introduced configuration options to enable or disable hot reload and set the reload interval via YAML, environment variables, and JSON schema.
- Enhanced file watching to support monitoring entire directories for operation changes.
Bug Fixes
- Prevented duplicate or unintended overwriting of storage provider configurations when MCP is enabled.
Tests
- Added integration tests to verify hot reload behavior, operation updates, and to ensure no resource leaks occur during hot reload cycles.
Documentation
- Updated configuration examples and schema to reflect new hot reload options.

endigma · 2025-06-17T09:33:01Z

Hi @melsonic, make sure to mark this ready for review when you think it's done.

For it to be merged, we will need:

A good implementation of high code quality, with good documentation
New or updated tests in router or router-tests proving the new functionality works and does not disturb other functions. Unfortunately, a screen capture does not work for this: as with every new feature, we need a way to prevent regressions or breakage in the future
A corresponding PR to https://github.com/wundergraph/cosmo-docs for any new configuration options

For this specifically, we'd need it to be off by default, and use the polling based file watcher only, not inotify. We are aware of inotify and that it is theoretically a better solution for cases like this, but in practice it falls short in many environments so the router instead uses the simpler polling approach.

Thanks!

melsonic · 2025-06-17T19:41:38Z

Hi @endigma, as of now I've implemented this to be applied by default. I'll add a new config flag to allow changing that behavior.
Regarding the polling watcher behavior: currently, we can only attach a watcher to an existing file. To pick up newly added or deleted files without inotify, I'll need to implement new watcher functionality. Here's the approach I'm considering:

Create a goroutine that periodically polls the operations directory (e.g., at interval T).
Trigger a reload if time.Since(lastModified) < T, i.e. if the directory was modified within the last polling interval.

Let me know your thoughts!

endigma · 2025-06-18T07:48:49Z

I think it should be possible to extend the polling-based watcher to walk globs or directories, and use a map to keep track of seen files.

New unseen files or updates to seen files could then trigger a reload.
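
The idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the router's watcher: `snapshot`, `changed`, and `demo` are hypothetical names, and the comparison also treats a deleted file as a change, which goes slightly beyond what the comment spells out.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// snapshot walks dir and records the modification time of every file in it.
// Hypothetical helper for illustration; not the router's watcher code.
func snapshot(dir string) (map[string]time.Time, error) {
	seen := make(map[string]time.Time)
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		seen[path] = info.ModTime()
		return nil
	})
	return seen, err
}

// changed reports whether cur differs from prev: an unseen file appeared,
// a seen file disappeared, or a seen file has a different modification time.
func changed(prev, cur map[string]time.Time) bool {
	if len(prev) != len(cur) {
		return true
	}
	for path, mod := range cur {
		if old, ok := prev[path]; !ok || !old.Equal(mod) {
			return true
		}
	}
	return false
}

// demo checks that an addition, an update, and a deletion are each detected,
// and that an untouched directory is not reported as changed.
func demo() error {
	dir, err := os.MkdirTemp("", "mcp-ops")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	file := filepath.Join(dir, "a.graphql")
	later := time.Now().Add(time.Hour)

	steps := []struct {
		name   string
		mutate func() error
		want   bool
	}{
		{"add", func() error { return os.WriteFile(file, []byte("query A { __typename }"), 0o644) }, true},
		{"update", func() error { return os.Chtimes(file, later, later) }, true},
		{"no-op", func() error { return nil }, false},
		{"delete", func() error { return os.Remove(file) }, true},
	}
	prev, err := snapshot(dir)
	if err != nil {
		return err
	}
	for _, s := range steps {
		if err := s.mutate(); err != nil {
			return err
		}
		cur, err := snapshot(dir)
		if err != nil {
			return err
		}
		if got := changed(prev, cur); got != s.want {
			return errors.New(s.name + ": unexpected change result")
		}
		prev = cur
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("add, update, and delete were each detected")
}
```

In a real watcher the comparison would run once per polling tick, with the previous snapshot kept between ticks.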

melsonic · 2025-06-19T17:57:24Z

Hi @endigma, I've refactored the code to use a polling-based watcher for detecting operation file addition/deletion changes. It's nearly ready. I have one question though.

Should we add a separate HotReloadPollInterval config option for MCP operations (like we have for the main config), or reuse the existing PollInterval?

endigma · 2025-06-19T19:44:48Z

Yes, enabling the watcher and setting its polling interval should both be specific to MCP operations.

Let me know when you're fully done with it, as per the general merge requirements I posted above as well as you being happy with it, then I'll see about getting it into review internally!

melsonic · 2025-06-21T14:51:41Z

Hi @endigma, this PR is ready for review.
This PR includes changes to support Hot reloading of MCP Operations and tests to verify the functionality.
I have also created another PR to cosmo-docs.
However, I would like to bring up a point related to redis-cluster. It somehow gets into an unhealthy state (verified by a wundergraph member on Discord), so I couldn't verify the tests related to Redis. Apart from that, I ran all the tests manually and they passed.

endigma · 2025-06-21T16:13:43Z

Hi @melsonic, thanks for letting me know! I'll see about getting this into the review queue for next week.

melsonic · 2025-06-22T18:21:53Z

Sure, Thanks @endigma!

endigma

It'll need a few more passes as well, but I have a few initial comments

router-tests/mcp_hot_reload_test.go

router/core/graph_server.go

router/core/router.go

router/pkg/schemaloader/loader.go

melsonic · 2025-06-29T06:17:18Z

Hi @endigma, please review when you have time.

router/pkg/schemaloader/loader.go

coderabbitai · 2025-07-04T19:56:47Z

"""

Walkthrough

The changes introduce hot reloading support for MCP operation files in the router. This includes new configuration options, updates to the MCP server and operation loader to support directory watching with reload intervals, and tests to verify hot reload behavior. The schema, configuration files, and internal logic are updated accordingly.

Changes

Files/Paths	Change Summary
router-tests/mcp_hot_reload_test.go, router-tests/lifecycle/mcp_hot_reload_shutdown_test.go	Added integration tests for MCP hot reload functionality, including goroutine leak checks.
router-tests/testenv/testenv.go	Updated test environment setup to conditionally append MCP storage provider if not already set.
router/core/graph_server.go, router/core/router.go	Integrated hot reload logic into router and graph server, including async reload on file changes.
router/pkg/config/config.go, router/pkg/config/config.schema.json, router/pkg/config/fixtures/full.yaml, router/pkg/config/testdata/config_defaults.json, router/pkg/config/testdata/config_full.json	Added new configuration fields and schema for MCP hot reload (enabled, interval).
router/pkg/mcpserver/operation_manager.go, router/pkg/mcpserver/server.go	Extended MCP server and operation manager to support hot reload, new options, and reload channel.
router/pkg/schemaloader/loader.go	Enhanced operation loader to support hot reloading via directory watcher and reload channel.
router/pkg/watcher/watcher.go	Extended watcher to support directory-level file watching with filtering and change detection.
router/pkg/watcher/watcher_test.go	Added tests for directory watching, validation of options, and directory listing with filtering.

Assessment against linked issues

Objective (Issue #)	Addressed	Explanation
Support hot reloading of MCP operation files so changes are auto-loaded without restart (#1886)	✅
Provide configuration to enable/disable hot reload and set reload interval (#1886)	✅
Ensure router reloads MCP operations on file add, update, or delete (#1886)	✅
Add automated tests to verify hot reload functionality and no goroutine leaks (#1886)	✅

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes were found. All code changes are directly related to supporting and testing hot reloading of MCP operation files as described in the linked issue.
"""

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6ba838 and a4f512e.

📒 Files selected for processing (8)

router-tests/testenv/testenv.go (1 hunks)
router/core/graph_server.go (1 hunks)
router/core/router.go (1 hunks)
router/pkg/config/config.go (2 hunks)
router/pkg/config/config.schema.json (1 hunks)
router/pkg/config/fixtures/full.yaml (1 hunks)
router/pkg/config/testdata/config_defaults.json (1 hunks)
router/pkg/config/testdata/config_full.json (1 hunks)

✅ Files skipped from review due to trivial changes (1)

router/core/router.go

🚧 Files skipped from review as they are similar to previous changes (6)

router/pkg/config/testdata/config_full.json
router/pkg/config/fixtures/full.yaml
router/pkg/config/testdata/config_defaults.json
router/pkg/config/config.go
router-tests/testenv/testenv.go
router/pkg/config/config.schema.json

🔇 Additional comments (1)

router/core/graph_server.go (1)

1213-1215: LGTM!

The updated reload method signature properly passes the context and client schema, aligning with the broader MCP hot reload implementation.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

router/core/router.go (1)

851-852: Previous feedback addressed – option condensed into single constructor

mcpserver.WithHotReload(r.mcp.HotReloadConfig.Enabled, r.mcp.HotReloadConfig.Interval) implements the “one option (bool, time.Duration)” pattern requested earlier – thanks for the clean-up.
No functional or stylistic concerns.

🧹 Nitpick comments (1)

router/pkg/watcher/watcher.go (1)

84-99: Consider consistent error handling between initial load and runtime

During initial directory listing, errors cause the watcher to fail (line 87), but during runtime file listing, errors are only logged (line 121). This inconsistency might lead to different behavior between startup and runtime.

Consider making the error handling consistent:
 dirFilePaths, err := listDirFilePaths()
 if err != nil {
-    ll.Error("failed to list directory files", zap.Error(err))
-    return err
+    ll.Error("failed to list directory files during initialization", zap.Error(err))
+    // Continue with empty list to allow graceful degradation
+    dirFilePaths = []string{}
 }
Or alternatively, make both cases return errors for consistency.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e67757 and 84912f3.

📒 Files selected for processing (13)

router-tests/mcp_hot_reload_test.go (1 hunks)
router-tests/testenv/testenv.go (1 hunks)
router/core/graph_server.go (1 hunks)
router/core/router.go (1 hunks)
router/pkg/config/config.go (2 hunks)
router/pkg/config/config.schema.json (1 hunks)
router/pkg/config/fixtures/full.yaml (1 hunks)
router/pkg/config/testdata/config_defaults.json (1 hunks)
router/pkg/config/testdata/config_full.json (1 hunks)
router/pkg/mcpserver/operation_manager.go (2 hunks)
router/pkg/mcpserver/server.go (7 hunks)
router/pkg/schemaloader/loader.go (3 hunks)
router/pkg/watcher/watcher.go (5 hunks)

🧰 Additional context used

🧠 Learnings (4)

router/pkg/config/testdata/config_full.json (1)

Learnt from: SkArchon
PR: wundergraph/cosmo#1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.376Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.

router/pkg/config/testdata/config_defaults.json (2)

Learnt from: SkArchon
PR: wundergraph/cosmo#1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.376Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.

Learnt from: endigma
PR: wundergraph/cosmo#2009
File: router/pkg/config/config.go:0-0
Timestamp: 2025-07-03T10:33:25.755Z
Learning: The CardinalityLimit field in the Metrics struct (router/pkg/config/config.go) is validated at the JSON schema level in config.schema.json with a minimum value constraint of 1, preventing zero or negative values without requiring runtime validation.

router/pkg/config/config.schema.json (1)

Learnt from: SkArchon
PR: wundergraph/cosmo#1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.376Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.

router/pkg/schemaloader/loader.go (1)

Learnt from: SkArchon
PR: wundergraph/cosmo#1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.376Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.

🧬 Code Graph Analysis (3)

router-tests/testenv/testenv.go (2)

router/core/router.go (1)

WithStorageProviders (1984-1988)

router/pkg/config/config.go (2)

StorageProviders (721-726)

FileSystemStorageProvider (753-756)

router/core/router.go (1)

router/pkg/mcpserver/server.go (1)

WithHotReload (287-292)

router/pkg/mcpserver/operation_manager.go (1)

router/pkg/schemaloader/loader.go (1)

NewOperationLoader (42-47)

🔇 Additional comments (19)

router-tests/testenv/testenv.go (1)

1434-1446: LGTM: Conditional storage provider setup prevents configuration conflicts.

The conditional check ensures that a default filesystem storage provider is only added when no MCP storage provider ID is already configured. This prevents overwriting existing provider configurations and avoids duplicate provider registrations, which is essential for the new hot reload testing scenarios.

The logic correctly groups the storage provider setup and provider ID assignment within the same conditional block, maintaining consistency.

router/pkg/config/fixtures/full.yaml (1)

52-54: LGTM: Hot reload configuration follows best practices.

The new hot_reload_config section provides sensible defaults:

enabled: false ensures backward compatibility and follows the agreed approach of having hot reload disabled by default

interval: '10s' strikes a good balance between responsiveness and resource efficiency for the polling-based file watcher

The configuration structure is well-organized under the MCP section and properly formatted.

router/pkg/config/testdata/config_full.json (1)

160-164: Hot-reload configuration wired correctly – no issues spotted

The new HotReloadConfig block follows existing duration conventions (nanoseconds) and keeps the previous RouterURL field syntactically valid by adding the comma.
Nothing further to flag here.

router/pkg/config/testdata/config_defaults.json (1)

125-129: Defaults updated consistently

The default configuration now exposes the same HotReloadConfig shape as the full fixture. The interval (10 s) is a sensible conservative default.
LGTM.

router/pkg/config/config.go (2)

916-916: LGTM: Clean configuration field addition.

The new HotReloadConfig field is well-integrated into the MCPConfiguration struct, following established patterns for YAML serialization and environment variable naming.

928-932: LGTM: Well-designed configuration struct with sensible defaults.

The MCPOperationsHotReloadConfig struct has thoughtful design choices:

Enabled: false by default ensures safe feature rollout

10s interval provides reasonable balance between responsiveness and system load

Clear field names and consistent environment variable naming

router/pkg/mcpserver/operation_manager.go (1)

35-41: LGTM!

The method signature update properly propagates the hot reload parameters to the underlying loader.

router/pkg/schemaloader/loader.go (2)

142-162: Consider returning watcher creation errors

When the watcher creation fails, the error is only logged. This means hot reload won't work but the initial load succeeds. While this might be intentional for graceful degradation, it could lead to confusion when hot reload silently fails to work.

Consider whether watcher creation errors should fail the entire operation:
 if err != nil {
     cancel()
     l.Logger.Error("Could not create watcher", zap.Error(err))
+    return nil, fmt.Errorf("failed to create watcher for hot reload: %w", err)
 }
Alternatively, if graceful degradation is desired, consider adding a warning-level log to make it more visible.

163-171: Well-implemented error handling

The goroutine properly distinguishes between expected context cancellation and unexpected errors, with appropriate logging for each case.

router-tests/mcp_hot_reload_test.go (2)

18-206: Excellent test coverage!

The tests comprehensively cover the hot reload functionality:

Proper use of temporary directories for test isolation

Good coverage of add/update/remove scenarios

Appropriate use of EventuallyWithT for async operations

Well-structured test assertions

208-282: Good goroutine leak detection!

The test properly verifies that all goroutines are cleaned up during MCP shutdown, which addresses the concern raised in previous reviews about goroutine cleanup.

router/pkg/watcher/watcher.go (1)

139-158: Well-implemented change detection logic

The implementation correctly handles:

New file detection by tracking seen files

Update detection via modification time comparison

Deletion detection by comparing file counts and cleaning up the seen files map

router/pkg/mcpserver/server.go (7)

71-74: Well-structured hot reload configuration options.

The new configuration fields are properly documented and follow the existing code style. The field names are clear and descriptive.

95-97: Good integration of hot reload fields into the server struct.

The new fields are appropriately placed and follow the existing naming conventions. The reloadOperationsChan channel enables asynchronous communication for reload events.

223-225: Proper initialization of hot reload fields.

The initialization correctly sets the hot reload configuration from options and creates the reload channel. The channel is properly initialized as an unbuffered channel.

286-292: Well-implemented option function for hot reload configuration.

The WithHotReload function follows the established pattern for option functions and correctly sets both the hot reload flag and interval.

342-342: Context parameter addition improves cancellation support.

Adding the context parameter to the Reload method signature enables proper cancellation and timeout handling for reload operations.

745-747: Public method exposes reload channel for external coordination.

The ReloadOperationsChannel method provides a clean interface for external components to access the reload channel, enabling proper coordination of hot reload events.

351-351: LoadOperationsFromDirectory context & parameters are correctly forwarded

router/pkg/mcpserver/server.go:351 calls
s.operationsManager.LoadOperationsFromDirectory(ctx, s.operationsDir, s.reloadOperationsChan, s.hotReload, s.hotReloadInterval)

router/pkg/mcpserver/operation_manager.go:35 signature
func (om *OperationsManager) LoadOperationsFromDirectory(ctx context.Context, operationsDir string, reloadOperationsChan chan bool, hotReload bool, hotReloadInterval time.Duration) error

router/pkg/schemaloader/loader.go signature matches and also accepts the same parameters

No changes needed here—context and hot-reload settings are properly propagated.

router/pkg/config/config.schema.json

coderabbitai · 2025-07-04T20:02:25Z

router/core/graph_server.go

+					if mErr := s.mcpServer.Reload(ctx, executor.ClientSchema); mErr != nil {
+						return
+					}


⚠️ Potential issue

Handle reload errors without terminating the goroutine

If Reload returns an error, the goroutine exits and no further hot reloads will be processed. This could leave the system unable to recover from transient reload failures.

Consider logging the error and continuing to listen for reload signals:

 case <-s.mcpServer.ReloadOperationsChannel():
     s.logger.Log(zap.InfoLevel, "Reloading mcp server!")
     if mErr := s.mcpServer.Reload(ctx, executor.ClientSchema); mErr != nil {
-        return
+        s.logger.Error("Failed to reload MCP server", zap.Error(mErr))
     }

🤖 Prompt for AI Agents

In router/core/graph_server.go around lines 1088 to 1090, the current code returns immediately if s.mcpServer.Reload returns an error, causing the goroutine to exit and stop processing further reloads. Modify the code to log the error instead of returning, allowing the goroutine to continue running and handle subsequent reload signals without termination.
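
A self-contained sketch of the pattern suggested here, for readers who want to see the control flow in isolation. It is not the code in graph_server.go: `reloadLoop`, `demo`, and the `reload` callback are hypothetical stand-ins, and plain `fmt` output replaces the zap logger.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// reloadLoop calls reload once per signal received on reloadCh and keeps
// listening after a failure. It returns the number of failed reloads once
// ctx is cancelled or reloadCh is closed. Hypothetical helper.
func reloadLoop(ctx context.Context, reloadCh <-chan bool, reload func(context.Context) error) int {
	failures := 0
	for {
		select {
		case <-ctx.Done():
			return failures
		case _, ok := <-reloadCh:
			if !ok {
				return failures
			}
			if err := reload(ctx); err != nil {
				// Log and continue, so that a transient failure does not
				// disable hot reload for the rest of the process lifetime.
				fmt.Println("failed to reload:", err)
				failures++
			}
		}
	}
}

// demo sends three reload signals; the first reload fails, the rest succeed.
func demo() (calls, failures int) {
	reloadCh := make(chan bool)
	reload := func(context.Context) error {
		calls++
		if calls == 1 {
			return errors.New("transient failure")
		}
		return nil
	}
	done := make(chan int)
	go func() { done <- reloadLoop(context.Background(), reloadCh, reload) }()
	for i := 0; i < 3; i++ {
		reloadCh <- true
	}
	close(reloadCh)
	failures = <-done
	return calls, failures
}

func main() {
	calls, failures := demo()
	fmt.Printf("reload attempts: %d, failures: %d\n", calls, failures)
}
```

With an early `return` in place of the log line, the second and third signals in `demo` would never be handled, which is the failure mode the comment describes.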

Hi @endigma, Please let me know if it is the expected behaviour to further retry mcpServer reload operations even after reload failure.

melsonic · 2025-07-04T20:03:54Z

Hi @endigma, I have updated the cosmo watcher to detect file additions, updates, and deletions in a directory, with a file filter. In this scenario, the file filter tells the watcher to watch only GraphQL files.
Please review my approach and let me know if you have any suggestions or updates.
I will also have to add tests for this functionality; I'll add them once you are okay with my changes.

Thank you!

melsonic · 2025-07-10T15:40:08Z

Hi @endigma, please review this MR once you have time.
Thank you!

endigma · 2025-07-10T15:42:00Z

Hi @melsonic, you can just re-request a review when it's ready. Can you resolve the conflict? I'll put this back in the queue for review.

endigma

Hi, some high level review:

You should add unit tests for the new watcher behaviour in watcher_test.go
We don't need to support the Directory and Paths operating modes at the same time. We can use Directory as a way to build a list of paths, or use a provided list, and in directory mode we can rebuild the list of tracked files on every loop.

endigma · 2025-07-10T15:46:16Z

router-tests/mcp_hot_reload_test.go

+	t.Run("List Updated User Operations On Addition and Removal", func(t *testing.T) {
+
+		operationsDir := t.TempDir()
+		storageProviderId := "mcp_hot_reload_test_id"


storageProviderId should be storageProviderID, and it seems to be the same across all subtests, so can we move it out to test scope?

Yes, we can.

endigma · 2025-07-10T15:47:16Z

router-tests/mcp_hot_reload_test.go

+
+			initialToolsCount := len(resp.Tools)
+
+			filePath := operationsDir + "/main.graphql"


Using filepath.Join is best for this sort of thing, and can it be given a more descriptive name? This applies to all subtests.
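
A short sketch of what this suggestion could look like. The helper name `operationFile` is hypothetical; the test itself may simply call `filepath.Join` inline with a descriptively named variable.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// operationFile builds the path of an operation file inside operationsDir.
// filepath.Join uses the OS-specific separator and cleans the result, so a
// trailing separator on operationsDir does not produce a doubled slash.
func operationFile(operationsDir, name string) string {
	return filepath.Join(operationsDir, name)
}

func main() {
	employeeNotesOperationPath := operationFile("/tmp/ops/", "main.graphql")
	fmt.Println(employeeNotesOperationPath)
}
```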

endigma · 2025-07-10T15:48:59Z

router-tests/mcp_hot_reload_test.go

+			assert.EventuallyWithT(t, func(t *assert.CollectT) {
+
+				resp, err = xEnv.MCPClient.ListTools(xEnv.Context, toolsRequest)
+				assert.NoError(t, err)
+				assert.Len(t, resp.Tools, initialToolsCount)
+
+				// verify getEmployeeNotes operation tool is properly removed
+				require.NotContains(t, resp.Tools, mcp.Tool{
+					Name:        "execute_operation_get_employee_notes",
+					Description: "Executes the GraphQL operation 'getEmployeeNotes' of type query.",
+					InputSchema: mcp.ToolInputSchema{
+						Type:       "object",
+						Properties: map[string]interface{}{"id": map[string]interface{}{"type": "integer"}},
+						Required:   []string{"id"},
+					},
+					RawInputSchema: json.RawMessage(nil),
+					Annotations: mcp.ToolAnnotation{
+						Title:          "Execute operation getEmployeeNotes",
+						ReadOnlyHint:   mcp.ToBoolPtr(true),
+						IdempotentHint: mcp.ToBoolPtr(true),
+						OpenWorldHint:  mcp.ToBoolPtr(true),
+					},
+				})
+
+			}, 15*time.Second, 250*time.Millisecond)


Instead of using assert.Eventually... functions, we recently introduced an option to the watcher to directly provide a non-timing based tick source. You can check watcher_test.go for how to use it.

Sure, I'll check and update the test.

In the case of MCP hot reload, we don't have direct access to the watcher config options, because the watcher is defined in the loader.go file. So in this particular scenario it is not possible to use a non-timing-based tick source.
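
For context, the general idea of a non-timing tick source can be sketched as below. This is a generic illustration only: the actual watcher option and its name are not shown in this thread, so `watch`, `check`, and `onChange` are hypothetical, and the sketch does not claim to match watcher_test.go.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// watch runs check on every tick and calls onChange when it reports a change.
// The tick channel is injected so a test can drive the loop deterministically;
// production code could pass time.NewTicker(interval).C instead.
func watch(ctx context.Context, ticks <-chan time.Time, check func() bool, onChange func()) {
	for {
		select {
		case <-ctx.Done():
			return
		case _, ok := <-ticks:
			if !ok {
				return
			}
			if check() {
				onChange()
			}
		}
	}
}

// demo drives three manual ticks; only the second one observes a change.
func demo() int {
	ticks := make(chan time.Time)
	results := []bool{false, true, false}
	i := 0
	check := func() bool {
		r := results[i]
		i++
		return r
	}
	reloads := 0
	done := make(chan struct{})
	go func() {
		watch(context.Background(), ticks, check, func() { reloads++ })
		close(done)
	}()
	for range results {
		ticks <- time.Time{} // each send is one "tick", no sleeping involved
	}
	close(ticks)
	<-done
	return reloads
}

func main() {
	fmt.Println("reloads triggered:", demo())
}
```

Using such a source from the MCP tests would require the loader to accept and forward the tick channel, which is the access problem described above.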

endigma · 2025-07-10T15:58:45Z

router/pkg/watcher/watcher.go

+	listDirFilePaths := func() ([]string, error) {
+		var files []string
+		if options.Directory.DirPath != "" {
+			err := filepath.WalkDir(options.Directory.DirPath, func(path string, d fs.DirEntry, err error) error {
+				if err != nil {
+					return err
+				}
+				// Skip directories
+				if d.IsDir() {
+					return nil
+				}
+				// Skip if filter rejects the file
+				if options.Directory.Filter != nil && options.Directory.Filter(path) {
+					return nil
+				}
+				files = append(files, path)
+				return nil
+			})
+			if err != nil {
+				return []string{}, fmt.Errorf("error walking directory %s: %w", options.Directory.DirPath, err)
+			}
+		}
+		return files, nil
+	}


Can this be extracted and tested separately? Also, "filter" semantics usually return true when something should remain in the list, not be removed from it.

Okay, I'll create a util function outside New with unit test.
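
One possible shape for the extracted helper, using "keep" semantics for the filter as requested. This is a sketch under assumptions: `listFiles`, `isGraphQL`, and `demo` are hypothetical names, and the final implementation may drop the filter entirely.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// listFiles returns every non-directory entry under dir for which keep
// returns true. A nil keep retains everything.
func listFiles(dir string, keep func(path string) bool) ([]string, error) {
	var files []string
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		if keep != nil && !keep(path) {
			return nil // the filter did not select this file
		}
		files = append(files, path)
		return nil
	})
	if err != nil {
		return nil, fmt.Errorf("error walking directory %s: %w", dir, err)
	}
	return files, nil
}

// isGraphQL keeps only files with a .graphql extension.
func isGraphQL(path string) bool {
	return strings.HasSuffix(path, ".graphql")
}

// demo lists a temporary directory holding one GraphQL file and one text file.
// The directory is removed before returning, so the paths are for inspection only.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "mcp-ops")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"a.graphql", "notes.txt"} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o644); err != nil {
			return nil, err
		}
	}
	return listFiles(dir, isGraphQL)
}

func main() {
	files, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		os.Exit(1)
	}
	fmt.Println(len(files), "file(s) kept")
}
```

Because the helper takes the directory and filter as arguments, it can be unit tested against a temporary directory without constructing a watcher.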

endigma · 2025-07-10T16:03:13Z

router/pkg/watcher/watcher.go

-	if len(options.Paths) == 0 {
-		return nil, errors.New("path must be provided")
+	if len(options.Paths) == 0 && options.Directory.DirPath == "" {
+		return nil, errors.New("paths or directory must be provided")


I think we should error if both Paths and Directory are provided, and I don't believe filter is necessary at the moment. This will allow you to deduplicate all the loops below and integrate Directories as just a source for a list of paths instead of separate handling.

Thanks for clarifying the required design. It would definitely prevent duplicate runs from our function.
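
The validation discussed here could look roughly like this. The `Options` type below is a simplified, hypothetical stand-in (the PR uses options.Directory.DirPath), and the wording of the first error message is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Options is a simplified stand-in for the watcher options: the files to
// track come either from an explicit list or from a directory, never both.
type Options struct {
	Paths     []string
	Directory string
}

// validate rejects options that set both sources or neither of them.
func validate(o Options) error {
	hasPaths := len(o.Paths) > 0
	hasDir := o.Directory != ""
	switch {
	case hasPaths && hasDir:
		return errors.New("only one of paths or directory may be provided")
	case !hasPaths && !hasDir:
		return errors.New("paths or directory must be provided")
	}
	return nil
}

func main() {
	fmt.Println(validate(Options{}))
	fmt.Println(validate(Options{Paths: []string{"a.graphql"}, Directory: "ops"}))
	fmt.Println(validate(Options{Directory: "ops"}))
}
```

With the two modes mutually exclusive, the directory can be resolved to a list of paths up front and the rest of the loop can treat both modes the same way.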

endigma · 2025-07-10T16:05:45Z

router/pkg/watcher/watcher.go

 		}

 		prevModTimes := make(map[string]time.Time)
+		seenDirFilePaths := make(map[string]struct{})


What purpose does this serve? You maintain it as an index of the currently known file paths but don't use it for anything that prevModTimes can't be used for from what I can tell. We can already detect deleted files by looping over the prevModTimes map and comparing it with os.Stat result of the expected file. We already handle this by resetting the time so should it be created again it will be detected as an update.

I'll look into it.

melsonic · 2025-07-15T18:40:15Z

Hi @endigma, I have updated the PR incorporating your comments. However, I do have these following questions.

Should we throw an error in case both options.Paths & options.Directory are empty?
Since we are only allowing either Paths or Directory, in case the newly defined function ListDirFilePaths returns 0 file paths, should we also throw an error from our watcher function?

melsonic · 2025-07-15T18:45:54Z

router/pkg/watcher/watcher.go

 					}
 				}

+				if options.Directory.DirPath != "" {


I am reloading the updated directory files after going through the previously listed files. This will allow us to remove deleted files from prevModTimes in one loop. However, there is a catch: when a new file is added, it will take 3 ticks instead of 2 before the provided Callback func runs.

On the other hand, if we list the updated directory files before going through options.Paths, we'll have to run another loop over prevModTimes to remove the keys of files that were deleted from the directory.

Currently I have implemented the first approach, let me know if the team prefers the 2nd option.

I would personally choose option 2: looping over a map and updating it is not terribly slow, and hot reloading is nowhere near a hot path. More intuitive or consistent reload behavior is more important than the raw speed of a single tick handler.

endigma · 2025-07-15T23:32:57Z

Hi @melsonic, I think having 0 paths to check is fine, provided we defensively prevent anything bad from happening in this case. This would allow gracefully handling a multi-tick wide total directory swap or something like a human speed drag-out -> drag-in operation.

Generally with watcher code we prefer resilience above all, e.g. not failing out when we hit an invalid state and instead just falling back to a previous working state until some iteration is usable. In this case I believe having MCP enabled but with 0 operations should not be a failure case, but I'm not sure at the moment.

endigma · 2025-07-15T23:34:06Z

And yes, watcher constructor should fail if the arguments themselves are both empty

melsonic · 2025-07-17T17:53:58Z

Hi @endigma, I am ready with my changes, please review it once you have time.

melsonic · 2025-07-25T12:24:03Z

Hi @endigma, Please review.

P.S: If you are busy with some other high priority task, please take your time. Commenting just in case you missed my previous review notification.

endigma · 2025-07-25T12:25:28Z

Hi @melsonic, this is low priority internally at the moment, but rest assured it's in the queue

melsonic · 2025-07-25T14:29:55Z

No problem @endigma, just wanted to know if it's in the queue or not.

github-actions · 2025-08-09T05:19:50Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

endigma · 2025-08-09T08:47:00Z

Unstale

github-actions · 2025-08-25T05:20:09Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

melsonic · 2025-08-25T17:39:19Z

unstale

endigma · 2025-09-03T12:29:26Z

Hi @melsonic, we've decided internally to adopt your work as the basis for our own fix for the original issue, as such, we'll be closing this PR for now. Thanks for your contribution!

feat: Support hot reloading of MCP operation files

4617ce2

github-actions bot added the router label Jun 12, 2025

added mcp hot reload tests

8df7113

added polling based watcher, removed inotify

6b9e161

melsonic mentioned this pull request Jun 21, 2025

feat: docs for mcp operations hot reloading wundergraph/cosmo-docs#107

Open

melsonic marked this pull request as ready for review June 21, 2025 14:44

endigma self-requested a review June 21, 2025 16:12

endigma requested changes Jun 23, 2025

melsonic added 2 commits June 25, 2025 23:42

fixed leaked goroutines

9ed9577

updated mcp hot reload tests + added shutdown test

dbcc024

melsonic requested a review from endigma June 25, 2025 18:17

endigma requested changes Jul 1, 2025

melsonic added 2 commits July 1, 2025 20:30

Merge branch 'main' into cosmo-issue-1886

3a5741c

added directory watching capability to cosmo watcher with filter

84912f3

melsonic requested review from Noroth, StarpTech and devsergiy as code owners July 4, 2025 19:56

coderabbitai bot reviewed Jul 4, 2025

fix coderabbitai comments

5a6a9a2

endigma self-requested a review July 10, 2025 15:42

endigma requested changes Jul 10, 2025

melsonic added 2 commits July 16, 2025 00:01

extracted list dir files func + removed redundant maps

01d0551

added new watcher tests

e84de46

melsonic requested a review from jensneuse as a code owner July 15, 2025 18:32

minor change

12d12df

melsonic commented Jul 15, 2025

melsonic requested a review from endigma July 15, 2025 18:51

melsonic and others added 3 commits July 17, 2025 00:22

fail watcher if both paths & dir are empty + consistent change detection

112a153

extracted mcp shutdown tests to lifecycle dir

d6ba838

Merge branch 'main' into cosmo-issue-1886

a4f512e

github-actions bot added the Stale label Aug 9, 2025

github-actions bot removed the Stale label Aug 10, 2025

github-actions bot added the Stale label Aug 25, 2025

github-actions bot removed the Stale label Aug 26, 2025

endigma closed this Sep 3, 2025

-					if mErr := s.mcpServer.Reload(ctx, executor.ClientSchema); mErr != nil {
-						return
-					}
+case <-s.mcpServer.ReloadOperationsChannel():
+    s.logger.Log(zap.InfoLevel, "Reloading mcp server!")
+    if mErr := s.mcpServer.Reload(ctx, executor.ClientSchema); mErr != nil {
+        s.logger.Error("Failed to reload MCP server", zap.Error(mErr))
+    }


		initialToolsCount := len(resp.Tools)

		filePath := operationsDir + "/main.graphql"
