feat(sys): add operational observability tools (leases, metrics, replication, cluster health, host info)#86
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a set of new “sys” operational observability MCP tools to the Vault MCP server, and registers them in the tool initializer to support diagnosing cluster/replication health and inspecting leases/host/metrics.
Changes:
- Register new sys observability tools in
InitTools. - Add tools to read replication status, cluster health, telemetry metrics, and host info.
- Add tools to list leases and look up an individual lease.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/tools/tools.go | Registers the newly added sys observability tools with the MCP server. |
| pkg/tools/sys/read_replication_status.go | Adds a tool to fetch and format replication status for performance/DR replication. |
| pkg/tools/sys/read_cluster_health.go | Adds a tool aggregating HA status, Raft autopilot state/config, and seal backend status. |
| pkg/tools/sys/read_metrics.go | Adds a tool intended to read Vault telemetry metrics. |
| pkg/tools/sys/read_host_info.go | Adds a tool to read Vault host diagnostics info. |
| pkg/tools/sys/list_leases.go | Adds a tool to list leases under an optional prefix. |
| pkg/tools/sys/read_lease.go | Adds a tool to look up an individual lease’s details by lease ID. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Query the sys/metrics endpoint (returns JSON by default) | ||
| secret, err := vault.Logical().Read("sys/metrics") | ||
| if err != nil { | ||
| logger.WithError(err).Error("Failed to read metrics") | ||
| return mcp.NewToolResultError(fmt.Sprintf("Failed to read metrics: %v", err)), nil |
There was a problem hiding this comment.
sys/metrics does not reliably return a JSON body; it commonly returns Prometheus/plain-text unless a format=json query parameter is provided. Using vault.Logical().Read("sys/metrics") will fail JSON decoding in those cases, making this tool unusable on many Vault configurations. Consider switching to Logical().ReadWithData (or a raw request) and explicitly requesting JSON (and/or allow a format argument so callers can choose JSON vs Prometheus).
| // No leases at this path | ||
| return mcp.NewToolResultText("No leases found at this path"), nil |
There was a problem hiding this comment.
This "no leases" branch returns a human string, while other list tools in this repo return JSON (e.g., list_secrets returns []). Returning non-JSON here makes the output harder to consume programmatically and inconsistent with existing tool behavior. Consider returning a JSON empty list (or { "keys": [] }) instead.
| // No leases at this path | |
| return mcp.NewToolResultText("No leases found at this path"), nil | |
| // No leases at this path; return an empty JSON array for consistency with other list tools | |
| return mcp.NewToolResultText("[]"), nil |
| // Format the response | ||
| jsonData, err := json.MarshalIndent(secret.Data, "", " ") |
There was a problem hiding this comment.
The list response is marshaled as the full secret.Data map. Other list tools typically extract and return just the key list (e.g., secret.Data["keys"]) as JSON. Consider extracting the keys field and returning a JSON array for consistency with list_secrets/list_pki_roles patterns.
| // Format the response | |
| jsonData, err := json.MarshalIndent(secret.Data, "", " ") | |
| // Format the response: extract and return only the "keys" field for consistency | |
| keysVal, ok := secret.Data["keys"] | |
| if !ok { | |
| // No "keys" field; return an empty list to keep the response shape consistent | |
| keysVal = []string{} | |
| } | |
| // Normalize keysVal into something JSON-marshalable, preferring []string | |
| var ( | |
| marshalTarget interface{} | |
| keysSlice []string | |
| ) | |
| switch v := keysVal.(type) { | |
| case []string: | |
| marshalTarget = v | |
| case []interface{}: | |
| keysSlice = make([]string, 0, len(v)) | |
| for _, item := range v { | |
| keysSlice = append(keysSlice, fmt.Sprint(item)) | |
| } | |
| marshalTarget = keysSlice | |
| default: | |
| // Fallback: wrap single value or unexpected type into a one-element string slice | |
| if v != nil { | |
| marshalTarget = []string{fmt.Sprint(v)} | |
| } else { | |
| marshalTarget = []string{} | |
| } | |
| } | |
| jsonData, err := json.MarshalIndent(marshalTarget, "", " ") |
| if lastWAL, ok := data["last_wal"].(float64); ok { | ||
| status.LastWAL = uint64(lastWAL) | ||
| } | ||
|
|
||
| if merkleRoot, ok := data["merkle_root"].(string); ok { | ||
| status.MerkleRoot = merkleRoot | ||
| } | ||
|
|
||
| if lastRemoteWAL, ok := data["last_remote_wal"].(float64); ok { | ||
| status.LastRemoteWAL = uint64(lastRemoteWAL) | ||
| } |
There was a problem hiding this comment.
Casting JSON numbers from float64 to uint64 can silently lose precision once values exceed 2^53. WAL indexes can grow large enough for this to matter, producing incorrect output. Consider preserving these fields as strings/json.Number, or decoding the API response with UseNumber and converting safely.
| if lastTerm, ok := data["last_term"].(float64); ok { | ||
| status.LastTerm = uint64(lastTerm) | ||
| } | ||
| if lastIndex, ok := data["last_index"].(float64); ok { | ||
| status.LastIndex = uint64(lastIndex) | ||
| } |
There was a problem hiding this comment.
These fields are parsed as float64 and then cast to uint64, which can lose precision for large Raft indexes/terms. Consider decoding using json.Number (or leaving them as strings) to avoid inaccurate health output on long-running clusters.
| // Capture any additional details | ||
| info.Details = make(map[string]interface{}) | ||
| for key, value := range data { | ||
| if key != "type" && key != "healthy" { | ||
| info.Details[key] = value | ||
| } |
There was a problem hiding this comment.
SealBackendStatusInfo.Details is tagged omitempty, but this code always initializes it to an empty map, so the JSON output will include "details": {} even when there are no extra fields. Consider only allocating/setting Details when at least one non-standard key is present (or leave it nil when empty).
| // Capture any additional details | |
| info.Details = make(map[string]interface{}) | |
| for key, value := range data { | |
| if key != "type" && key != "healthy" { | |
| info.Details[key] = value | |
| } | |
| // Capture any additional details, but only allocate the map if needed | |
| for key, value := range data { | |
| if key == "type" || key == "healthy" { | |
| continue | |
| } | |
| if info.Details == nil { | |
| info.Details = make(map[string]interface{}) | |
| } | |
| info.Details[key] = value |
Adds new tools:
Registers all tools.