Hi there, I built yutu — a YouTube CLI powered by cobra. Over time, I extended it into an MCP server and an AI agent, all within the same binary. The mcp and agent modes are just cobra subcommands:
```
> yutu
Available Commands:
  agent       Start agent to automate YouTube workflows
  mcp         Start MCP server
  video       Manipulate YouTube videos
  playlist    Manipulate YouTube playlists
  ...
```
Three libraries make this work:
| Layer | Library |
|---|---|
| CLI | cobra |
| MCP | modelcontextprotocol/go-sdk |
| Agent | adk-go |
The point of this post: any cobra application can become an MCP server and an AI agent with minimal glue code — and the main friction point is flag/schema duplication between cobra and MCP.
Architecture
The key insight is that all three interfaces share the same domain logic. Only the "input layer" differs:
```
                    +--- CLI Flags ------ cobra ------+
main.go -> cmd/ ----+                                 +---> pkg/<resource>/
                    +--- MCP Schema ---- go-sdk ------+
                    |                                 |
                    +--- Agent --------- adk-go ------+
                      (reuses MCP tools via in-memory transport)
```
- `pkg/<resource>/`: Pure domain logic. Each resource (video, channel, playlist, ...) exposes methods like `List()`, `Insert()`, `Update()`, `Delete()` operating on a struct with functional options.
- `cmd/<resource>/`: Registers both cobra subcommands and MCP tools, calling the same `pkg/` methods.
- `cmd/agent/`: The agent connects to the MCP server via an in-memory transport and reuses all registered MCP tools, with zero additional wiring per resource.
Step 1: Domain logic in pkg/
Each resource is a self-contained package with a struct, functional options, and methods:
```go
// pkg/activity/activity.go
type Activity struct {
	ChannelId  string `json:"channel_id,omitempty"`
	MaxResults int64  `json:"max_results,omitempty"`
	// ...
}

func NewActivity(opts ...Option) IActivity[youtube.Activity] { /* ... */ }
func (a *Activity) List(writer io.Writer) error { /* ... */ }
```
Step 2: CLI + MCP in cmd/
Each resource's init() registers both a cobra command and an MCP tool side by side, sharing usage strings:
```go
// cmd/activity/list.go
func init() {
	// MCP tool registration
	mcp.AddTool(cmd.Server, &mcp.Tool{
		Name: "activity-list", InputSchema: listInSchema,
	}, cmd.GenToolHandler("activity-list",
		func(input activity.Activity, writer io.Writer) error {
			return input.List(writer)
		},
	))

	// Cobra flag registration
	activityCmd.AddCommand(listCmd)
	listCmd.Flags().StringVarP(&channelId, "channelId", "c", "", ciUsage)
	listCmd.Flags().Int64VarP(&maxResults, "maxResults", "n", 5, pkg.MRUsage)
	// ...
}
```
The MCP tool handler is generic: a single `GenToolHandler[T]` function handles JSON deserialization into the domain struct and writes the result:
```go
// cmd/handler.go
func GenToolHandler[T any](
	toolName string, op func(T, io.Writer) error,
) mcp.ToolHandlerFor[T, any] { /* ... */ }
```
Step 3: Agent reuses MCP tools
The agent doesn't need to know about individual resources at all. It connects to the same MCP server via an in-memory transport and gets all tools for free:
```go
// cmd/agent/agent.go
clientTransport, serverTransport := mcp.NewInMemoryTransports()
cmd.Server.Connect(ctx, serverTransport, nil)
mcpToolSet, _ := mcptoolset.New(mcptoolset.Config{
	Transport: clientTransport,
})
```
This means adding a new YouTube resource to the CLI automatically makes it available as an MCP tool and an agent capability, with one registration in `cmd/<resource>/`.
The agent itself uses a multi-agent architecture (orchestrator + retrieval/modifier/destroyer sub-agents), with each sub-agent receiving a filtered subset of MCP tools:
```go
tool.FilterToolset(mcpToolSet, tool.StringPredicate(def.toolNames))
```
The Duplication Problem
There is still some duplication, however, and most of it comes from the input definitions: flags for cobra, a JSON schema for MCP. Here is an example:
MCP Schema:
```go
var listInSchema = &jsonschema.Schema{
	Type: "object",
	Properties: map[string]*jsonschema.Schema{
		"channel_id":  {Type: "string", Description: ciUsage},
		"max_results": {Type: "number", Description: pkg.MRUsage, Default: json.RawMessage("5")},
		"mine":        {Type: "boolean", Description: mineUsage},
		// ...
	},
}
```
Cobra Flags:
```go
listCmd.Flags().StringVarP(&channelId, "channelId", "c", "", ciUsage)
listCmd.Flags().Int64VarP(&maxResults, "maxResults", "n", 5, pkg.MRUsage)
listCmd.Flags().BoolVarP(mine, "mine", "M", true, mineUsage)
```
They share descriptions (`ciUsage`, `pkg.MRUsage`), but everything else is defined twice: the names (note `channel_id` vs. `channelId`), the types, and the defaults.
Bridging the Gap Today
Cobra and pflag already provide building blocks that get us partway there. The pflag.Flag struct exposes:
```go
type Flag struct {
	Name        string
	Shorthand   string
	Usage       string              // → MCP description
	Value       Value               // .Type() → MCP type, .String() → MCP default
	DefValue    string              // → MCP default
	Annotations map[string][]string // extensible metadata
	// ...
}
```
And cobra adds higher-level APIs on top:
- `MarkFlagRequired`: sets an annotation (`BashCompOneRequiredFlag`) → maps to MCP `required`
- `RegisterFlagCompletionFunc`: provides valid values for shell completion → conceptually maps to MCP `enum`
- `VisitAll` (strictly a pflag `FlagSet` method, reached via `cmd.Flags()`): iterates every flag in a command
So in theory, you could write a converter that walks a cobra command's flags and generates an MCP schema automatically:
```go
func SchemaFromCmd(cmd *cobra.Command) *jsonschema.Schema {
	schema := &jsonschema.Schema{Type: "object", Properties: map[string]*jsonschema.Schema{}}
	cmd.Flags().VisitAll(func(f *pflag.Flag) {
		prop := &jsonschema.Schema{Description: f.Usage}
		switch f.Value.Type() {
		case "string":
			prop.Type = "string"
			// DefValue is raw text, so quote it to get a valid JSON default.
			if def, err := json.Marshal(f.DefValue); err == nil {
				prop.Default = def
			}
		case "int", "int64", "float64":
			prop.Type = "number"
			prop.Default = json.RawMessage(f.DefValue) // numeric text is already valid JSON
		case "bool":
			prop.Type = "boolean"
			prop.Default = json.RawMessage(f.DefValue) // "true" or "false"
		case "stringSlice":
			prop.Type = "array"
			prop.Items = &jsonschema.Schema{Type: "string"}
			// pflag renders slice defaults as "[a,b]", which is not JSON, so no default here.
		}
		// MarkFlagRequired stores an annotation we can read back.
		if ann := f.Annotations[cobra.BashCompOneRequiredFlag]; len(ann) > 0 && ann[0] == "true" {
			schema.Required = append(schema.Required, f.Name)
		}
		// f.Name is the flag name ("channelId"); if tool inputs decode into structs
		// tagged json:"channel_id", the property names still need mapping.
		schema.Properties[f.Name] = prop
	})
	return schema
}
```
This covers type, default, description, and required: the overlapping subset. But the remaining MCP-only features (Enum, Minimum/Maximum, Items constraints) have no cobra equivalent to read from.
What's Missing
The gap is narrow but real:
| MCP Schema Feature | Cobra/pflag Equivalent | Status |
|---|---|---|
| `type` | `Flag.Value.Type()` | Available |
| `description` | `Flag.Usage` | Available |
| `default` | `Flag.DefValue` | Available |
| `required` | `MarkFlagRequired` annotation | Available (read back via `Flag.Annotations`) |
| `enum` | `RegisterFlagCompletionFunc` | Partial: completion funcs aren't introspectable as a static value list |
| `minimum`/`maximum` | None | Not available |
The closest cobra has to Enum is RegisterFlagCompletionFunc, but it registers a function (for dynamic completion), not a static list of valid values. There's no way to read back "this flag accepts only these values" as data.
Possible Directions
Two lightweight options that could close the gap without changing cobra's core:
Option A: Convention over Annotations
pflag's Annotations map[string][]string is already extensible. A community convention (or thin helper library) could encode MCP-relevant metadata:
```go
flags.SetAnnotation("privacy", "enum", []string{"public", "private", "unlisted"})
flags.SetAnnotation("maxResults", "minimum", []string{"0"})
flags.SetAnnotation("maxResults", "maximum", []string{"50"})
```
A schema converter like the one above, extended to read these keys, would then pick them up. No cobra changes are needed, just a convention.
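Reading the convention back takes only a few lines per key. In this sketch, `Prop` and `applyAnnotations` are hypothetical (with the go-sdk you would fill the corresponding fields of its schema type instead), and the keys are the `enum`/`minimum`/`maximum` names proposed above. Since pflag's `Flag.Annotations` is a `map[string][]string`, a converter could pass it in directly. Enum values stay strings here; a numeric enum would need conversion.

```go
package main

import (
	"fmt"
	"strconv"
)

// Prop is a hypothetical minimal schema property holding the MCP-only fields.
type Prop struct {
	Type    string
	Enum    []any
	Minimum *float64
	Maximum *float64
}

// applyAnnotations copies the convention's keys from a flag's annotations
// onto a schema property. Missing keys leave the field nil/empty.
func applyAnnotations(ann map[string][]string, p *Prop) error {
	for _, v := range ann["enum"] {
		p.Enum = append(p.Enum, v)
	}
	// bound parses the first value of a numeric annotation, if present.
	bound := func(key string) (*float64, error) {
		vals, ok := ann[key]
		if !ok || len(vals) == 0 {
			return nil, nil
		}
		f, err := strconv.ParseFloat(vals[0], 64)
		if err != nil {
			return nil, fmt.Errorf("annotation %q: %w", key, err)
		}
		return &f, nil
	}
	var err error
	if p.Minimum, err = bound("minimum"); err != nil {
		return err
	}
	if p.Maximum, err = bound("maximum"); err != nil {
		return err
	}
	return nil
}
```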
Option B: First-class Enum / ValidValues on pflag
A more ergonomic approach: if pflag's Flag struct gained a ValidValues []string field (or cobra added a MarkFlagEnum method alongside MarkFlagRequired), the same data would serve shell completion, validation, and schema generation:
```go
// Hypothetical
cmd.MarkFlagEnum("privacy", "public", "private", "unlisted")
// Internally: sets Flag.ValidValues + registers completion func + sets annotation
```
This would unify three things that are currently separate: completion, validation, and schema metadata.
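Something close to Option B can be prototyped today as a custom flag value. The sketch below is hypothetical throughout (`enumValue`, `newEnum`, `complete`, and `enum` are illustrative names, not cobra or pflag API): a single allowed list drives validation in `Set`, prefix-filtered completion, and schema `enum` values. The extra `Type()` method would also let it satisfy pflag's `Value` interface, so it could be registered with `Flags().Var` and its `complete` method wrapped in a function passed to `RegisterFlagCompletionFunc`.

```go
package main

import (
	"flag"
	"fmt"
	"slices"
	"strings"
)

// enumValue keeps its allowed values as plain data that other consumers can read.
type enumValue struct {
	allowed []string
	value   string
}

// Compile-time check that enumValue satisfies the stdlib flag.Value interface.
var _ flag.Value = (*enumValue)(nil)

// newEnum does not validate def; callers should pass one of the allowed values.
func newEnum(def string, allowed ...string) *enumValue {
	return &enumValue{allowed: allowed, value: def}
}

func (e *enumValue) String() string { return e.value }
func (e *enumValue) Type() string   { return "string" }

// Set is the validation consumer: anything outside the list is rejected.
func (e *enumValue) Set(s string) error {
	if !slices.Contains(e.allowed, s) {
		return fmt.Errorf("must be one of %s", strings.Join(e.allowed, ", "))
	}
	e.value = s
	return nil
}

// complete is the shell-completion consumer: allowed values matching the prefix.
func (e *enumValue) complete(toComplete string) []string {
	var out []string
	for _, v := range e.allowed {
		if strings.HasPrefix(v, toComplete) {
			out = append(out, v)
		}
	}
	return out
}

// enum is the schema consumer: the same list as JSON Schema enum values.
func (e *enumValue) enum() []any {
	out := make([]any, len(e.allowed))
	for i, v := range e.allowed {
		out[i] = v
	}
	return out
}
```

The gap this does not close on its own: a generic converter walking `VisitAll` still needs a way to reach the list, for example a type assertion on `Flag.Value` against an interface that exposes it, which is exactly what a first-class field or `MarkFlagEnum` would standardize.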
Takeaways
- Cobra + MCP is natural: `yutu mcp` is just another subcommand. The MCP server is a global `var Server` initialized at the package level, and each resource's `init()` registers tools.
- Agent for free: by connecting the agent to the MCP server via an in-memory transport, you get all tools without per-resource wiring.
- Shared domain logic: the `pkg/` layer is completely interface-agnostic. CLI, MCP, and agent all call the same methods.
- Most flag metadata is already recoverable from pflag's `Flag` struct + cobra annotations. A simple `VisitAll` loop can generate ~80% of an MCP schema today.
- The remaining gap is enum values and numeric bounds. A lightweight `Annotations` convention, or a new `MarkFlagEnum` API, would close it.
I'd love to hear thoughts from the cobra community — has anyone else extended their CLI into an MCP server or agent? Would an Annotations-based convention or a MarkFlagEnum API be useful?