Extending a Cobra CLI into an MCP Server and AI Agent — architecture, patterns, and the flag/schema gap #2362

@OpenWaygate

Hi there, I built yutu — a YouTube CLI powered by cobra. Over time, I extended it into an MCP server and an AI agent, all within the same binary. The mcp and agent modes are just cobra subcommands:

> yutu
Available Commands:
  agent       Start agent to automate YouTube workflows
  mcp         Start MCP server
  video       Manipulate YouTube videos
  playlist    Manipulate YouTube playlists
  ...

Three libraries make this work:

| Layer | Library |
| --- | --- |
| CLI | cobra |
| MCP | modelcontextprotocol/go-sdk |
| Agent | adk-go |

The point of this post: any cobra application can become an MCP server and an AI agent with minimal glue code — and the main friction point is flag/schema duplication between cobra and MCP.

Architecture

The key insight is that all three interfaces share the same domain logic. Only the "input layer" differs:

                    +--- CLI Flags ------ cobra ------+
main.go -> cmd/ ---+                                  +---> pkg/<resource>/
                    +--- MCP Schema ---- go-sdk ------+
                    |                                 |
                    +--- Agent --------- adk-go ------+
                          (reuses MCP tools via in-memory transport)
  • pkg/<resource>/: Pure domain logic. Each resource (video, channel, playlist, ...) exposes methods like List(), Insert(), Update(), Delete() operating on a struct with functional options.
  • cmd/<resource>/: Registers both cobra subcommands and MCP tools, calling the same pkg/ methods.
  • cmd/agent/: The agent connects to the MCP server via an in-memory transport and reuses all registered MCP tools - zero additional wiring per resource.

Step 1: Domain logic in pkg/

Each resource is a self-contained package with a struct, functional options, and methods:

// pkg/activity/activity.go
type Activity struct {
    ChannelId  string `json:"channel_id,omitempty"`
    MaxResults int64  `json:"max_results,omitempty"`
    // ...
}

func NewActivity(opts ...Option) IActivity[youtube.Activity] { /* ... */ }
func (a *Activity) List(writer io.Writer) error { /* ... */ }
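
For readers who haven't used functional options, the pattern can be sketched roughly as below. This is a minimal illustration, not yutu's actual code: the Option signature, WithChannelId, WithMaxResults, and the body of List are assumptions, and the constructor here returns *Activity rather than the IActivity[youtube.Activity] interface shown above.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// Activity mirrors the struct shown above; only two fields are sketched.
type Activity struct {
	ChannelId  string `json:"channel_id,omitempty"`
	MaxResults int64  `json:"max_results,omitempty"`
}

// Option mutates an Activity during construction (hypothetical signature).
type Option func(*Activity)

// WithChannelId is an illustrative option constructor.
func WithChannelId(id string) Option {
	return func(a *Activity) { a.ChannelId = id }
}

// WithMaxResults is an illustrative option constructor.
func WithMaxResults(n int64) Option {
	return func(a *Activity) { a.MaxResults = n }
}

// NewActivity applies each option in order to a zero-valued Activity.
func NewActivity(opts ...Option) *Activity {
	a := &Activity{}
	for _, opt := range opts {
		opt(a)
	}
	return a
}

// List stands in for the real API call and only writes the parameters.
func (a *Activity) List(w io.Writer) error {
	_, err := fmt.Fprintf(w, "channel=%s max=%d\n", a.ChannelId, a.MaxResults)
	return err
}

func main() {
	a := NewActivity(WithChannelId("UC123"), WithMaxResults(5))
	if err := a.List(os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Because the struct carries JSON tags, the same type can be built from options on the CLI side or decoded directly from tool arguments on the MCP side.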

Step 2: CLI + MCP in cmd/

Each resource's init() registers both a cobra command and an MCP tool side by side, sharing usage strings:

// cmd/activity/list.go
func init() {
    // MCP tool registration
    mcp.AddTool(cmd.Server, &mcp.Tool{
        Name: "activity-list", InputSchema: listInSchema,
    }, cmd.GenToolHandler("activity-list",
        func(input activity.Activity, writer io.Writer) error {
            return input.List(writer)
        },
    ))

    // Cobra flag registration
    activityCmd.AddCommand(listCmd)
    listCmd.Flags().StringVarP(&channelId, "channelId", "c", "", ciUsage)
    listCmd.Flags().Int64VarP(&maxResults, "maxResults", "n", 5, pkg.MRUsage)
    // ...
}

The MCP tool handler is generic - a single GenToolHandler[T] function handles JSON deserialization into the domain struct and writes the result:

// cmd/handler.go
func GenToolHandler[T any](
    toolName string, op func(T, io.Writer) error,
) mcp.ToolHandlerFor[T, any] { /* ... */ }
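
Since the body is elided above, here is a simplified, SDK-free sketch of what such a generic wrapper might do. The name genHandler, its raw-bytes signature, the listActivity stand-in, and the error wrapping are all assumptions for illustration; the real GenToolHandler returns an mcp.ToolHandlerFor[T, any], and the SDK may take care of decoding before the handler runs.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// Activity mirrors the domain struct from Step 1.
type Activity struct {
	ChannelId  string `json:"channel_id,omitempty"`
	MaxResults int64  `json:"max_results,omitempty"`
}

// listActivity stands in for input.List(writer).
func listActivity(in Activity, w io.Writer) error {
	_, err := fmt.Fprintf(w, "channel=%s max=%d", in.ChannelId, in.MaxResults)
	return err
}

// genHandler wraps a domain operation so it can be driven by raw JSON
// arguments: decode into T, run op against a buffer, return the output.
func genHandler[T any](toolName string, op func(T, io.Writer) error) func([]byte) (string, error) {
	return func(args []byte) (string, error) {
		var input T
		if err := json.Unmarshal(args, &input); err != nil {
			return "", fmt.Errorf("%s: decode input: %w", toolName, err)
		}
		var buf bytes.Buffer
		if err := op(input, &buf); err != nil {
			return "", fmt.Errorf("%s: %w", toolName, err)
		}
		return buf.String(), nil
	}
}

func main() {
	handle := genHandler("activity-list", listActivity)
	out, err := handle([]byte(`{"channel_id":"UC123","max_results":3}`))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(out)
}
```

The type parameter is inferred from the operation's first argument, so each resource only supplies its own struct and a one-line closure.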

Step 3: Agent reuses MCP tools

The agent doesn't need to know about individual resources at all. It connects to the same MCP server via an in-memory transport and gets all tools for free:

// cmd/agent/agent.go
clientTransport, serverTransport := mcp.NewInMemoryTransports()
cmd.Server.Connect(ctx, serverTransport, nil)

mcpToolSet, _ := mcptoolset.New(mcptoolset.Config{
    Transport: clientTransport,
})

This means a new YouTube resource needs only its registration in cmd/<resource>/ (cobra command plus MCP tool); the agent picks up the new tool with no extra wiring.

The agent itself uses a multi-agent architecture (orchestrator + retrieval/modifier/destroyer sub-agents), with each sub-agent receiving a filtered subset of MCP tools:

tool.FilterToolset(mcpToolSet, tool.StringPredicate(def.toolNames))

The Duplication Problem

The cost is duplication in the input definition: each parameter is declared once as a cobra flag and again as an MCP schema property. Here is an example:

MCP Schema:

var listInSchema = &jsonschema.Schema{
    Type:     "object",
    Properties: map[string]*jsonschema.Schema{
        "channel_id":  {Type: "string", Description: ciUsage},
        "max_results": {Type: "number", Description: pkg.MRUsage, Default: json.RawMessage("5")},
        "mine":        {Type: "boolean", Description: mineUsage},
        // ...
    },
}

Cobra Flags:

listCmd.Flags().StringVarP(&channelId, "channelId", "c", "", ciUsage)
listCmd.Flags().Int64VarP(&maxResults, "maxResults", "n", 5, pkg.MRUsage)
listCmd.Flags().BoolVarP(mine, "mine", "M", true, mineUsage)

They share descriptions (ciUsage, pkg.MRUsage) but everything else is defined twice.

Bridging the Gap Today

Cobra and pflag already provide building blocks that get us partway there. The pflag.Flag struct exposes:

type Flag struct {
    Name        string
    Shorthand   string
    Usage       string              // → MCP description
    Value       Value               // .Type() → MCP type (.String() is the current value, not the default)
    DefValue    string              // → MCP default
    Annotations map[string][]string // extensible metadata
    // ...
}

Cobra adds higher-level APIs on top, and pflag's FlagSet provides iteration:

  • MarkFlagRequired: sets an annotation (BashCompOneRequiredFlag) → maps to MCP Required
  • RegisterFlagCompletionFunc: provides valid values for shell completion → conceptually maps to MCP Enum
  • FlagSet.VisitAll (pflag, reached via cmd.Flags()): iterates every flag in a command

So in theory, you could write a converter that walks a cobra command's flags and generates an MCP schema automatically:

func SchemaFromCmd(cmd *cobra.Command) *jsonschema.Schema {
    schema := &jsonschema.Schema{Type: "object", Properties: map[string]*jsonschema.Schema{}}
    cmd.Flags().VisitAll(func(f *pflag.Flag) {
        prop := &jsonschema.Schema{Description: f.Usage}
        switch f.Value.Type() {
        case "string":
            prop.Type = "string"
        case "int", "int64":
            prop.Type = "integer" // JSON Schema distinguishes integer from number
        case "float64":
            prop.Type = "number"
        case "bool":
            prop.Type = "boolean"
        case "stringSlice":
            prop.Type = "array"
            prop.Items = &jsonschema.Schema{Type: "string"}
        }
        // DefValue is plain text; quoteDefault (helper, not shown) must turn it
        // into valid JSON, so skip it when there is nothing to encode.
        if f.DefValue != "" {
            prop.Default = json.RawMessage(quoteDefault(f))
        }
        // MarkFlagRequired stores an annotation we can read back.
        // The length check avoids a panic on an empty annotation slice.
        if ann := f.Annotations[cobra.BashCompOneRequiredFlag]; len(ann) > 0 && ann[0] == "true" {
            schema.Required = append(schema.Required, f.Name)
        }
        schema.Properties[f.Name] = prop
    })
    return schema
}
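
The sketch above leaves quoteDefault undefined. One plausible shape is shown below, assuming the helper only needs the type name pflag reports and the DefValue text. The name defaultJSON and its two-string signature are hypothetical (chosen so the example runs without dependencies), and the stringSlice branch assumes pflag renders slice defaults as "[a,b]", which is worth checking against your pflag version.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// defaultJSON converts a flag's textual default into JSON, keyed by the
// flag's type name. It returns nil when no usable default exists.
func defaultJSON(typ, defValue string) json.RawMessage {
	switch typ {
	case "string":
		if defValue == "" {
			return nil
		}
		b, _ := json.Marshal(defValue) // adds quotes and escapes
		return b
	case "int", "int64":
		n, err := strconv.ParseInt(defValue, 10, 64)
		if err != nil {
			return nil
		}
		return json.RawMessage(strconv.FormatInt(n, 10))
	case "float64":
		f, err := strconv.ParseFloat(defValue, 64)
		if err != nil {
			return nil
		}
		b, err := json.Marshal(f) // fails for NaN and Inf
		if err != nil {
			return nil
		}
		return b
	case "bool":
		if defValue == "true" || defValue == "false" {
			return json.RawMessage(defValue)
		}
		return nil
	case "stringSlice":
		inner := strings.TrimSuffix(strings.TrimPrefix(defValue, "["), "]")
		if inner == "" {
			return json.RawMessage("[]")
		}
		// Naive split: elements that contain commas are not handled.
		b, _ := json.Marshal(strings.Split(inner, ","))
		return b
	}
	return nil
}

func main() {
	fmt.Println(string(defaultJSON("string", "public")))
	fmt.Println(string(defaultJSON("int64", "5")))
	fmt.Println(string(defaultJSON("stringSlice", "[a,b]")))
}
```

Returning nil for unknown or unparsable defaults keeps the generated schema valid, at the cost of silently dropping the default for flag types the switch does not cover.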

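This covers type, default, description, and required, which is the overlapping subset. One caveat: the sketch keys properties by flag name (channelId), while the hand-written schema and the struct's JSON tags use channel_id, so a real converter also needs a name mapping. The remaining MCP-only features (Enum, Minimum/Maximum, Items constraints) have no cobra equivalent to read from.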

What's Missing

The gap is narrow but real:

| MCP Schema Feature | Cobra/pflag Equivalent | Status |
| --- | --- | --- |
| type | Flag.Value.Type() | Available |
| description | Flag.Usage | Available |
| default | Flag.DefValue | Available |
| required | MarkFlagRequired annotation | Available (read back via Flag.Annotations) |
| enum | RegisterFlagCompletionFunc | Partial: completion funcs aren't introspectable as a static value list |
| minimum/maximum | (none) | Not available |

The closest cobra has to Enum is RegisterFlagCompletionFunc, but it registers a function (for dynamic completion), not a static list of valid values. There's no way to read back "this flag accepts only these values" as data.

Possible Directions

Two lightweight options that could close the gap without changing cobra's core:

Option A: Convention over Annotations

pflag's Annotations map[string][]string is already extensible. A community convention (or thin helper library) could encode MCP-relevant metadata:

flags.SetAnnotation("privacy", "enum", []string{"public", "private", "unlisted"})
flags.SetAnnotation("maxResults", "minimum", []string{"0"})
flags.SetAnnotation("maxResults", "maximum", []string{"50"})

The schema converter above would then pick these up. No cobra changes needed — just a convention.
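
A rough sketch of how a converter might consume that convention follows. The Prop type is a local stand-in for the few JSON Schema fields involved (so the example runs without the jsonschema package), and applyAnnotations and firstFloat are hypothetical helper names; only the annotation keys "enum", "minimum", and "maximum" come from the proposal above.

```go
package main

import (
	"fmt"
	"strconv"
)

// Prop is a stand-in for the JSON Schema fields used in this sketch.
type Prop struct {
	Enum    []any
	Minimum *float64
	Maximum *float64
}

// applyAnnotations copies the proposed "enum", "minimum", and "maximum"
// annotations into p. The map has the same shape as pflag's
// Flag.Annotations. Bounds that fail to parse as numbers are skipped.
func applyAnnotations(p *Prop, ann map[string][]string) {
	for _, v := range ann["enum"] {
		p.Enum = append(p.Enum, v)
	}
	p.Minimum = firstFloat(ann["minimum"])
	p.Maximum = firstFloat(ann["maximum"])
}

// firstFloat parses the first element, returning nil if absent or invalid.
func firstFloat(vals []string) *float64 {
	if len(vals) == 0 {
		return nil
	}
	f, err := strconv.ParseFloat(vals[0], 64)
	if err != nil {
		return nil
	}
	return &f
}

func main() {
	ann := map[string][]string{
		"minimum": {"0"},
		"maximum": {"50"},
	}
	var p Prop
	applyAnnotations(&p, ann)
	fmt.Println(*p.Minimum, *p.Maximum)
}
```

In practice the keys would probably want a prefix to avoid colliding with other annotation users, since the map is shared with cobra's own completion metadata.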

Option B: First-class Enum / ValidValues on pflag

A more ergonomic approach: if pflag's Flag struct gained a ValidValues []string field (or cobra added a MarkFlagEnum method alongside MarkFlagRequired), the same data would serve shell completion, validation, and schema generation:

// Hypothetical
cmd.MarkFlagEnum("privacy", "public", "private", "unlisted")
// Internally: sets Flag.ValidValues + registers completion func + sets annotation

This would unify three things that are currently separate: completion, validation, and schema metadata.
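
Until something like that exists, one workaround that seems possible today is a custom flag value that carries its own allowed list. The enumValue type below is hypothetical and not part of cobra or pflag; it only implements the three methods of the pflag.Value interface (String, Set, Type) plus an Allowed accessor that a schema converter could look for with a type assertion.

```go
package main

import (
	"fmt"
	"strings"
)

// enumValue satisfies pflag.Value (String, Set, Type) and keeps the
// allowed values as plain data that can be read back later.
type enumValue struct {
	allowed []string
	value   string
}

// newEnumValue sets the default and the list of accepted values.
func newEnumValue(def string, allowed ...string) *enumValue {
	return &enumValue{allowed: allowed, value: def}
}

func (e *enumValue) String() string { return e.value }
func (e *enumValue) Type() string   { return "string" }

// Set rejects anything outside the allowed list.
func (e *enumValue) Set(s string) error {
	for _, a := range e.allowed {
		if s == a {
			e.value = s
			return nil
		}
	}
	return fmt.Errorf("must be one of %s", strings.Join(e.allowed, ", "))
}

// Allowed exposes the list for validation or schema generation.
func (e *enumValue) Allowed() []string { return e.allowed }

func main() {
	p := newEnumValue("public", "public", "private", "unlisted")
	fmt.Println(p.Set("hidden"))
	fmt.Println(p.Set("private"), p.String())
}
```

With pflag, such a value would be registered through FlagSet.Var, for example cmd.Flags().Var(p, "privacy", usage). Shell completion would still need its own RegisterFlagCompletionFunc call, which is the separation Option B aims to remove.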

Takeaways

  1. Cobra + MCP is natural: yutu mcp is just another subcommand. The MCP server is a global var Server initialized at the package level, and each resource's init() registers tools.
  2. Agent for free: By connecting the agent to the MCP server via in-memory transport, you get all tools without per-resource wiring.
  3. Shared domain logic: The pkg/ layer is completely interface-agnostic. CLI, MCP, and agent all call the same methods.
  4. Most flag metadata is already recoverable from pflag's Flag struct + cobra annotations. A simple VisitAll loop can generate ~80% of an MCP schema today.
  5. The remaining gap is enum values and numeric bounds. A lightweight Annotations convention — or a new MarkFlagEnum API — would close it.

I'd love to hear thoughts from the cobra community — has anyone else extended their CLI into an MCP server or agent? Would an Annotations-based convention or a MarkFlagEnum API be useful?
