Extending a Cobra CLI into an MCP Server and AI Agent — architecture, patterns, and the flag/schema gap

Hi there, I built [yutu](https://github.com/eat-pray-ai/yutu) — a YouTube CLI powered by cobra. Over time, I extended it into an [MCP](https://modelcontextprotocol.io/) server and an AI agent, all within the same binary. The `mcp` and `agent` modes are just cobra subcommands:

```shell
> yutu
Available Commands:
  agent       Start agent to automate YouTube workflows
  mcp         Start MCP server
  video       Manipulate YouTube videos
  playlist    Manipulate YouTube playlists
  ...
```

Three libraries make this work:

| Layer | Library |
|-------|---------|
| CLI   | [cobra](https://github.com/spf13/cobra) |
| MCP   | [modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) |
| Agent | [adk-go](https://github.com/google/adk-go) |

The point of this post: **any cobra application can become an MCP server and an AI agent with minimal glue code** — and the main friction point is flag/schema duplication between cobra and MCP.

## Architecture

The key insight is that **all three interfaces share the same domain logic**. Only the "input layer" differs:

```
                    +--- CLI Flags ------ cobra ------+
main.go -> cmd/ ---+                                  +---> pkg/<resource>/
                    +--- MCP Schema ---- go-sdk ------+
                    |                                 |
                    +--- Agent --------- adk-go ------+
                          (reuses MCP tools via in-memory transport)
```

- **`pkg/<resource>/`**: Pure domain logic. Each resource (video, channel, playlist, ...) exposes methods like `List()`, `Insert()`, `Update()`, `Delete()` operating on a struct with functional options.
- **`cmd/<resource>/`**: Registers both cobra subcommands and MCP tools, calling the same `pkg/` methods.
- **`cmd/agent/`**: The agent connects to the MCP server via an in-memory transport and reuses all registered MCP tools - zero additional wiring per resource.

### Step 1: Domain logic in `pkg/`

Each resource is a self-contained package with a struct, functional options, and methods:

```go
// pkg/activity/activity.go
type Activity struct {
    ChannelId  string `json:"channel_id,omitempty"`
    MaxResults int64  `json:"max_results,omitempty"`
    // ...
}

func NewActivity(opts ...Option) IActivity[youtube.Activity] { /* ... */ }
func (a *Activity) List(writer io.Writer) error { /* ... */ }
```
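
As a rough illustration of the functional-options shape described above, here is a minimal standard-library sketch. The option names (`WithChannelId`, `WithMaxResults`), the default of 5, and the stubbed `List` body are assumptions for illustration; the real package returns an `IActivity[youtube.Activity]` interface and calls the YouTube API.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// Activity mirrors the shape of the struct above; fields are illustrative.
type Activity struct {
	ChannelId  string `json:"channel_id,omitempty"`
	MaxResults int64  `json:"max_results,omitempty"`
}

// Option mutates an Activity during construction.
type Option func(*Activity)

// WithChannelId is a hypothetical option name.
func WithChannelId(id string) Option {
	return func(a *Activity) { a.ChannelId = id }
}

// WithMaxResults is a hypothetical option name.
func WithMaxResults(n int64) Option {
	return func(a *Activity) {
		if n > 0 { // keep the default when the caller passes zero or less
			a.MaxResults = n
		}
	}
}

// NewActivity applies defaults first, then each option in order.
func NewActivity(opts ...Option) *Activity {
	a := &Activity{MaxResults: 5}
	for _, opt := range opts {
		opt(a)
	}
	return a
}

// List stands in for the real API call and only reports what it would request.
func (a *Activity) List(w io.Writer) error {
	_, err := fmt.Fprintf(w, "list channel=%s max=%d\n", a.ChannelId, a.MaxResults)
	return err
}

func main() {
	a := NewActivity(WithChannelId("UC123"), WithMaxResults(10))
	if err := a.List(os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Because the struct also carries JSON tags, the same type can be filled either by options (CLI path) or by JSON decoding (MCP path).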

### Step 2: CLI + MCP in `cmd/`

Each resource's `init()` registers both a cobra command and an MCP tool side by side, sharing usage strings:

```go
// cmd/activity/list.go
func init() {
    // MCP tool registration
    mcp.AddTool(cmd.Server, &mcp.Tool{
        Name: "activity-list", InputSchema: listInSchema,
    }, cmd.GenToolHandler("activity-list",
        func(input activity.Activity, writer io.Writer) error {
            return input.List(writer)
        },
    ))

    // Cobra flag registration
    activityCmd.AddCommand(listCmd)
    listCmd.Flags().StringVarP(&channelId, "channelId", "c", "", ciUsage)
    listCmd.Flags().Int64VarP(&maxResults, "maxResults", "n", 5, pkg.MRUsage)
    // ...
}
```

The MCP tool handler is generic - a single `GenToolHandler[T]` function handles JSON deserialization into the domain struct and writes the result:

```go
// cmd/handler.go
func GenToolHandler[T any](
    toolName string, op func(T, io.Writer) error,
) mcp.ToolHandlerFor[T, any] { /* ... */ }
```

### Step 3: Agent reuses MCP tools

The agent doesn't need to know about individual resources at all. It connects to the same MCP server via an **in-memory transport** and gets all tools for free:

```go
// cmd/agent/agent.go
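clientTransport, serverTransport := mcp.NewInMemoryTransports()
// Errors are returned here on the assumption that the enclosing function returns an error.
if _, err := cmd.Server.Connect(ctx, serverTransport, nil); err != nil {
    return err
}

mcpToolSet, err := mcptoolset.New(mcptoolset.Config{
    Transport: clientTransport,
})
if err != nil {
    return err
}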
```

This means adding a new YouTube resource to the CLI automatically makes it available as an MCP tool *and* an agent capability, with **one registration** in `cmd/<resource>/`.

The agent itself uses a multi-agent architecture (orchestrator + retrieval/modifier/destroyer sub-agents), with each sub-agent receiving a filtered subset of MCP tools:

```go
tool.FilterToolset(mcpToolSet, tool.StringPredicate(def.toolNames))
```
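
Conceptually, the name-based filtering amounts to intersecting the registered tool names with a per-agent allow list. The sketch below shows that idea with plain slices; `subAgent`, `filterTools`, and every tool name other than `activity-list` are hypothetical, and this is not how adk-go implements `FilterToolset`.

```go
package main

import "fmt"

// subAgent pairs a sub-agent name with the tools it is allowed to call.
type subAgent struct {
	name      string
	toolNames []string
}

// filterTools returns the registered tools that appear in allowed,
// preserving registration order.
func filterTools(registered, allowed []string) []string {
	allow := make(map[string]struct{}, len(allowed))
	for _, n := range allowed {
		allow[n] = struct{}{}
	}
	var out []string
	for _, n := range registered {
		if _, ok := allow[n]; ok {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Tool names other than "activity-list" are made up for this example.
	registered := []string{"activity-list", "video-list", "video-update", "video-delete"}
	agents := []subAgent{
		{name: "retrieval", toolNames: []string{"activity-list", "video-list"}},
		{name: "modifier", toolNames: []string{"video-update"}},
		{name: "destroyer", toolNames: []string{"video-delete"}},
	}
	for _, a := range agents {
		fmt.Println(a.name, filterTools(registered, a.toolNames))
	}
}
```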

## The Duplication Problem

The architecture still leaves one source of duplication: the **input definition**, which is written once as cobra flags and once as an MCP schema. Here is an [example](https://github.com/eat-pray-ai/yutu/blob/main/cmd/activity/list.go):

**MCP Schema:**
```go
var listInSchema = &jsonschema.Schema{
    Type:     "object",
    Properties: map[string]*jsonschema.Schema{
        "channel_id":  {Type: "string", Description: ciUsage},
        "max_results": {Type: "number", Description: pkg.MRUsage, Default: json.RawMessage("5")},
        "mine":        {Type: "boolean", Description: mineUsage},
        // ...
    },
}
```

**Cobra Flags:**
```go
listCmd.Flags().StringVarP(&channelId, "channelId", "c", "", ciUsage)
listCmd.Flags().Int64VarP(&maxResults, "maxResults", "n", 5, pkg.MRUsage)
listCmd.Flags().BoolVarP(mine, "mine", "M", true, mineUsage)
```

They share descriptions (`ciUsage`, `pkg.MRUsage`) but everything else is defined twice.

### Bridging the Gap Today

Cobra and pflag already provide building blocks that get us partway there. The `pflag.Flag` struct exposes:

```go
type Flag struct {
    Name        string
    Shorthand   string
    Usage       string              // → MCP description
    Value       Value               // .Type() → MCP type, .String() → MCP default
    DefValue    string              // → MCP default
    Annotations map[string][]string // extensible metadata
    // ...
}
```

And cobra adds higher-level APIs on top:

- **`MarkFlagRequired`** — sets an annotation (`BashCompOneRequiredFlag`) → maps to MCP `Required`
- **`RegisterFlagCompletionFunc`** — provides valid values for shell completion → conceptually maps to MCP `Enum`
- **`VisitAll`** — iterates every flag in a command

So in theory, you could write a converter that walks a cobra command's flags and generates an MCP schema automatically:

```go
// SchemaFromCmd derives a JSON Schema object from the flags of a cobra command.
func SchemaFromCmd(cmd *cobra.Command) *jsonschema.Schema {
    schema := &jsonschema.Schema{Type: "object", Properties: map[string]*jsonschema.Schema{}}
    cmd.Flags().VisitAll(func(f *pflag.Flag) {
        prop := &jsonschema.Schema{
            Description: f.Usage,
            // quoteDefault is a helper (not shown) that renders f.DefValue as a JSON literal.
            Default: json.RawMessage(quoteDefault(f)),
        }
        // Map pflag type names onto JSON Schema types; unknown types stay untyped.
        switch f.Value.Type() {
        case "string":
            prop.Type = "string"
        case "int", "int64", "float64":
            prop.Type = "number"
        case "bool":
            prop.Type = "boolean"
        case "stringSlice":
            prop.Type = "array"
            prop.Items = &jsonschema.Schema{Type: "string"}
        }
        // MarkFlagRequired stores an annotation we can read back.
        // The length check avoids a panic on an empty annotation value.
        if ann, ok := f.Annotations[cobra.BashCompOneRequiredFlag]; ok && len(ann) > 0 && ann[0] == "true" {
            schema.Required = append(schema.Required, f.Name)
        }
        schema.Properties[f.Name] = prop
    })
    return schema
}
```

This covers **type, default, description, and required** — the overlapping subset. But the remaining MCP-only features (`Enum`, `Minimum`/`Maximum`, `Items` constraints) have no cobra equivalent to read from.
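
The default value is the one piece that needs care, because `Flag.DefValue` is plain text while a schema default must be valid JSON. Below is one possible way to do that conversion, using only the standard library. `defaultJSON` is a hypothetical helper that takes the strings you would get from `f.Value.Type()` and `f.DefValue`; treating an empty string default as "no default" is an assumption, not a rule.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// defaultJSON converts a flag's textual default into a JSON literal based on
// its pflag type name. It returns nil when there is no usable default, so the
// caller can leave the schema's default unset.
func defaultJSON(flagType, defValue string) json.RawMessage {
	var (
		v   any
		err error
	)
	switch flagType {
	case "string":
		if defValue == "" {
			return nil // assumption: an empty default carries no information
		}
		v = defValue
	case "int", "int64":
		v, err = strconv.ParseInt(defValue, 10, 64)
	case "float64":
		v, err = strconv.ParseFloat(defValue, 64)
	case "bool":
		v, err = strconv.ParseBool(defValue)
	default:
		// Slice and custom types need their own parsing; skipped in this sketch.
		return nil
	}
	if err != nil {
		return nil
	}
	// Re-encoding through json.Marshal guarantees a valid JSON literal
	// (it also rejects NaN and infinities for floats).
	b, err := json.Marshal(v)
	if err != nil {
		return nil
	}
	return b
}

func main() {
	fmt.Println(string(defaultJSON("int64", "5")))
	fmt.Println(string(defaultJSON("string", "public")))
	fmt.Println(string(defaultJSON("bool", "false")))
}
```

Returning nil rather than an empty `json.RawMessage` matters: an empty non-nil raw message is not valid JSON and would fail when the schema is marshalled.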

### What's Missing

The gap is narrow but real:

| MCP Schema Feature | Cobra/pflag Equivalent | Status |
|--------------------|------------------------|--------|
| `type`             | `Flag.Value.Type()`    | Available |
| `description`      | `Flag.Usage`           | Available |
| `default`          | `Flag.DefValue`        | Available |
| `required`         | `MarkFlagRequired` annotation | Available (read back via `Flag.Annotations`) |
| `enum`             | `RegisterFlagCompletionFunc` | Partial — completion funcs aren't introspectable as a static value list |
| `minimum`/`maximum`| —                      | Not available |

The closest cobra has to `Enum` is `RegisterFlagCompletionFunc`, but it registers a *function* (for dynamic completion), not a static list of valid values. There's no way to read back "this flag accepts only these values" as data.

### Possible Directions

Two lightweight options that could close the gap without changing cobra's core:

**Option A: Convention over `Annotations`**

pflag's `Annotations map[string][]string` is already extensible. A community convention (or thin helper library) could encode MCP-relevant metadata:

```go
flags.SetAnnotation("privacy", "enum", []string{"public", "private", "unlisted"})
flags.SetAnnotation("maxResults", "minimum", []string{"0"})
flags.SetAnnotation("maxResults", "maximum", []string{"50"})
```

The schema converter above would then pick these up. No cobra changes needed — just a convention.
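
Here is a rough sketch of the reading side of such a convention. The annotation keys (`enum`, `minimum`, `maximum`) are the hypothetical ones from the snippet above, `property` is a minimal stand-in for a JSON Schema type, and `applyAnnotations` is an illustrative name; the map has the same shape as `Flag.Annotations`.

```go
package main

import (
	"fmt"
	"strconv"
)

// property is a minimal stand-in for the schema fields used here; a real
// converter would populate the JSON Schema type of its MCP SDK instead.
type property struct {
	Enum    []any
	Minimum *float64
	Maximum *float64
}

// applyAnnotations copies the hypothetical "enum", "minimum" and "maximum"
// annotation keys onto prop. Missing keys leave the field untouched.
func applyAnnotations(ann map[string][]string, prop *property) error {
	for _, v := range ann["enum"] {
		prop.Enum = append(prop.Enum, v)
	}
	bounds := []struct {
		key string
		dst **float64
	}{
		{"minimum", &prop.Minimum},
		{"maximum", &prop.Maximum},
	}
	for _, b := range bounds {
		vals := ann[b.key]
		if len(vals) == 0 {
			continue
		}
		// Annotations are strings, so numeric bounds have to be parsed.
		n, err := strconv.ParseFloat(vals[0], 64)
		if err != nil {
			return fmt.Errorf("annotation %q: %w", b.key, err)
		}
		*b.dst = &n
	}
	return nil
}

func main() {
	ann := map[string][]string{"minimum": {"0"}, "maximum": {"50"}}
	var p property
	if err := applyAnnotations(ann, &p); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(*p.Minimum, *p.Maximum)
}
```

The main cost of this option shows up here: everything is stringly typed, so the converter has to parse and validate values that a first-class API could have checked at registration time.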

**Option B: First-class `Enum` / `ValidValues` on pflag**

A more ergonomic approach: if pflag's `Flag` struct gained a `ValidValues []string` field (or cobra added a `MarkFlagEnum` method alongside `MarkFlagRequired`), the same data would serve shell completion, validation, and schema generation:

```go
// Hypothetical
cmd.MarkFlagEnum("privacy", "public", "private", "unlisted")
// Internally: sets Flag.ValidValues + registers completion func + sets annotation
```

This would unify three things that are currently separate: completion, validation, and schema metadata.

## Takeaways

1. **Cobra + MCP is natural**: `yutu mcp` is just another subcommand. The MCP server is a global `var Server` initialized at the package level, and each resource's `init()` registers tools.
2. **Agent for free**: By connecting the agent to the MCP server via in-memory transport, you get all tools without per-resource wiring.
3. **Shared domain logic**: The `pkg/` layer is completely interface-agnostic. CLI, MCP, and agent all call the same methods.
4. **Most flag metadata is already recoverable** from pflag's `Flag` struct + cobra annotations. A simple `VisitAll` loop can generate the type, description, default, and required parts of an MCP schema today.
5. **The remaining gap** is enum values and numeric bounds. A lightweight `Annotations` convention — or a new `MarkFlagEnum` API — would close it.

I'd love to hear thoughts from the cobra community — has anyone else extended their CLI into an MCP server or agent? Would an `Annotations`-based convention or a `MarkFlagEnum` API be useful?


