24 changes: 24 additions & 0 deletions plugins/metrics/README.md
@@ -7,9 +7,31 @@ Anonymous usage metrics collection for ai-helpers slash commands, skills, and se
The `metrics` plugin provides anonymous usage tracking for:
- **Events**: Individual slash commands and skill invocations
- **Sessions**: Aggregate session-level metrics (duration, tool usage, conversation patterns)
- **AI Docs Usage**: Track how agentic documentation is used during development

This helps maintainers understand usage patterns and make data-driven decisions about feature development and improvements.

## Commands

### `/metrics:ai-docs-telemetry`

Analyze Claude Code session logs to track ai-docs usage patterns. See [ai-docs-telemetry.md](commands/ai-docs-telemetry.md) for full documentation.

**Quick examples:**
```bash
# Scan all recent sessions
/metrics:ai-docs-telemetry -scan

# Scan only enhancements repo
/metrics:ai-docs-telemetry -scan -project enhancements

# Analyze specific session
/metrics:ai-docs-telemetry -session ~/.claude/projects/<project>/<session-id>.jsonl

# Pipe to jq for analysis
/metrics:ai-docs-telemetry -scan | jq -r '.[] | "\(.documentation.entry_point): \(.documentation.total_files)"'
```

## How It Works

The plugin uses Claude Code's [hook system](https://docs.claude.com/en/docs/claude-code/hooks) to automatically track usage:
@@ -305,7 +327,9 @@ All metrics collection logic is open source and available in this repository:
- **Hook definition**: `plugins/metrics/hooks/hooks.json`
- **Event collection script**: `plugins/metrics/scripts/send_metrics.py`
- **Session collection script**: `plugins/metrics/scripts/send_session_metrics.py`
- **AI docs telemetry script**: `plugins/metrics/scripts/ai_docs_telemetry.py`
- **Plugin metadata**: `plugins/metrics/.claude-plugin/plugin.json`
- **Commands**: `plugins/metrics/commands/`

## Data Usage

97 changes: 97 additions & 0 deletions plugins/metrics/commands/ai-docs-telemetry.md
@@ -0,0 +1,97 @@
---
description: Analyze Claude Code session logs for ai-docs usage patterns
argument-hint: "[-scan] [-project <name>] [-session <path>]"
---

## Name
metrics:ai-docs-telemetry

## Synopsis
```
/metrics:ai-docs-telemetry -scan [-project <name>]
/metrics:ai-docs-telemetry -session <path-to-session.jsonl>
```
Comment on lines +10 to +13

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

Several fenced blocks are missing a language tag, which triggers markdownlint MD040 and can fail/pollute docs CI.

🛠️ Suggested fix
-```
+```bash
 /metrics:ai-docs-telemetry -scan [-project <name>]
 /metrics:ai-docs-telemetry -session <path-to-session.jsonl>
  • /metrics:ai-docs-telemetry -scan
  • /metrics:ai-docs-telemetry -scan -project enhancements
  • /metrics:ai-docs-telemetry -scan -project machine-config-operator
  • /metrics:ai-docs-telemetry -session ~/.claude/projects/<project>/<session-id>.jsonl

Also applies to: 45-47, 64-66, 69-71, 74-76

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 10-10: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/metrics/commands/ai-docs-telemetry.md` around lines 10 - 13, Fenced
code blocks in the ai-docs-telemetry markdown are missing language identifiers
and trigger markdownlint MD040; update each triple-backtick block that contains
CLI examples (e.g. blocks containing "/metrics:ai-docs-telemetry -scan [-project
<name>]", "/metrics:ai-docs-telemetry -session <path-to-session.jsonl>",
"/metrics:ai-docs-telemetry -scan", "/metrics:ai-docs-telemetry -scan -project
enhancements", "/metrics:ai-docs-telemetry -scan -project
machine-config-operator", and the session path example like
"~/.claude/projects/<project>/<session-id>.jsonl") to include a language tag
(use bash) immediately after the opening fence.


## Description
The `metrics:ai-docs-telemetry` command analyzes Claude Code session logs to track how agentic documentation (ai-docs) is used during development. It parses session JSONL files to extract Read tool calls to ai-docs files and generates telemetry events.

This helps measure:
- Documentation effectiveness and usage patterns
- Which files are accessed most frequently
- Entry points for documentation discovery (AGENTS.md, direct search, etc.)
- Navigation paths through documentation

All output is JSON to stdout, making it easy to pipe to `jq` for analysis.

## Implementation
```python
${CLAUDE_PLUGIN_ROOT}/scripts/ai_docs_telemetry.py "$@"
```

The script:
- Parses `~/.claude/projects/` JSONL files
- Detects Read tool calls to files matching `ai-docs/`, `AGENTS.md`, or `CLAUDE.md`
- Tracks access sequence and timestamps
- Identifies entry points (AGENTS.md vs direct search)
- Privacy-first: Only file paths tracked, no code/prompts/user data

## Return Value
- **JSON**: Single event or array of events
- **Summary**: Printed to stderr with session counts

## Examples

1. **Scan all recent sessions (last 7 days)**:
```
/metrics:ai-docs-telemetry -scan
```
Output:
```json
[
  {
    "event_type": "ai_docs_usage",
    "session_id": "a0350e3f-1853-4a56-be01-865cd0df1944",
    "documentation": {
      "entry_point": "AGENTS.md",
      "files_accessed": [...],
      "total_files": 5
    }
  }
]
```

2. **Scan only enhancements repository**:
```
/metrics:ai-docs-telemetry -scan -project enhancements
```

3. **Scan only machine-config-operator repository**:
```
/metrics:ai-docs-telemetry -scan -project machine-config-operator
```

4. **Analyze a specific session**:
```
/metrics:ai-docs-telemetry -session ~/.claude/projects/<project>/<session-id>.jsonl
```

5. **Pipe to jq for analysis**:
```bash
# Count files by entry point
/metrics:ai-docs-telemetry -scan | jq -r '.[] | "\(.documentation.entry_point): \(.documentation.total_files)"'

# List most accessed files
/metrics:ai-docs-telemetry -scan | jq -r '.[] | .documentation.files_accessed[].path' | sort | uniq -c | sort -rn

# Filter sessions with >5 files accessed
/metrics:ai-docs-telemetry -scan | jq '.[] | select(.documentation.total_files > 5)'
```

## Arguments
- `-scan`: Scan all recent Claude Code sessions (last 7 days)
- `-project <name>`: Filter sessions by project name (e.g., "enhancements", "machine-config-operator")
- `-session <path>`: Analyze a specific session JSONL file

## Related
- Session hooks: `metrics` plugin's `SessionEnd` hook
- General metrics: `send_session_metrics.py`
251 changes: 251 additions & 0 deletions plugins/metrics/scripts/ai_docs_telemetry.py
@@ -0,0 +1,251 @@
#!/usr/bin/env python3
"""
AI Docs Telemetry Analysis Script

Analyzes Claude Code session logs to track ai-docs usage patterns.
Parses session JSONL files to extract Read tool calls to ai-docs files.

Usage:
    ai_docs_telemetry.py -scan [-project <name>]
    ai_docs_telemetry.py -session <path-to-session.jsonl>
"""

import sys
import json
import os
import pathlib
import datetime
import argparse
from typing import Optional, List, Dict, Any
from dataclasses import dataclass, asdict


@dataclass
class FileAccess:
    """Represents a single file access in the session."""
    path: str
    sequence: int
    time: str


@dataclass
class PlatformInfo:
    """Platform information."""
    name: str = "claude-code"
    version: str = "unknown"


@dataclass
class RepositoryInfo:
    """Repository information extracted from session path."""
    name: str
    path: str


@dataclass
class DocumentationInfo:
    """Documentation usage information."""
    entry_point: str
    files_accessed: List[Dict[str, Any]]
    total_files: int


@dataclass
class TelemetryEvent:
    """Complete telemetry event."""
    event_type: str
    version: str
    timestamp: str
    session_id: str
    platform: Dict[str, str]
    repository: Dict[str, str]
    documentation: Dict[str, Any]

def extract_repo_info(session_path: str) -> RepositoryInfo:
    """
    Extract repository information from session path.
    Path format: ~/.claude/projects/<repo-path-hash>/<session-id>.jsonl
    """
    parts = session_path.split("/projects/")
    if len(parts) < 2:
        return RepositoryInfo(name="unknown", path="unknown")

    # Get the project directory name
    project_dir = parts[1].split("/")[0]

    # Decode project name (simplified - just replace dashes with slashes)
    repo_name = project_dir.replace("-", "/")

    return RepositoryInfo(name=repo_name, path=project_dir)


def detect_entry_point(files: List[FileAccess]) -> str:
    """Determine how user discovered ai-docs."""
    if not files:
        return "unknown"

    first = files[0].path
    if first.endswith("AGENTS.md") or first.endswith("CLAUDE.md"):
        return "AGENTS.md"
    if first.endswith("README.md"):
        return "README.md"

    return "direct-search"


def process_session(session_path: str) -> Optional[TelemetryEvent]:
    """
    Analyze a Claude Code session log and extract ai-docs usage.
    Returns None if no ai-docs usage detected.
    """
    try:
        with open(session_path, 'r') as f:
            content = f.read()
    except Exception as e:
        print(f"Error reading session: {e}", file=sys.stderr)
        return None

    lines = content.split('\n')
    ai_docs_files: List[FileAccess] = []
    session_id = pathlib.Path(session_path).stem

    for line in lines:
        if not line.strip():
            continue

        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue

        # Look for Read tool calls to ai-docs files
        if event.get("type") != "assistant":
            continue

        msg = event.get("message", {})
        content_arr = msg.get("content", [])

        for item in content_arr:
            if not isinstance(item, dict):
                continue

            if item.get("type") == "tool_use" and item.get("name") == "Read":
                input_data = item.get("input", {})
                file_path = input_data.get("file_path", "")

                # Check if it's an ai-docs file or AGENTS.md
                if ("ai-docs/" in file_path or
                    file_path.endswith("AGENTS.md") or
                    file_path.endswith("CLAUDE.md")):

                    timestamp = event.get("timestamp", datetime.datetime.now().isoformat())

                    ai_docs_files.append(FileAccess(
                        path=file_path,
                        sequence=len(ai_docs_files) + 1,
                        time=timestamp
                    ))
Comment on lines +144 to +148

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Raw file_path in telemetry can leak local identifiers.

Line 145 stores the full tool input path. Absolute paths can expose usernames or sensitive local structure, conflicting with anonymous telemetry goals.

🔒 Suggested fix
                     ai_docs_files.append(FileAccess(
-                        path=file_path,
+                        path=redact_documentation_path(file_path),
                         sequence=len(ai_docs_files) + 1,
                         time=timestamp
                     ))

Add a small sanitizer helper (outside this range), for example:

def redact_documentation_path(file_path: str) -> str:
    normalized = file_path.replace("\\", "/")
    if "ai-docs/" in normalized:
        return "ai-docs/" + normalized.split("ai-docs/", 1)[1]
    return pathlib.PurePath(normalized).name  # AGENTS.md / CLAUDE.md fallback
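
To make the effect of the suggested sanitizer concrete, here is a minimal, self-contained sketch that repeats the helper and runs it on a few sample inputs. The sample paths (user names, directories) are hypothetical and not taken from any real session log; the helper body follows the suggestion above and is not code that exists in the PR.

```python
import pathlib


def redact_documentation_path(file_path: str) -> str:
    """Reduce a tool-input path to an ai-docs-relative path or a bare file name."""
    normalized = file_path.replace("\\", "/")
    if "ai-docs/" in normalized:
        # Keep only the part from the first "ai-docs/" onward.
        return "ai-docs/" + normalized.split("ai-docs/", 1)[1]
    # AGENTS.md / CLAUDE.md fallback: keep just the file name.
    return pathlib.PurePath(normalized).name


# Hypothetical sample paths, chosen only for illustration.
samples = [
    "/home/alice/src/enhancements/ai-docs/workflows/testing.md",
    "C:\\Users\\bob\\repo\\ai-docs\\index.md",
    "/home/alice/src/enhancements/AGENTS.md",
]
redacted = [redact_documentation_path(p) for p in samples]
print(redacted)
```

Note that the `"ai-docs/"` check is a plain substring match, so a directory whose name merely ends in `ai-docs` would also be treated as documentation; whether that matters depends on the repositories being scanned.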
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/metrics/scripts/ai_docs_telemetry.py` around lines 144 - 148, The
telemetry currently appends the raw file_path into ai_docs_files (via
FileAccess) which can leak local identifiers; add a sanitizer function (e.g.,
redact_documentation_path) outside this block and call it before creating
FileAccess so that you store a redacted path instead of the raw file_path;
update the code that constructs FileAccess (the ai_docs_files.append(...) call)
to pass redact_documentation_path(file_path) for the path field and keep
sequence and time unchanged to preserve ordering and timestamps.


    if not ai_docs_files:
        return None

    # Extract repository info
    repo_info = extract_repo_info(session_path)

    # Build telemetry event
    event = TelemetryEvent(
        event_type="ai_docs_usage",
        version="1.0",
        timestamp=datetime.datetime.now().isoformat(),
        session_id=session_id,
        platform=asdict(PlatformInfo()),
        repository=asdict(repo_info),
        documentation={
            "entry_point": detect_entry_point(ai_docs_files),
            "files_accessed": [asdict(f) for f in ai_docs_files],
            "total_files": len(ai_docs_files)
        }
    )

    return event


def scan_recent_sessions(project_filter: Optional[str] = None) -> List[TelemetryEvent]:
    """
    Scan ~/.claude/projects/ for recent sessions with ai-docs usage.
    Returns list of telemetry events.
    """
    home_dir = pathlib.Path.home()
    projects_dir = home_dir / ".claude" / "projects"

    if not projects_dir.exists():
        print(f"Projects directory not found: {projects_dir}", file=sys.stderr)
        return []

    events = []
    processed_count = 0
    seven_days_ago = datetime.datetime.now() - datetime.timedelta(days=7)

    # Walk through all project directories
    for session_file in projects_dir.glob("**/*.jsonl"):
        # Skip files older than 7 days
        mtime = datetime.datetime.fromtimestamp(session_file.stat().st_mtime)
        if mtime < seven_days_ago:
            continue

        # Filter by project if specified
        if project_filter and project_filter not in str(session_file):
            continue

        processed_count += 1

        # Quick pre-filter: check if file contains ai-docs markers
        try:
            content = session_file.read_text()
            if not ("ai-docs/" in content or "AGENTS.md" in content):
                continue
        except Exception:
            continue
Comment on lines +204 to +209

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pre-filter drops valid CLAUDE.md sessions (and silently hides read errors).

Line 206 pre-filtering omits CLAUDE.md, so sessions that only touch that entry point are skipped before parsing. Also, read failures are swallowed, which masks undercounting.

✅ Suggested fix
         try:
             content = session_file.read_text()
-            if not ("ai-docs/" in content or "AGENTS.md" in content):
+            if not ("ai-docs/" in content or "AGENTS.md" in content or "CLAUDE.md" in content):
                 continue
-        except Exception:
+        except (OSError, UnicodeError) as e:
+            print(f"Skipping unreadable session {session_file}: {e}", file=sys.stderr)
             continue
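
One way to keep the pre-filter and the Read-tool check from drifting apart again is to pull the markers into a single constant. The sketch below is only an illustration under that assumption: `AI_DOCS_MARKERS` and `may_contain_ai_docs` are hypothetical names that do not exist in the PR, and the temporary files stand in for real session logs.

```python
import pathlib
import sys
import tempfile

# Hypothetical shared constant; mirrors the three patterns checked in process_session().
AI_DOCS_MARKERS = ("ai-docs/", "AGENTS.md", "CLAUDE.md")


def may_contain_ai_docs(session_file: pathlib.Path) -> bool:
    """Cheap substring pre-filter. Unreadable files are reported and skipped."""
    try:
        content = session_file.read_text()
    except (OSError, UnicodeError) as e:
        print(f"Skipping unreadable session {session_file}: {e}", file=sys.stderr)
        return False
    return any(marker in content for marker in AI_DOCS_MARKERS)


with tempfile.TemporaryDirectory() as tmp:
    # A session that only touches CLAUDE.md (dropped by the current pre-filter).
    only_claude = pathlib.Path(tmp) / "a.jsonl"
    only_claude.write_text('{"file_path": "/repo/CLAUDE.md"}\n')
    # A session with no documentation reads at all.
    unrelated = pathlib.Path(tmp) / "b.jsonl"
    unrelated.write_text('{"file_path": "/repo/main.go"}\n')
    # A path that does not exist, to exercise the error branch.
    missing = pathlib.Path(tmp) / "missing.jsonl"

    results = [may_contain_ai_docs(p) for p in (only_claude, unrelated, missing)]

print(results)
```

The missing file produces a message on stderr instead of being dropped silently, which keeps undercounting visible without touching the JSON on stdout.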
🧰 Tools
🪛 Ruff (0.15.12)

[error] 208-209: try-except-continue detected, consider logging the exception

(S112)


[warning] 208-208: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/metrics/scripts/ai_docs_telemetry.py` around lines 204 - 209, The
pre-filter in the loop around session_file.read_text() wrongly only checks for
"ai-docs/" or "AGENTS.md" and thus skips valid sessions that reference
"CLAUDE.md"; also it swallows read exceptions silently. Update the predicate to
include "CLAUDE.md" (e.g., check for "ai-docs/" or "AGENTS.md" or "CLAUDE.md")
so those sessions are not skipped, and change the except Exception block in the
same scope (around session_file.read_text()) to log the exception (using the
existing logger) with context about the file instead of silently continuing so
read errors are visible for telemetry counting. Ensure you modify the checks and
the error handling where session_file.read_text() is called and the surrounding
try/except.


        # Process session
        event = process_session(str(session_file))
        if event:
            events.append(event)

    print(f"\n📊 Summary: {processed_count} sessions scanned, {len(events)} with ai-docs usage",
          file=sys.stderr)

    return events


def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(
        description="Analyze Claude Code session logs for ai-docs usage"
    )
    parser.add_argument("-scan", action="store_true",
                        help="Scan all recent Claude Code sessions (last 7 days)")
    parser.add_argument("-project", type=str,
                        help="Filter by project name (e.g., 'enhancements', 'machine-config-operator')")
    parser.add_argument("-session", type=str,
                        help="Analyze a specific session JSONL file")

    args = parser.parse_args()

    if args.scan:
        events = scan_recent_sessions(args.project)
        if events:
            # Output as JSON array
            print(json.dumps([asdict(e) for e in events], indent=2))
    elif args.session:
Comment on lines +236 to +241

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Always emit JSON for -scan (including empty results).

Current behavior prints nothing when no events are found. That breaks JSON-contract expectations and makes downstream jq pipelines brittle.

🧩 Suggested fix
     if args.scan:
         events = scan_recent_sessions(args.project)
-        if events:
-            # Output as JSON array
-            print(json.dumps([asdict(e) for e in events], indent=2))
+        # Always output JSON array (possibly empty)
+        print(json.dumps([asdict(e) for e in events], indent=2))
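
The difference is easy to see in isolation. The following sketch uses a hypothetical two-field `Event` dataclass as a stand-in for `TelemetryEvent`, and a hypothetical `render_scan_output` helper, to show that the unconditional `json.dumps` call yields a parseable array even when nothing was found.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Event:
    """Hypothetical stand-in for TelemetryEvent, trimmed to two fields."""
    event_type: str
    session_id: str


def render_scan_output(events: List[Event]) -> str:
    """Serialize scan results; an empty list becomes "[]" rather than no output."""
    return json.dumps([asdict(e) for e in events], indent=2)


empty_output = render_scan_output([])
one_output = render_scan_output([Event("ai_docs_usage", "example-session")])

# Both outputs parse back as JSON arrays, so a downstream `jq '.[]'`
# receives valid input in either case.
parsed_empty = json.loads(empty_output)
parsed_one = json.loads(one_output)
print(empty_output)
print(len(parsed_one))
```

With the current `if events:` guard, the empty case writes nothing to stdout, and a consumer that expects a JSON document has to special-case empty input.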
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/metrics/scripts/ai_docs_telemetry.py` around lines 236 - 241, The
-scan branch currently only prints JSON when events is truthy; change it so it
always emits a JSON array (possibly empty) from
scan_recent_sessions(args.project) — call scan_recent_sessions into events and
unconditionally print json.dumps([asdict(e) for e in events], indent=2) even if
events is empty, ensuring downstream jq pipelines always receive valid JSON;
update the block around args.scan, scan_recent_sessions, events and the asdict
conversion accordingly.

        event = process_session(args.session)
        if event:
            print(json.dumps(asdict(event), indent=2))
    else:
        parser.print_help()
        sys.exit(1)


if __name__ == "__main__":
    main()