Add ai-docs telemetry analysis to metrics plugin #450
Add /metrics:ai-docs-telemetry command to analyze Claude Code session logs for agentic documentation usage patterns. It tracks which ai-docs files are accessed, entry points (AGENTS.md vs. direct search), and navigation patterns.

Usage:

```bash
# Scan all recent sessions (last 7 days)
/metrics:ai-docs-telemetry -scan

# Scan a specific project
/metrics:ai-docs-telemetry -scan -project enhancements

# Analyze a specific session
/metrics:ai-docs-telemetry -session ~/.claude/projects/<project>/<session-id>.jsonl

# Pipe to jq for analysis
/metrics:ai-docs-telemetry -scan | jq -r '.[] | .documentation.entry_point' | sort | uniq -c
```
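If jq isn't available, the last pipeline's tally can be reproduced in Python. This is a minimal sketch that assumes only what the jq filter implies: `-scan` prints a JSON array whose elements carry a `documentation.entry_point` field. The function name `count_entry_points` and the sample values in the tests are illustrative, not part of the command.

```python
import json
from collections import Counter


def count_entry_points(json_text: str) -> Counter:
    """Tally documentation.entry_point values, like `jq ... | sort | uniq -c`."""
    events = json.loads(json_text)
    # A missing "documentation" object or field is counted under the key None.
    return Counter((e.get("documentation") or {}).get("entry_point") for e in events)
```

Feed it the command's stdout (for example via `sys.stdin.read()`). Unlike `jq -r`, which prints a missing field as `null`, this sketch counts it under `None`.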
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Prashanth684
Walkthrough
This pull request adds AI Docs telemetry capabilities to the metrics plugin. A new command scans Claude Code session JSONL logs to detect Read tool calls to ai-docs paths and emits structured telemetry events as JSON. The documentation and Python implementation support scanning sessions with project filtering or analyzing a single session.
Changes: AI Docs Telemetry Feature
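A minimal sketch of the detection step described above. It assumes each JSONL line is an object whose `message.content` list may contain `tool_use` blocks shaped like `{"type": "tool_use", "name": "Read", "input": {"file_path": ...}}`, with a top-level `timestamp`; those field names are assumptions about the log format. `DocRead`, `is_doc_path`, and `extract_doc_reads` are hypothetical names, not the script's actual API.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List

# Entry-point files that count as documentation reads (assumption).
DOC_ENTRY_FILES = ("AGENTS.md", "CLAUDE.md")


@dataclass
class DocRead:
    """Hypothetical record of one documentation read, in session order."""
    path: str
    sequence: int
    time: str


def is_doc_path(path: str) -> bool:
    norm = path.replace("\\", "/")
    return "ai-docs/" in norm or norm.rsplit("/", 1)[-1] in DOC_ENTRY_FILES


def extract_doc_reads(lines: Iterable[str]) -> List[DocRead]:
    reads: List[DocRead] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # e.g. plain-text user messages
        for block in content:
            if (isinstance(block, dict) and block.get("type") == "tool_use"
                    and block.get("name") == "Read"):
                path = (block.get("input") or {}).get("file_path", "")
                if path and is_doc_path(path):
                    reads.append(DocRead(path, len(reads) + 1, record.get("timestamp", "")))
    return reads
```

Call it with an open session file (`extract_doc_reads(open(path, encoding="utf-8"))`); the `sequence` field preserves read order for later navigation analysis.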
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 10 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/metrics/commands/ai-docs-telemetry.md`:
- Around lines 10-13: Fenced code blocks in the ai-docs-telemetry markdown are
missing language identifiers and trigger markdownlint MD040; update each
triple-backtick block that contains CLI examples (e.g. blocks containing
"/metrics:ai-docs-telemetry -scan [-project <name>]",
"/metrics:ai-docs-telemetry -session <path-to-session.jsonl>",
"/metrics:ai-docs-telemetry -scan", "/metrics:ai-docs-telemetry -scan -project
enhancements", "/metrics:ai-docs-telemetry -scan -project
machine-config-operator", and the session path example like
"~/.claude/projects/<project>/<session-id>.jsonl") to include a language tag
(use bash) immediately after the opening ``` so each block starts with
```bash.
In `@plugins/metrics/scripts/ai_docs_telemetry.py`:
- Around lines 144-148: The telemetry currently appends the raw file_path into
ai_docs_files (via FileAccess) which can leak local identifiers; add a sanitizer
function (e.g., redact_documentation_path) outside this block and call it before
creating FileAccess so that you store a redacted path instead of the raw
file_path; update the code that constructs FileAccess (the
ai_docs_files.append(...) call) to pass redact_documentation_path(file_path) for
the path field and keep sequence and time unchanged to preserve ordering and
timestamps.
- Around lines 236-241: The -scan branch currently only prints JSON when events
is truthy; change it so it always emits a JSON array (possibly empty) from
scan_recent_sessions(args.project) — call scan_recent_sessions into events and
unconditionally print json.dumps([asdict(e) for e in events], indent=2) even if
events is empty, ensuring downstream jq pipelines always receive valid JSON;
update the block around args.scan, scan_recent_sessions, events and the asdict
conversion accordingly.
- Around lines 204-209: The pre-filter in the loop around
session_file.read_text() wrongly only checks for "ai-docs/" or "AGENTS.md" and
thus skips valid sessions that reference "CLAUDE.md"; also it swallows read
exceptions silently. Update the predicate to include "CLAUDE.md" (e.g., check
for "ai-docs/" or "AGENTS.md" or "CLAUDE.md") so those sessions are not skipped,
and change the except Exception block in the same scope (around
session_file.read_text()) to log the exception (using the existing logger) with
context about the file instead of silently continuing so read errors are visible
for telemetry counting. Ensure you modify the checks and the error handling
where session_file.read_text() is called and the surrounding try/except.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c80a3b1f-782d-4b98-84df-34ba44fb0ab3
📒 Files selected for processing (3)
- plugins/metrics/README.md
- plugins/metrics/commands/ai-docs-telemetry.md
- plugins/metrics/scripts/ai_docs_telemetry.py
````markdown
```
/metrics:ai-docs-telemetry -scan [-project <name>]
/metrics:ai-docs-telemetry -session <path-to-session.jsonl>
```
````
Add language identifiers to fenced code blocks.
Several fenced blocks are missing a language tag, which triggers markdownlint MD040 and can fail or clutter docs CI.
🛠️ Suggested fix
````diff
-```
+```bash
 /metrics:ai-docs-telemetry -scan [-project <name>]
 /metrics:ai-docs-telemetry -session <path-to-session.jsonl>
````
Apply the same change (add `bash` after the opening fence) to the blocks containing:
- `/metrics:ai-docs-telemetry -scan`
- `/metrics:ai-docs-telemetry -scan -project enhancements`
- `/metrics:ai-docs-telemetry -scan -project machine-config-operator`
- `/metrics:ai-docs-telemetry -session ~/.claude/projects/<project>/<session-id>.jsonl`
Also applies to: 45-47, 64-66, 69-71, 74-76
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 10-10: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
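To list every block MD040 would flag before running the linter, a rough approximation of the rule is enough: an opening fence (three or more backticks or tildes) with an empty info string. This sketch follows CommonMark's fence-closing rule but is not markdownlint's implementation; `fences_missing_language` is a hypothetical helper.

```python
import re
from typing import List

# A fence line: up to 3 spaces of indent, 3+ backticks or tildes, then the info string.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def fences_missing_language(markdown: str) -> List[int]:
    """Return 1-based line numbers of opening fences that have no info string."""
    missing: List[int] = []
    open_fence = None  # fence string that opened the current block, if any
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        m = FENCE_RE.match(line)
        if not m:
            continue
        fence, info = m.group(1), m.group(2).strip()
        if open_fence is None:
            open_fence = fence
            if not info:
                missing.append(lineno)
        elif fence[0] == open_fence[0] and len(fence) >= len(open_fence) and not info:
            # Closing fence: same character, at least as long, no info string.
            open_fence = None
    return missing
```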
```python
ai_docs_files.append(FileAccess(
    path=file_path,
    sequence=len(ai_docs_files) + 1,
    time=timestamp
))
```
Raw file_path in telemetry can leak local identifiers.
Line 145 stores the full tool input path. Absolute paths can expose usernames or sensitive local structure, conflicting with anonymous telemetry goals.
🔒 Suggested fix
```diff
 ai_docs_files.append(FileAccess(
-    path=file_path,
+    path=redact_documentation_path(file_path),
     sequence=len(ai_docs_files) + 1,
     time=timestamp
 ))
```
Add a small sanitizer helper (outside this range), for example:
```python
import pathlib


def redact_documentation_path(file_path: str) -> str:
    # Normalize Windows separators so the ai-docs/ match works on any platform.
    normalized = file_path.replace("\\", "/")
    if "ai-docs/" in normalized:
        # Keep only the part of the path starting at ai-docs/.
        return "ai-docs/" + normalized.split("ai-docs/", 1)[1]
    return pathlib.PurePath(normalized).name  # AGENTS.md / CLAUDE.md fallback
```
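One caveat with the substring check: a directory such as `old-ai-docs/` also contains `ai-docs/`, so its files would be reported under `ai-docs/`. A stricter variant (hypothetical name `redact_doc_path_strict`) matches `ai-docs` only as a whole path component and otherwise falls back to the file name, as the suggested helper does.

```python
from pathlib import PurePosixPath


def redact_doc_path_strict(file_path: str) -> str:
    """Drop everything before the first `ai-docs` path component."""
    parts = PurePosixPath(file_path.replace("\\", "/")).parts
    if "ai-docs" in parts:
        idx = parts.index("ai-docs")
        return "/".join(parts[idx:])
    # Not under ai-docs: keep only the file name (e.g. AGENTS.md, CLAUDE.md).
    return parts[-1] if parts else ""
```

Normalizing backslashes first and then using `PurePosixPath` keeps the behavior identical on Linux, macOS, and Windows.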
```python
try:
    content = session_file.read_text()
    if not ("ai-docs/" in content or "AGENTS.md" in content):
        continue
except Exception:
    continue
```
Pre-filter drops valid CLAUDE.md sessions (and silently hides read errors).
The pre-filter on line 206 omits CLAUDE.md, so sessions that touch only that entry point are skipped before parsing. Read failures are also swallowed, which masks undercounting.
✅ Suggested fix
```diff
 try:
     content = session_file.read_text()
-    if not ("ai-docs/" in content or "AGENTS.md" in content):
+    if not ("ai-docs/" in content or "AGENTS.md" in content or "CLAUDE.md" in content):
         continue
-except Exception:
+except (OSError, UnicodeError) as e:
+    print(f"Skipping unreadable session {session_file}: {e}", file=sys.stderr)
     continue
```
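The agent prompt asks for the existing logger rather than `print`. Below is a sketch of the same pre-filter pulled into testable helpers, assuming the script logs through the standard `logging` module; `DOC_MARKERS`, `contains_doc_marker`, and `should_parse_session` are hypothetical names. It also reads with an explicit UTF-8 encoding instead of the platform default, since RFC 8259 requires JSON exchanged between systems to be UTF-8.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)  # assumption: the script logs via `logging`

# Substrings that mark a session as worth parsing (same set as the suggested fix).
DOC_MARKERS = ("ai-docs/", "AGENTS.md", "CLAUDE.md")


def contains_doc_marker(text: str) -> bool:
    return any(marker in text for marker in DOC_MARKERS)


def should_parse_session(session_file: Path) -> bool:
    """Cheap pre-filter run before full JSONL parsing."""
    try:
        content = session_file.read_text(encoding="utf-8")
    except (OSError, UnicodeError) as exc:
        # Surface the failure instead of silently undercounting.
        logger.warning("Skipping unreadable session %s: %s", session_file, exc)
        return False
    return contains_doc_marker(content)
```

In the scan loop, `if not should_parse_session(session_file): continue` then replaces the whole try/except.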
🧰 Tools
🪛 Ruff (0.15.12)
[error] 208-209: try-except-continue detected, consider logging the exception
(S112)
[warning] 208-208: Do not catch blind exception: Exception
(BLE001)
```python
if args.scan:
    events = scan_recent_sessions(args.project)
    if events:
        # Output as JSON array
        print(json.dumps([asdict(e) for e in events], indent=2))
elif args.session:
```
Always emit JSON for -scan (including empty results).
Current behavior prints nothing when no events are found. That breaks JSON-contract expectations and makes downstream jq pipelines brittle.
🧩 Suggested fix
```diff
 if args.scan:
     events = scan_recent_sessions(args.project)
-    if events:
-        # Output as JSON array
-        print(json.dumps([asdict(e) for e in events], indent=2))
+    # Always output JSON array (possibly empty)
+    print(json.dumps([asdict(e) for e in events], indent=2))
```
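Factoring the output into a small function makes the empty case easy to test. In this sketch `Event` is a hypothetical stand-in for the script's event dataclass and `render_events` is an illustrative name. `json.dumps` of an empty list yields `[]` even with `indent=2`, whereas a consumer that calls `json.loads` on empty output gets a `JSONDecodeError`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Event:
    """Hypothetical stand-in for the script's telemetry event dataclass."""
    session_id: str
    files: List[str] = field(default_factory=list)


def render_events(events: List[Event]) -> str:
    # Always a JSON array, even when no events were found.
    return json.dumps([asdict(e) for e in events], indent=2)
```

The `-scan` branch then reduces to `print(render_events(scan_recent_sessions(args.project)))`.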
Added session_scraper.py following PR openshift-eng#450 pattern to extract file access patterns from Claude Code JSONL session logs.

Features:
- Scrapes `~/.claude/projects/**/*.jsonl` files
- Extracts file access patterns, navigation sequences, timing data
- Identifies entry points (AGENTS.md vs direct search)
- Aggregates metrics across multiple sessions
- Exports structured JSON for analysis

Implementation:
- lib/metrics/session_scraper.py (417 lines)
  - SessionScraper class with session file parsing
  - FileAccess, NavigationSequence, SessionTelemetry dataclasses
  - Aggregate metrics calculation
  - JSON export functionality

Testing:
- tests/test_session_scraper.py (6 tests, all passing)
  - test_is_agentic_doc_path
  - test_extract_file_access
  - test_scrape_session_file
  - test_navigation_sequences
  - test_aggregate_metrics
  - test_export_to_json

Documentation:
- Updated README.md with session scraping usage examples
- Updated TEST_REPORT.md to mark enhancement as complete

This completes the optional enhancement from REFACTOR_MAY_8.md Task 5.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
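The commit lists navigation sequences and cross-session aggregation among the scraper's features. As a hypothetical illustration only (not the actual `session_scraper.py` code), one way to aggregate navigation is to count consecutive (from, to) pairs of documentation reads within each session and then sum those counts across sessions.

```python
from collections import Counter
from typing import List, Sequence


def navigation_transitions(paths: Sequence[str]) -> Counter:
    """Count consecutive (from, to) pairs in one session's ordered doc reads."""
    return Counter(zip(paths, paths[1:]))


def aggregate_transitions(sessions: List[Sequence[str]]) -> Counter:
    """Sum transition counts over many sessions."""
    total: Counter = Counter()
    for paths in sessions:
        total.update(navigation_transitions(paths))
    return total
```

Pairs that start at `AGENTS.md` show how often the entry point actually leads into ai-docs, which is the question the entry-point metric is meant to answer.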
PR needs rebase.
Follow up for: #437