Add ai-docs telemetry analysis to metrics plugin #450

Open
Prashanth684 wants to merge 1 commit into openshift-eng:main from Prashanth684:agentic-docs-metrics

Conversation

Contributor

@Prashanth684 Prashanth684 commented May 6, 2026

Follow up for: #437

Add /metrics:ai-docs-telemetry command to analyze Claude Code session logs for agentic documentation usage patterns. Tracks which ai-docs files are accessed, entry points (AGENTS.md vs direct search), and navigation patterns.

Usage:

Scan all recent sessions (last 7 days)

/metrics:ai-docs-telemetry -scan

Scan specific project

/metrics:ai-docs-telemetry -scan -project enhancements

Analyze specific session

/metrics:ai-docs-telemetry -session ~/.claude/projects/<project>/<session-id>.jsonl

Pipe to jq for analysis

/metrics:ai-docs-telemetry -scan | jq -r '.[] | .documentation.entry_point' | sort | uniq -c
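
For readers who would rather stay in Python, the jq pipeline above can be approximated as follows. This is a minimal sketch, assuming only what the jq filter implies: the command prints a JSON array whose elements carry a `documentation.entry_point` field. The function name `count_entry_points`, the `"unknown"` fallback, and the sample values are hypothetical.

```python
import json
from collections import Counter


def count_entry_points(raw_json: str) -> Counter:
    """Count events per documentation.entry_point, similar to `sort | uniq -c`."""
    events = json.loads(raw_json)
    return Counter(
        # Events lacking the field are grouped under "unknown" (an assumption).
        event.get("documentation", {}).get("entry_point", "unknown")
        for event in events
    )


if __name__ == "__main__":
    # Hypothetical sample; real entry_point values may differ.
    sample = json.dumps([
        {"documentation": {"entry_point": "AGENTS.md"}},
        {"documentation": {"entry_point": "direct"}},
        {"documentation": {"entry_point": "AGENTS.md"}},
    ])
    for entry_point, count in sorted(count_entry_points(sample).items()):
        print(f"{count:7d} {entry_point}")
```

Feeding it the saved output of `-scan` (for example, text read from a file) should give the same tallies as the shell pipeline, provided the output shape matches the assumption above.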

Summary by CodeRabbit

  • New Features
    • Added AI documentation usage telemetry command to track when users access AI-related documentation files
    • Enables scanning of recent sessions with optional project filtering
    • Supports analyzing individual session files and exporting results in JSON format for further analysis

Add /metrics:ai-docs-telemetry command to analyze Claude Code session
logs for agentic documentation usage patterns. Tracks which ai-docs
files are accessed, entry points (AGENTS.md vs direct search), and
navigation patterns.

Usage:
  # Scan all recent sessions (last 7 days)
  /metrics:ai-docs-telemetry -scan

  # Scan specific project
  /metrics:ai-docs-telemetry -scan -project enhancements

  # Analyze specific session
  /metrics:ai-docs-telemetry -session ~/.claude/projects/<project>/<session-id>.jsonl

  # Pipe to jq for analysis
  /metrics:ai-docs-telemetry -scan | jq -r '.[] | .documentation.entry_point' | sort | uniq -c
@openshift-ci openshift-ci Bot requested review from bryan-cox and enxebre May 6, 2026 23:27

openshift-ci Bot commented May 6, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Prashanth684

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 6, 2026
Contributor

coderabbitai Bot commented May 6, 2026

Walkthrough

This pull request adds AI Docs telemetry capabilities to the metrics plugin. A new command scans Claude Code session JSONL logs to detect Read tool calls to ai-docs paths, emitting structured telemetry events as JSON output. Documentation and Python implementation support session scanning with project filtering or single-session analysis.
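
The detection step described in the walkthrough might look roughly like the sketch below. It is not the PR's implementation. The log schema is an assumption: each JSONL line is taken to be an object with a top-level `timestamp` and a `message.content` list whose `tool_use` items carry `name` and `input.file_path`. The helper name `iter_doc_reads` and the `DOC_MARKERS` tuple are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple

# Path markers mentioned in this PR's review discussion.
DOC_MARKERS = ("ai-docs/", "AGENTS.md", "CLAUDE.md")


def iter_doc_reads(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (timestamp, file_path) for Read tool calls that touch doc paths."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        if not isinstance(entry, dict):
            continue
        message = entry.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # plain-text messages carry no tool calls
        for item in content:
            if not isinstance(item, dict):
                continue
            if item.get("type") != "tool_use" or item.get("name") != "Read":
                continue
            tool_input = item.get("input")
            file_path = tool_input.get("file_path") if isinstance(tool_input, dict) else None
            if isinstance(file_path, str) and any(m in file_path for m in DOC_MARKERS):
                yield entry.get("timestamp", ""), file_path
```

Passing an open session file to `iter_doc_reads` yields the accesses in log order, which is the ordering a `sequence` counter would need.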

Changes

AI Docs Telemetry Feature

| Layer / File(s) | Summary |
|---|---|
| Data Models & Core Processing: plugins/metrics/scripts/ai_docs_telemetry.py (lines 1–171) | Defines FileAccess, PlatformInfo, RepositoryInfo, DocumentationInfo, and TelemetryEvent dataclasses. Implements extract_repo_info(), detect_entry_point(), and process_session() to parse JSONL logs, filter for Read tool calls to ai-docs paths, and emit structured telemetry. |
| Session Scanning & CLI: plugins/metrics/scripts/ai_docs_telemetry.py (lines 174–248) | Implements scan_recent_sessions() to traverse recent ~/.claude/projects/**/*.jsonl files (7-day window) with optional project substring filtering. Adds main() entry point supporting -scan, -project, and -session CLI modes. Outputs JSON to stdout and summary to stderr. |
| Command Documentation: plugins/metrics/commands/ai-docs-telemetry.md | Documents command metadata, synopsis, description, implementation details, return values, and usage examples including scanning recent sessions, filtering by project, analyzing single files, and piping to jq. |
| Plugin Integration: plugins/metrics/README.md | Updates overview and Commands section to introduce /metrics:ai-docs-telemetry with links to command documentation and quick-start examples. Expands Source Code section to list the new telemetry script and commands directory. |
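
The session-scanning row above describes a 7-day window with optional project substring filtering. A minimal sketch of that selection step follows. It assumes recency is judged by file modification time and that the project filter is matched against the directory path under the projects root; the PR may do either differently. The name `find_recent_sessions` and its parameters are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


def find_recent_sessions(
    root: Path,
    project: Optional[str] = None,
    max_age_seconds: int = SEVEN_DAYS_SECONDS,
    now: Optional[float] = None,
) -> List[Path]:
    """Return *.jsonl files under root modified within the window, sorted by path."""
    now = time.time() if now is None else now
    matches = []
    for session_file in root.glob("**/*.jsonl"):
        # Substring match against the directory path relative to root (an assumption).
        relative_dir = session_file.parent.relative_to(root).as_posix()
        if project and project not in relative_dir:
            continue
        try:
            mtime = session_file.stat().st_mtime
        except OSError:
            continue  # file vanished or is unreadable; skip it
        if now - mtime <= max_age_seconds:
            matches.append(session_file)
    return sorted(matches)
```

A call such as `find_recent_sessions(Path.home() / ".claude" / "projects", project="enhancements")` would correspond to `-scan -project enhancements` under these assumptions.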

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding ai-docs telemetry analysis capability to the metrics plugin, which aligns with all three file changes (README updates, command documentation, and new Python script). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| No Real People Names In Style References | ✅ Passed | No references to real people by name found in plugin commands, documentation, or example prompts. All content is technical and functional, with no style references using real person names. |
| No Assumed Git Remote Names | ✅ Passed | No git remote operations or hardcoded remote names found. PR adds ai-docs telemetry tracking, not involving git operations. |
| Git Push Safety Rules | ✅ Passed | The PR adds telemetry analysis for ai-docs usage. No git push operations, force pushes, or autonomous push workflows are present in any of the new files. |
| No Untrusted Mcp Servers | ✅ Passed | No MCP server installations detected in PR. Adds only documentation and Python script with standard libraries. |
| Ai-Helpers Overlap Detection | ✅ Passed | No overlapping functionality detected. PR adds unique ai-docs telemetry command to metrics plugin. No existing commands track documentation usage. Command name is unique. |



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/metrics/commands/ai-docs-telemetry.md`:
- Around line 10-13: Fenced code blocks in the ai-docs-telemetry markdown are
missing language identifiers and trigger markdownlint MD040; update each
triple-backtick block that contains CLI examples (e.g. blocks containing
"/metrics:ai-docs-telemetry -scan [-project <name>]",
"/metrics:ai-docs-telemetry -session <path-to-session.jsonl>",
"/metrics:ai-docs-telemetry -scan", "/metrics:ai-docs-telemetry -scan -project
enhancements", "/metrics:ai-docs-telemetry -scan -project
machine-config-operator", and the session path example like
"~/.claude/projects/<project>/<session-id>.jsonl") to include a language tag
(use bash) immediately after the opening ``` so each block starts with ```bash.

In `@plugins/metrics/scripts/ai_docs_telemetry.py`:
- Around line 144-148: The telemetry currently appends the raw file_path into
ai_docs_files (via FileAccess) which can leak local identifiers; add a sanitizer
function (e.g., redact_documentation_path) outside this block and call it before
creating FileAccess so that you store a redacted path instead of the raw
file_path; update the code that constructs FileAccess (the
ai_docs_files.append(...) call) to pass redact_documentation_path(file_path) for
the path field and keep sequence and time unchanged to preserve ordering and
timestamps.
- Around line 236-241: The -scan branch currently only prints JSON when events
is truthy; change it so it always emits a JSON array (possibly empty) from
scan_recent_sessions(args.project) — call scan_recent_sessions into events and
unconditionally print json.dumps([asdict(e) for e in events], indent=2) even if
events is empty, ensuring downstream jq pipelines always receive valid JSON;
update the block around args.scan, scan_recent_sessions, events and the asdict
conversion accordingly.
- Around line 204-209: The pre-filter in the loop around
session_file.read_text() wrongly only checks for "ai-docs/" or "AGENTS.md" and
thus skips valid sessions that reference "CLAUDE.md"; also it swallows read
exceptions silently. Update the predicate to include "CLAUDE.md" (e.g., check
for "ai-docs/" or "AGENTS.md" or "CLAUDE.md") so those sessions are not skipped,
and change the except Exception block in the same scope (around
session_file.read_text()) to log the exception (using the existing logger) with
context about the file instead of silently continuing so read errors are visible
for telemetry counting. Ensure you modify the checks and the error handling
where session_file.read_text() is called and the surrounding try/except.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c80a3b1f-782d-4b98-84df-34ba44fb0ab3

📥 Commits

Reviewing files that changed from the base of the PR and between d2de5a1 and db733b1.

📒 Files selected for processing (3)
  • plugins/metrics/README.md
  • plugins/metrics/commands/ai-docs-telemetry.md
  • plugins/metrics/scripts/ai_docs_telemetry.py

Comment on lines +10 to +13
```
/metrics:ai-docs-telemetry -scan [-project <name>]
/metrics:ai-docs-telemetry -session <path-to-session.jsonl>
```
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

Several fenced blocks are missing a language tag, which triggers markdownlint MD040 and can fail/pollute docs CI.

🛠️ Suggested fix
-```
+```bash
 /metrics:ai-docs-telemetry -scan [-project <name>]
 /metrics:ai-docs-telemetry -session <path-to-session.jsonl>
  • /metrics:ai-docs-telemetry -scan
  • /metrics:ai-docs-telemetry -scan -project enhancements
  • /metrics:ai-docs-telemetry -scan -project machine-config-operator
  • /metrics:ai-docs-telemetry -session ~/.claude/projects/<project>/<session-id>.jsonl
Also applies to: 45-47, 64-66, 69-71, 74-76

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 10-10: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Comment on lines +144 to +148
ai_docs_files.append(FileAccess(
    path=file_path,
    sequence=len(ai_docs_files) + 1,
    time=timestamp
))
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Raw file_path in telemetry can leak local identifiers.

Line 145 stores the full tool input path. Absolute paths can expose usernames or sensitive local structure, conflicting with anonymous telemetry goals.

🔒 Suggested fix
                     ai_docs_files.append(FileAccess(
-                        path=file_path,
+                        path=redact_documentation_path(file_path),
                         sequence=len(ai_docs_files) + 1,
                         time=timestamp
                     ))

Add a small sanitizer helper (outside this range), for example:

def redact_documentation_path(file_path: str) -> str:
    normalized = file_path.replace("\\", "/")
    if "ai-docs/" in normalized:
        return "ai-docs/" + normalized.split("ai-docs/", 1)[1]
    return pathlib.PurePath(normalized).name  # AGENTS.md / CLAUDE.md fallback
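
The helper suggested above refers to `pathlib` without showing the import. A self-contained variant that can be run as is might look like this. It is a sketch of the reviewer's suggestion, not code from the PR, and it swaps in `PurePosixPath` so the result does not depend on the host platform. The sample paths are made up.

```python
from pathlib import PurePosixPath


def redact_documentation_path(file_path: str) -> str:
    """Drop everything before ai-docs/; otherwise keep only the file name."""
    normalized = file_path.replace("\\", "/")  # treat Windows separators uniformly
    if "ai-docs/" in normalized:
        # Keep the repo-relative portion starting at the first ai-docs/ segment.
        return "ai-docs/" + normalized.split("ai-docs/", 1)[1]
    # AGENTS.md / CLAUDE.md fallback: the bare file name carries no local detail.
    return PurePosixPath(normalized).name


if __name__ == "__main__":
    for raw in (
        "/home/alice/src/repo/ai-docs/guides/testing.md",
        "C:\\Users\\bob\\repo\\ai-docs\\index.md",
        "/home/alice/src/repo/AGENTS.md",
    ):
        print(redact_documentation_path(raw))
```

One caveat of the substring test: a directory whose name merely ends in `ai-docs` (for example `my-ai-docs/`) also matches, so a stricter variant could compare whole path segments instead.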

Comment on lines +204 to +209
try:
    content = session_file.read_text()
    if not ("ai-docs/" in content or "AGENTS.md" in content):
        continue
except Exception:
    continue
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pre-filter drops valid CLAUDE.md sessions (and silently hides read errors).

Line 206 pre-filtering omits CLAUDE.md, so sessions that only touch that entry point are skipped before parsing. Also, read failures are swallowed, which masks undercounting.

✅ Suggested fix
         try:
             content = session_file.read_text()
-            if not ("ai-docs/" in content or "AGENTS.md" in content):
+            if not ("ai-docs/" in content or "AGENTS.md" in content or "CLAUDE.md" in content):
                 continue
-        except Exception:
+        except (OSError, UnicodeError) as e:
+            print(f"Skipping unreadable session {session_file}: {e}", file=sys.stderr)
             continue
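
Pulled out of the loop, the pre-filter in the suggested fix could be expressed as a small predicate, which also makes it easy to test in isolation. This is a sketch under assumptions: the function name `session_mentions_docs`, the `DOC_MARKERS` tuple, and the explicit UTF-8 decoding are not taken from the PR.

```python
import sys
from pathlib import Path

# Markers from the suggested fix: ai-docs paths plus both entry-point files.
DOC_MARKERS = ("ai-docs/", "AGENTS.md", "CLAUDE.md")


def session_mentions_docs(session_file: Path) -> bool:
    """Cheap pre-filter: True if the raw log text mentions any doc marker."""
    try:
        # UTF-8 is an assumption about how the session logs are encoded.
        content = session_file.read_text(encoding="utf-8")
    except (OSError, UnicodeError) as exc:
        # Report the failure on stderr so undercounting is visible.
        print(f"Skipping unreadable session {session_file}: {exc}", file=sys.stderr)
        return False
    return any(marker in content for marker in DOC_MARKERS)
```

Because the warning goes to stderr, the JSON written to stdout stays clean for downstream tools.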
🧰 Tools
🪛 Ruff (0.15.12)

[error] 208-209: try-except-continue detected, consider logging the exception

(S112)


[warning] 208-208: Do not catch blind exception: Exception

(BLE001)


Comment on lines +236 to +241
if args.scan:
    events = scan_recent_sessions(args.project)
    if events:
        # Output as JSON array
        print(json.dumps([asdict(e) for e in events], indent=2))
elif args.session:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Always emit JSON for -scan (including empty results).

Current behavior prints nothing when no events are found. That breaks JSON-contract expectations and makes downstream jq pipelines brittle.

🧩 Suggested fix
     if args.scan:
         events = scan_recent_sessions(args.project)
-        if events:
-            # Output as JSON array
-            print(json.dumps([asdict(e) for e in events], indent=2))
+        # Always output JSON array (possibly empty)
+        print(json.dumps([asdict(e) for e in events], indent=2))
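
To see why unconditional output matters, the sketch below serializes both an empty and a non-empty event list. It uses a stand-in `FileAccess` dataclass with the three fields visible in this review (`path`, `sequence`, `time`); the field types and the helper name `render_events` are assumptions, and the real script serializes its own TelemetryEvent objects.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class FileAccess:
    # Field names as seen in the reviewed code; the types are assumed.
    path: str
    sequence: int
    time: str


def render_events(events: List[FileAccess]) -> str:
    """Always return a JSON array, even when there is nothing to report."""
    return json.dumps([asdict(e) for e in events], indent=2)


if __name__ == "__main__":
    print(render_events([]))  # an empty JSON array, still parseable
    print(render_events([FileAccess("ai-docs/index.md", 1, "2026-05-06T23:27:00Z")]))
```

With the empty case printing a valid array, a consumer such as `jq '.[]'` produces no output instead of failing on empty input.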

kenjpais added a commit to kenjpais/ai-helpers that referenced this pull request May 8, 2026
Added session_scraper.py following PR openshift-eng#450 pattern to extract file
access patterns from Claude Code JSONL session logs.

Features:
- Scrapes ~/.claude/projects/**/*.jsonl files
- Extracts file access patterns, navigation sequences, timing data
- Identifies entry points (AGENTS.md vs direct search)
- Aggregates metrics across multiple sessions
- Exports structured JSON for analysis

Implementation:
- lib/metrics/session_scraper.py (417 lines)
  - SessionScraper class with session file parsing
  - FileAccess, NavigationSequence, SessionTelemetry dataclasses
  - Aggregate metrics calculation
  - JSON export functionality

Testing:
- tests/test_session_scraper.py (6 tests, all passing)
  - test_is_agentic_doc_path
  - test_extract_file_access
  - test_scrape_session_file
  - test_navigation_sequences
  - test_aggregate_metrics
  - test_export_to_json

Documentation:
- Updated README.md with session scraping usage examples
- Updated TEST_REPORT.md to mark enhancement as complete

This completes the optional enhancement from REFACTOR_MAY_8.md Task 5.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
kenjpais added a commit to kenjpais/ai-helpers that referenced this pull request May 13, 2026
kenjpais added a commit to kenjpais/ai-helpers that referenced this pull request May 14, 2026
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 18, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 18, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
