@@ -64,7 +64,7 @@ Run both commands sequentially in the **same foreground terminal**. Use a **120-
### Step 1a: Pull feedback

```powershell
New-Item -ItemType Directory -Path output -Force | Out-Null; if (Test-Path output/feedback_output.json) { Remove-Item output/feedback_output.json }; python cli.py report feedback -s <start_date> -e <end_date> -l <language> --exclude good | Out-File -Encoding UTF8 output/feedback_output.json
New-Item -ItemType Directory -Path output -Force | Out-Null; if (Test-Path output/feedback_output.json) { Remove-Item output/feedback_output.json }; python cli.py report feedback -s <start_date> -e <end_date> -l <language> --exclude good --include-implicit | Out-File -Encoding UTF8 output/feedback_output.json
```

### Step 1b: Pull memories
@@ -106,8 +106,16 @@ Based on the analysis, propose specific new lines to add to `metadata/{lang}/fil
1. Follow the existing format: ` N. DO NOT <description>`
2. Be numbered sequentially after the last existing rule
3. **Not duplicate an existing rule** — Before proposing a rule, compare it against every existing rule in the current `filter.yaml`. If an existing rule already covers the same behavior (even with different wording), do NOT propose it again. Explain in the analysis that the theme was already covered and cite the existing rule number.
4. Be supported by at least 2 feedback items or 1 memory with `is_exception: true`
5. Be phrased as a clear, actionable instruction the LLM can follow
4. Be phrased as a clear, actionable instruction the LLM can follow

### Signal Strength

When presenting recommendations, clearly label each with its signal strength:

- **Strong signal**: 2+ explicit feedback items (downvotes with reasons) or 1 memory with `is_exception: true`
- **Low signal**: Only 1 explicit feedback item, or only implicit bad comments (no explicit downvote/reason)

Do NOT automatically exclude low-signal items. Present ALL actionable patterns to the user with their signal strength clearly marked, and let the user (or reviewer) decide whether to include them in the PR.
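As a rough illustration, the labeling rule above can be sketched in Python. Field names (`FeedbackTypes`, `is_exception`) follow the report and memory outputs described in this skill; the helper itself is hypothetical and not part of the CLI:

```python
def signal_strength(feedback_items: list[dict], memories: list[dict]) -> str:
    """Label a proposed rule 'strong' or 'low' per the criteria above."""
    # Explicit feedback = anything that is not purely an implicit_bad item
    explicit = [
        f for f in feedback_items
        if f.get("FeedbackTypes") != ["implicit_bad"]
    ]
    exception_memories = [m for m in memories if m.get("is_exception")]
    if len(explicit) >= 2 or exception_memories:
        return "strong"
    return "low"
```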

Present the recommendations in a numbered list, each with:
- The proposed rule text
@@ -23,8 +23,28 @@ Unless the user says otherwise, always apply these defaults:
- **Environment**: `production`
- **Language**: All languages (do not pass `--language` unless user specifies one)
- **Exclude**: Do not pass `--exclude` unless user asks to filter out certain feedback types
- **Include implicit**: Always pass `--include-implicit` by default. Only omit it if the user explicitly asks to exclude implicit bad comments.
- **Format**: JSON (do not pass `--format`)

## Implicit Bad Comments

The `--include-implicit` flag also returns **implicit bad** comments: AI comments on approved revisions that were never upvoted, downvoted, or resolved and that have no Feedback entries. The inference is that the reviewer ignored them and approved anyway, suggesting they were unhelpful.

> **Date semantics differ**: Explicit feedback is filtered by feedback submission time (`Feedback[].SubmittedOn` / `ChangeHistory[].ChangedOn`), but implicit bad is filtered by comment creation time (`CreatedOn`). A comment created in January with no interaction will appear in January's implicit bad results, not March's.

This skill always passes `--include-implicit` (the CLI flag defaults to off, but the skill includes it for completeness). Implicit bad is a weaker signal than explicit feedback because there is no reason or confirmation, only silence. Only omit `--include-implicit` if the user explicitly asks to exclude these comments (e.g., "only explicit feedback", "exclude implicit bad").

The output will contain items with `"FeedbackTypes": ["implicit_bad"]`.
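To make the date semantics concrete, here is a hypothetical predicate showing which timestamp each item type is filtered on. Field names match the report output; this is an illustrative sketch, not the CLI's implementation:

```python
from datetime import datetime

def in_report_range(item: dict, start: datetime, end: datetime) -> bool:
    """Return True if the item falls in the report's date range."""
    if item.get("FeedbackTypes") == ["implicit_bad"]:
        # Implicit bad: filtered by comment creation time
        ts = item["CreatedOn"]
    else:
        # Explicit feedback: filtered by feedback submission time
        # (simplified here to the first Feedback entry)
        ts = item["Feedback"][0]["SubmittedOn"]
    return start <= datetime.fromisoformat(ts) <= end
```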

### Summarizing Implicit Bad

When presenting results, **break out implicit bad themes separately** from explicit feedback:

1. **Explicit feedback** — Summarize count, breakdown by reason, and themes for items that have explicit feedback types (e.g., `bad`, `delete`).
2. **Implicit bad** — Summarize separately: count, common comment topics/patterns, and any notable themes. Note that these lack a reason — group them by the comment content or guideline referenced instead.

This separation helps the user understand the strength of signal behind each theme.
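A minimal sketch of that split, assuming a list of items parsed from the report JSON (the helper name is ours, not part of the skill):

```python
from collections import Counter

def summarize_feedback(items: list[dict]) -> dict:
    """Break report items into explicit vs. implicit-bad buckets."""
    implicit = [i for i in items if i.get("FeedbackTypes") == ["implicit_bad"]]
    explicit = [i for i in items if i.get("FeedbackTypes") != ["implicit_bad"]]
    return {
        "explicit_count": len(explicit),
        "explicit_by_type": Counter(
            t for i in explicit for t in i.get("FeedbackTypes", [])
        ),
        "implicit_count": len(implicit),
    }
```

In practice the items would come from reading `output/feedback_output.json` after the report command completes.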

## Date Resolution

The user will typically specify a calendar month by name (e.g. "March", "January 2025"). Resolve to the full month date range:
@@ -45,7 +65,7 @@ Show the resolved command and run it immediately in a **foreground terminal** wi

**Full terminal command** (cleanup + run):
```powershell
New-Item -ItemType Directory -Path output -Force | Out-Null; if (Test-Path output/feedback_output.json) { Remove-Item output/feedback_output.json }; python cli.py report feedback -s <start_date> -e <end_date> | Out-File -Encoding UTF8 output/feedback_output.json
New-Item -ItemType Directory -Path output -Force | Out-Null; if (Test-Path output/feedback_output.json) { Remove-Item output/feedback_output.json }; python cli.py report feedback -s <start_date> -e <end_date> --include-implicit | Out-File -Encoding UTF8 output/feedback_output.json
```

After the command completes, **read the output file** with `read_file` to get the JSON results. Summarize the findings for the user (total count, breakdown by feedback reason, common themes, etc.).
@@ -57,20 +77,23 @@ For follow-up questions about the same data (filtering, counting, searching), **
### Examples

```powershell
# All feedback for March 2025
python cli.py report feedback -s 2025-03-01 -e 2025-03-31
# All feedback for March 2025 (implicit bad included by default)
python cli.py report feedback -s 2025-03-01 -e 2025-03-31 --include-implicit

# Python feedback only
python cli.py report feedback -s 2025-03-01 -e 2025-03-31 -l python
python cli.py report feedback -s 2025-03-01 -e 2025-03-31 -l python --include-implicit

# Exclude implicit bad (only explicit feedback)
python cli.py report feedback -s 2025-03-01 -e 2025-03-31

# Exclude good feedback (show only bad and deleted)
python cli.py report feedback -s 2025-03-01 -e 2025-03-31 --exclude good
python cli.py report feedback -s 2025-03-01 -e 2025-03-31 --include-implicit --exclude good

# YAML output
python cli.py report feedback -s 2025-03-01 -e 2025-03-31 --format yaml
python cli.py report feedback -s 2025-03-01 -e 2025-03-31 --include-implicit --format yaml

# Staging environment
python cli.py report feedback -s 2025-03-01 -e 2025-03-31 --environment staging
python cli.py report feedback -s 2025-03-01 -e 2025-03-31 --include-implicit --environment staging
```

## Available Flags
@@ -81,13 +104,14 @@ python cli.py report feedback -s 2025-03-01 -e 2025-03-31 --environment staging
| `--end-date` / `-e` | string | required | End date (`YYYY-MM-DD`) |
| `--language` / `-l` | string | all | Language to filter by (e.g., `python`, `Go`, `C#`) |
| `--environment` | string | `production` | `production` or `staging` |
| `--exclude` | list | none | Feedback types to exclude: `good`, `bad`, `delete` |
| `--exclude` | list | none | Feedback types to exclude: `good`, `bad`, `delete`, `implicit_bad` |
| `--include-implicit` | flag | off | Include implicit bad comments (unresolved, unvoted on approved revisions) |
| `--format` / `-f` | string | `json` | Output format: `json` or `yaml` |

## Gotchas

- **Output can be large**: Redirect to file and use `read_file` rather than relying on terminal output.
- **Date range filters by feedback submission time**: Not by when the comment was created. A comment created in January but downvoted in March will appear in March's feedback report.
- **Date range semantics are mixed**: Explicit feedback filters by feedback submission time (a comment created in January but downvoted in March appears in March). Implicit bad filters by comment creation time (a comment created in January with no interaction appears in January).
- **Use `python cli.py` not `.\avc`**: The `avc.bat` script may resolve to system Python.
- **Do NOT use `2>&1`**: Merges stderr into stdout, corrupting JSON. Only redirect stdout.
- **Do NOT use `>`**: Produces UTF-16 in PowerShell 5.1. Use `| Out-File -Encoding UTF8`.
17 changes: 15 additions & 2 deletions packages/python-packages/apiview-copilot/cli.py
@@ -2543,17 +2543,23 @@ def get_feedback(
exclude: Optional[list[str]] = None,
environment: str = "production",
output_format: str = "json",
include_implicit: bool = False,
):
"""
Retrieve AI comment feedback from APIView between start_date and end_date.
If --language is omitted, returns feedback for all languages.
Use --include-implicit to also return implicit bad comments: AI comments created
in the date range that are on approved revisions with no votes, no Feedback entries,
no resolution, and not deleted. Note that the date range filters by comment creation
time for implicit bad (vs. feedback submission time for explicit feedback).
"""
results = _get_ai_comment_feedback(
language=language,
start_date=start_date,
end_date=end_date,
exclude=exclude,
environment=environment,
include_implicit=include_implicit,
)
if output_format == "yaml":
print(yaml.dump(results, default_flow_style=False, allow_unicode=True, sort_keys=False))
@@ -3248,9 +3254,9 @@ def load_arguments(self, command):
"exclude",
type=str,
nargs="*",
help="Feedback types to exclude. Can be 'good', 'bad', or 'delete'.",
help="Feedback types to exclude. Can be 'good', 'bad', 'delete', or 'implicit_bad'.",
options_list=["--exclude"],
choices=["good", "bad", "delete"],
choices=["good", "bad", "delete", "implicit_bad"],
)
ac.argument(
"output_format",
@@ -3260,6 +3266,13 @@ def load_arguments(self, command):
default="json",
choices=["json", "yaml"],
)
ac.argument(
"include_implicit",
action="store_true",
help="Include implicit bad comments (AI comments created in date range on approved revisions with no votes, no feedback, and no resolution).",
options_list=["--include-implicit"],
default=False,
)
with ArgumentsContext(self, "report memory") as ac:
ac.argument(
"language",
Expand Down
2 changes: 1 addition & 1 deletion packages/python-packages/apiview-copilot/docs/metrics.md
@@ -39,7 +39,7 @@ Every AI-generated comment is assigned to exactly one mutually exclusive quality
| `downvoted` (`bad`) | Has ≥1 downvote (trumps upvotes) | Reviewer explicitly disagreed |
| `upvoted` (`good`) | Has ≥1 upvote and no downvotes | Reviewer explicitly agreed |
| `implicit_good` | `IsResolved = true`, no votes | Comment resolved without explicit feedback — likely acted on |
| `implicit_bad` | In an **approved** revision, not resolved, no votes | Comment was ignored after approval — likely not useful |
| `implicit_bad` | In an **approved** revision, not resolved, no votes, no feedback | Comment was ignored after approval — likely not useful |
| `neutral` | In an **unapproved** revision, not resolved, no votes | No signal yet (review still in progress) |

The sum of all six buckets equals `total_ai_comment_count`.
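The bucket rules can be sketched as a precedence cascade. Field names mirror the Cosmos comment documents; the `deleted` bucket and the exact precedence order are assumptions for illustration, not the shipped implementation:

```python
def quality_bucket(comment: dict, revision_approved: bool) -> str:
    """Assign a comment to one mutually exclusive quality bucket."""
    if comment.get("IsDeleted"):
        return "deleted"            # assumed sixth bucket
    if comment.get("Downvotes"):
        return "downvoted"          # downvotes trump upvotes
    if comment.get("Upvotes"):
        return "upvoted"
    if comment.get("IsResolved"):
        return "implicit_good"
    if revision_approved and not comment.get("Feedback"):
        return "implicit_bad"       # ignored on an approved revision
    return "neutral"
```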
145 changes: 144 additions & 1 deletion packages/python-packages/apiview-copilot/src/_apiview.py
@@ -609,6 +609,7 @@ def get_ai_comment_feedback(
end_date: str,
exclude: Optional[list[str]] = None,
environment: str = "production",
include_implicit: bool = False,
) -> list[dict]:
"""
Retrieves AI-generated comments that received feedback within the specified date range.
@@ -617,6 +618,11 @@ def get_ai_comment_feedback(
- For detailed feedback: checks Feedback[].SubmittedOn
- For deletions: checks ChangeHistory[].ChangedOn where ChangeAction='Deleted'

When include_implicit is True, also returns "implicit bad" comments: AI comments created
in the date range that are on approved revisions but have no votes, no resolution, no
Feedback entries, and aren't deleted. These are inferred as unhelpful because the reviewer
approved without interacting with the comment.

Note: Upvotes/Downvotes lists don't have timestamps, so comments with only
upvotes/downvotes (and no Feedback entries or deletion events in the date range)
will not be returned.
@@ -625,8 +631,10 @@
language: Language to filter by (e.g., 'python', 'java'). If None, returns all languages.
start_date: Start date in YYYY-MM-DD format (filters by feedback submission time)
end_date: End date in YYYY-MM-DD format (filters by feedback submission time)
exclude: List of feedback types to exclude. Can include 'good', 'bad', 'delete'.
exclude: List of feedback types to exclude. Can include 'good', 'bad', 'delete', 'implicit_bad'.
environment: The APIView environment ('production' or 'staging')
include_implicit: If True, also include implicit bad comments (unresolved, unvoted AI comments
on approved revisions created in the date range).

Returns:
List of dicts containing comment info and feedback, preserving database field names
@@ -735,6 +743,141 @@ def get_ai_comment_feedback(
comment["FeedbackTypes"] = feedback_types
result.append(comment)

# If include_implicit is requested, also fetch implicit bad comments
if include_implicit and "implicit_bad" not in (exclude or []):
implicit_comments = _get_implicit_bad_comments(
start_date=start_date,
end_date=end_date,
language=language,
environment=environment,
review_lang_map=review_lang_map,
)
result.extend(implicit_comments)

return result


def _get_implicit_bad_comments(
start_date: str,
end_date: str,
language: Optional[str],
environment: str,
review_lang_map: dict,
) -> list[dict]:
"""
Retrieves implicit bad comments: AI-generated comments created in the date range
that are on approved revisions but have no votes, no resolution, no Feedback entries,
and aren't deleted.

These are inferred as unhelpful because the reviewer approved the revision without
interacting with the comment.
"""
start_iso = to_iso8601(start_date)
end_iso = to_iso8601(end_date, end_of_day=True)

# Query for AI comments created in the date range that have no interaction
comments_client = get_apiview_cosmos_client(container_name="Comments", environment=environment)
query = """
SELECT c.id, c.ReviewId, c.APIRevisionId, c.ElementId, c.ThreadId,
c.CommentText, c.CorrelationId, c.ChangeHistory, c.IsResolved,
c.Upvotes, c.Downvotes, c.TaggedUsers, c.CommentType, c.Severity,
c.CommentSource, c.ResolutionLocked, c.CreatedBy, c.CreatedOn,
c.IsDeleted, c.IsGeneric, c.GuidelineIds, c.MemoryIds,
c.ConfidenceScore, c.Feedback
FROM c
WHERE c.CommentSource = 'AIGenerated'
AND c.CreatedOn >= @start_date AND c.CreatedOn <= @end_date
AND (NOT IS_DEFINED(c.IsDeleted) OR c.IsDeleted = false)
AND (NOT IS_DEFINED(c.IsResolved) OR c.IsResolved = false)
AND (NOT IS_DEFINED(c.Upvotes) OR ARRAY_LENGTH(c.Upvotes) = 0)
AND (NOT IS_DEFINED(c.Downvotes) OR ARRAY_LENGTH(c.Downvotes) = 0)
AND (NOT IS_DEFINED(c.Feedback) OR ARRAY_LENGTH(c.Feedback) = 0)
"""

comments = list(
comments_client.query_items(
query=query,
parameters=[
{"name": "@start_date", "value": start_iso},
{"name": "@end_date", "value": end_iso},
],
enable_cross_partition_query=True,
)
)

if not comments:
return []

# Collect revision IDs to check approval status
revision_ids = set(c.get("APIRevisionId") for c in comments if c.get("APIRevisionId"))
review_ids = set(c.get("ReviewId") for c in comments if c.get("ReviewId"))

# Fetch revision metadata to determine approval status
revisions_container = get_apiview_cosmos_client(container_name="APIRevisions", environment=environment)
approved_revision_ids = set()

if revision_ids:
rev_params = []
rev_clauses = []
for i, rev_id in enumerate(revision_ids):
param_name = f"@rev_{i}"
rev_clauses.append(f"c.id = {param_name}")
rev_params.append({"name": param_name, "value": rev_id})

rev_query = f"SELECT c.id, c.ChangeHistory FROM c WHERE ({' OR '.join(rev_clauses)})"
rev_results = list(
revisions_container.query_items(
query=rev_query, parameters=rev_params, enable_cross_partition_query=True
)
)

for rev in rev_results:
change_history = rev.get("ChangeHistory", [])
if change_history and isinstance(change_history, list):
for change in change_history:
if change.get("ChangeAction") == "Approved":
approved_revision_ids.add(rev["id"])
break

if not approved_revision_ids:
return []

# Fetch language info for any new review IDs not already in review_lang_map
new_review_ids = review_ids - set(review_lang_map.keys())
if new_review_ids:
reviews_container = get_apiview_cosmos_client(container_name="Reviews", environment=environment)
params = []
clauses = []
for i, rid in enumerate(new_review_ids):
param_name = f"@id_{i}"
clauses.append(f"c.id = {param_name}")
params.append({"name": param_name, "value": rid})

review_query = f"SELECT c.id, c.Language FROM c WHERE ({' OR '.join(clauses)})"
review_results = list(
reviews_container.query_items(query=review_query, parameters=params, enable_cross_partition_query=True)
)
for r in review_results:
review_lang_map[r["id"]] = get_language_pretty_name(r.get("Language", ""))

target_language = get_language_pretty_name(language).lower() if language else None

# Filter to comments on approved revisions, matching language
result = []
for comment in comments:
rev_id = comment.get("APIRevisionId")
if rev_id not in approved_revision_ids:
continue

review_id = comment.get("ReviewId", "")
comment_language = review_lang_map.get(review_id, "").lower()
if target_language and comment_language != target_language:
continue

comment["Language"] = review_lang_map.get(review_id, "")
comment["FeedbackTypes"] = ["implicit_bad"]
result.append(comment)

return result

