feat: add agentic guidelines translation #60
Merged: dkargatzis merged 13 commits into warestack:main from harris-ranque:feature/agentic-guidelines, Mar 10, 2026
Changes from 5 commits
Commits
255d258 feature: added repo scanning logic (harris-ranque)
f790c4e feature: added Agentic Parsing and Translation (harris-ranque)
ca0504d fix: updated code following feedbacks from coderrabbit (harris-ranque)
390467d done: AI Extractor Agent (harris-ranque)
e562e70 Merge branch 'main' into feature/agentic-guidelines (harris-ranque)
6331be1 fix: followed CoderRabbits feedback (harris-ranque)
67bbbce Merge branch 'feature/agentic-guidelines' of github.com:savagame/watc… (harris-ranque)
10a2080 fix: fixed some exceptions (harris-ranque)
3ee6e4d fix: re-run pre-commit (harris-ranque)
6b2eda6 fix: added more information for PR and removed duplication for PR cre… (harris-ranque)
843d556 fix: added ambigous rule count on PR comment (harris-ranque)
bd06137 fix: reverted the allow_anonymouse changes (harris-ranque)
df9e64d fix: pre-commit issues (harris-ranque)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| """ | ||
| Rule Extractor Agent: LLM-powered extraction of rule-like statements from markdown. | ||
| """ | ||
|
|
||
| from src.agents.extractor_agent.agent import RuleExtractorAgent | ||
|
|
||
| __all__ = ["RuleExtractorAgent"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,113 @@ | ||
| """ | ||
| Rule Extractor Agent: LLM-powered extraction of rule-like statements from markdown. | ||
| """ | ||
|
|
||
| import logging | ||
| import time | ||
| from typing import Any | ||
|
|
||
| from langgraph.graph import END, START, StateGraph | ||
| from pydantic import BaseModel, Field | ||
|
|
||
| from src.agents.base import AgentResult, BaseAgent | ||
| from src.agents.extractor_agent.models import ExtractorOutput | ||
| from src.agents.extractor_agent.prompts import EXTRACTOR_PROMPT | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class ExtractorState(BaseModel): | ||
| """State for the extractor (single-node) graph.""" | ||
|
|
||
| markdown_content: str = "" | ||
| statements: list[str] = Field(default_factory=list) | ||
|
|
||
|
|
||
| class RuleExtractorAgent(BaseAgent): | ||
| """ | ||
| Extractor Agent: reads raw markdown and returns a structured list of rule-like statements. | ||
| Single-node LangGraph: extract -> END. Uses LLM with structured output. | ||
| """ | ||
|
|
||
| def __init__(self, max_retries: int = 3, timeout: float = 30.0): | ||
| super().__init__(max_retries=max_retries, agent_name="extractor_agent") | ||
| self.timeout = timeout | ||
| logger.info("🔧 RuleExtractorAgent initialized with max_retries=%s, timeout=%ss", max_retries, timeout) | ||
|
|
||
| def _build_graph(self): | ||
| """Single node: run LLM extraction and set state.statements.""" | ||
| workflow = StateGraph(ExtractorState) | ||
|
|
||
| async def extract_node(state: ExtractorState) -> dict: | ||
| content = (state.markdown_content or "").strip() | ||
| if not content: | ||
| return {"statements": []} | ||
| prompt = EXTRACTOR_PROMPT.format(markdown_content=content) | ||
| structured_llm = self.llm.with_structured_output(ExtractorOutput) | ||
| result = await structured_llm.ainvoke(prompt) | ||
| return {"statements": result.statements} | ||
|
|
||
| workflow.add_node("extract", extract_node) | ||
| workflow.add_edge(START, "extract") | ||
| workflow.add_edge("extract", END) | ||
| return workflow.compile() | ||
|
|
||
| async def execute(self, **kwargs: Any) -> AgentResult: | ||
| """Extract rule statements from markdown. Expects markdown_content=... in kwargs.""" | ||
| markdown_content = kwargs.get("markdown_content") or kwargs.get("content") or "" | ||
| if not isinstance(markdown_content, str): | ||
| markdown_content = str(markdown_content or "") | ||
|
|
||
| start_time = time.time() | ||
|
|
||
| if not markdown_content.strip(): | ||
| return AgentResult( | ||
| success=True, | ||
| message="Empty content", | ||
| data={"statements": []}, | ||
| metadata={"execution_time_ms": 0}, | ||
| ) | ||
|
|
||
| try: | ||
| logger.info("🚀 Extractor agent processing markdown (%s chars)", len(markdown_content)) | ||
| initial_state = ExtractorState(markdown_content=markdown_content) | ||
| result = await self._execute_with_timeout( | ||
| self.graph.ainvoke(initial_state), | ||
| timeout=self.timeout, | ||
| ) | ||
| if isinstance(result, dict): | ||
| statements = result.get("statements", []) | ||
| elif hasattr(result, "statements"): | ||
| statements = result.statements | ||
| else: | ||
| statements = [] | ||
| execution_time = time.time() - start_time | ||
| logger.info( | ||
| "✅ Extractor agent completed in %.2fs; extracted %s statements", | ||
| execution_time, | ||
| len(statements), | ||
| ) | ||
| return AgentResult( | ||
| success=True, | ||
| message="OK", | ||
| data={"statements": statements}, | ||
| metadata={"execution_time_ms": execution_time * 1000}, | ||
| ) | ||
| except TimeoutError: | ||
| execution_time = time.time() - start_time | ||
| logger.error("❌ Extractor agent timed out after %.2fs", execution_time) | ||
| return AgentResult( | ||
| success=False, | ||
| message=f"Extractor timed out after {self.timeout}s", | ||
| data={"statements": []}, | ||
| metadata={"execution_time_ms": execution_time * 1000, "error_type": "timeout"}, | ||
| ) | ||
| except Exception as e: | ||
| execution_time = time.time() - start_time | ||
| logger.exception("❌ Extractor agent failed: %s", e) | ||
| return AgentResult( | ||
| success=False, | ||
| message=str(e), | ||
| data={"statements": []}, | ||
| metadata={"execution_time_ms": execution_time * 1000, "error_type": type(e).__name__}, | ||
| ) | ||
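
The `execute` method above wraps the graph call in a timeout and turns both timeouts and other failures into a result object instead of raising. The following is a minimal, stdlib-only sketch of that pattern, not the project's code. It assumes `_execute_with_timeout` behaves roughly like `asyncio.wait_for`; the base class is not part of this diff, so that is an assumption. `Result`, `run_with_timeout`, and `slow_extract` are hypothetical names used only for illustration.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Any, Awaitable


@dataclass
class Result:
    """Hypothetical stand-in for AgentResult (the real class is not shown in the diff)."""

    success: bool
    message: str
    data: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)


async def run_with_timeout(work: Awaitable[list[str]], timeout: float) -> Result:
    """Await `work`; map a timeout or any other error to a failed Result."""
    start = time.monotonic()

    def elapsed_ms() -> float:
        return (time.monotonic() - start) * 1000

    try:
        statements = await asyncio.wait_for(work, timeout=timeout)
        return Result(True, "OK", {"statements": statements},
                      {"execution_time_ms": elapsed_ms()})
    except asyncio.TimeoutError:
        # On Python 3.11+ this is the builtin TimeoutError; on older
        # versions it is a separate class, so it is named explicitly here.
        return Result(False, f"Timed out after {timeout}s", {"statements": []},
                      {"execution_time_ms": elapsed_ms(), "error_type": "timeout"})
    except Exception as exc:
        return Result(False, str(exc), {"statements": []},
                      {"execution_time_ms": elapsed_ms(),
                       "error_type": type(exc).__name__})


async def slow_extract(delay: float) -> list[str]:
    """Hypothetical extraction step that just sleeps, then returns one rule."""
    await asyncio.sleep(delay)
    return ["Use conventional commits."]


fast = asyncio.run(run_with_timeout(slow_extract(0.0), timeout=1.0))
slow = asyncio.run(run_with_timeout(slow_extract(0.5), timeout=0.05))
```

Here `fast` carries the extracted statement and `slow` is a failed result tagged with `error_type` `"timeout"`. Whether the `except TimeoutError` branch in the diff fires on Python versions before 3.11 depends on which exception `_execute_with_timeout` raises, which this diff does not show.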
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| """ | ||
| Data models for the Rule Extractor Agent. | ||
| """ | ||
|
|
||
| from pydantic import BaseModel, Field | ||
|
|
||
|
|
||
| class ExtractorOutput(BaseModel): | ||
| """Structured output: list of rule-like statements extracted from markdown.""" | ||
|
|
||
| statements: list[str] = Field( | ||
| description="List of distinct rule-like statements extracted from the document. Each item is a single, clear sentence or phrase describing one rule or guideline.", | ||
| default_factory=list, | ||
| ) | ||
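
The model above declares `statements` with `default_factory=list`, so a response with no rules yields a fresh empty list rather than a shared default. As a rough illustration of that behavior, here is a sketch that uses a stdlib dataclass as a stand-in for the pydantic model. `ExtractorOutputSketch` and `from_payload` are hypothetical and are not part of the PR; pydantic performs its own validation, which this sketch only approximates.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExtractorOutputSketch:
    """Hypothetical stdlib stand-in for the pydantic ExtractorOutput model."""

    # default_factory gives every instance its own list object.
    statements: list[str] = field(default_factory=list)

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> "ExtractorOutputSketch":
        """Build from a decoded JSON object, rejecting non-string items."""
        raw = payload.get("statements", [])
        if not isinstance(raw, list) or not all(isinstance(s, str) for s in raw):
            raise ValueError("statements must be a list of strings")
        return cls(statements=list(raw))


empty = ExtractorOutputSketch.from_payload({})
parsed = ExtractorOutputSketch.from_payload({"statements": ["PRs should be small."]})
```

A missing `statements` key produces an empty list, which matches the prompt's instruction to return an empty list when the document contains no rules.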
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| """ | ||
| Prompt template for the Rule Extractor Agent. | ||
| """ | ||
|
|
||
| EXTRACTOR_PROMPT = """ | ||
| You are an expert at reading AI assistant guidelines and coding standards (e.g. Cursor rules, Claude instructions, Copilot guidelines, .cursorrules, repo rules). | ||
|
|
||
| Your task: read the following markdown document and extract every distinct **rule-like statement** or guideline. Treat the document holistically: rules may appear as: | ||
| - Bullet points or numbered lists | ||
| - Paragraphs or full sentences | ||
| - Section headings plus body text | ||
| - Implicit requirements (e.g. "PRs should be small" or "we use conventional commits") | ||
| - Explicit markers like "Rule:", "Instruction:", "Always", "Never", "Must", "Should" | ||
|
|
||
| For each rule you identify, output one clear, standalone statement (a single sentence or short phrase). Preserve the intent; normalize wording only if it helps clarity. Do not merge unrelated rules. If there are no rules or guidelines, return an empty list. | ||
|
|
||
| Markdown content: | ||
| --- | ||
| {markdown_content} | ||
| --- | ||
|
|
||
| Output the list of rule statements. Do not include explanations or numbering in the statements themselves. | ||
| """ | ||
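
The agent fills this template with `EXTRACTOR_PROMPT.format(markdown_content=content)`. A small sketch of what that substitution does, using a shortened, hypothetical `TEMPLATE` in place of the full prompt: `str.format` parses only the template for placeholders, so braces inside the substituted markdown pass through unchanged. Literal braces added to the template itself would need to be doubled (`{{` and `}}`).

```python
# Shortened, hypothetical stand-in for EXTRACTOR_PROMPT; only the
# {markdown_content} placeholder matters for this illustration.
TEMPLATE = """Markdown content:
---
{markdown_content}
---
"""

# Guideline text that happens to contain braces.
content = 'Config must look like {"strict": true}.\n- Never commit secrets.'

# Braces in `content` are not interpreted; only the template is parsed.
prompt = TEMPLATE.format(markdown_content=content)

# Literal braces in a template must be doubled to survive formatting.
escaped = "Example: {{key}} = {value}".format(value=1)
```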