Align extraction agent with structured Modelfile JSON contract #5

Merged: nmdra merged 7 commits into main from copilot/init-chat-ollama-model on Apr 21, 2026

Copilot AI commented Apr 21, 2026

  • Review requested PR thread feedback and inspect target files
  • Verify current implementation in extraction schema and retry logic
  • Enforce strict extraction schema by forbidding extra keys (top-level and nested)
  • Align extraction prompt contract with nullable nested fields
  • Remove fragile stop token behavior and safely parse fenced JSON if present
  • Run targeted extraction tests and full validation (ruff, mypy, pytest)
  • Commit and reply to the PR comment with commit hash

@nmdra nmdra marked this pull request as ready for review April 21, 2026 15:25

Copilot AI left a comment

Pull request overview

Aligns the extraction agent’s output with a more structured JSON contract (nested experience/education, added website/other_details), while keeping downstream agents mocked for the Phase 1 scaffold.

Changes:

  • Updates extraction validation schema to structured experience/education entries plus website and other_details.
  • Refactors extraction prompting to use a persona-driven, sectioned prompt layout.
  • Adjusts extraction + workflow tests to assert the new extraction JSON shape.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test_workflow.py | Updates stubbed extraction payload in workflow test to match the new extraction schema. |
| tests/test_extraction_agent.py | Updates extraction agent tests to expect structured experience/education and new optional fields. |
| app/tools/validate_extraction.py | Introduces structured Pydantic models for experience/education and extends extraction schema. |
| app/tools/ollama.py | Adds optional sampling/stop controls to JSON generation requests via Ollama options. |
| app/agents/personas.py | Adds persona spec + helper to build structured prompts for the extraction agent. |
| app/agents/extraction_agent.py | Refactors prompt generation to persona-based format and updates Ollama generation parameters. |
| README.md | Updates workflow documentation to remove human review and model settings that no longer exist. |

Comment on lines 24 to +34
  class CandidateExtraction(BaseModel):
      """Structured extraction output from extraction agent."""

      name: str | None = Field(default=None)
      email: str | None = Field(default=None)
      phone: str | None = Field(default=None)
+     website: str | None = Field(default=None)
      skills: list[str] = Field(default_factory=list)
-     experience: str | None = Field(default=None)
-     education: str | None = Field(default=None)
+     experience: list[ExperienceEntry] = Field(default_factory=list)
+     education: list[EducationEntry] = Field(default_factory=list)
+     other_details: list[str] = Field(default_factory=list)

Copilot AI Apr 21, 2026

CandidateExtraction currently accepts and silently ignores unexpected/hallucinated keys because no Pydantic extra policy is set. Since the prompt/persona contract says the JSON must match the schema exactly, consider forbidding extra keys (and similarly for nested ExperienceEntry/EducationEntry) so validation fails when the model returns additional fields instead of dropping them unnoticed.

Comment thread app/agents/extraction_agent.py Outdated
Comment on lines +73 to 77
prompt=_build_extraction_prompt(raw_text, correction_error=error),
temperature=0.0,
top_p=0.1,
stop=["```"],
timeout_seconds=timeout_seconds,

Copilot AI Apr 21, 2026

Passing stop=["```"] can truncate responses that contain fenced JSON (e.g., starting with ```json), producing an empty/partial string that will always fail json.loads and consume retries. Consider removing the stop token and instead stripping markdown fences in post-processing, or choosing stop tokens that only match trailing fences so the JSON body is preserved.
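
One way the post-processing alternative might look, as a sketch only: strip a single wrapping markdown fence (with or without a language tag) before calling `json.loads`. The function name `parse_model_json` and the regular expression are assumptions for illustration, not the code that was merged.

```python
import json
import re

# Matches a response wrapped in one fenced block, e.g. "```json\n{...}\n```".
_FENCE_RE = re.compile(
    r"^\s*```[a-zA-Z0-9_-]*\s*\n(.*?)\n?\s*```\s*$",
    re.DOTALL,
)


def parse_model_json(raw: str) -> dict:
    """Parse model output as a JSON object, tolerating a wrapping code fence."""
    text = raw.strip()
    match = _FENCE_RE.match(text)
    if match:
        # Keep only the body between the opening and closing fences.
        text = match.group(1).strip()
    parsed = json.loads(text)  # raises json.JSONDecodeError on bad input
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed
```

Because the fence is removed after generation rather than used as a stop token, a response that opens with a fenced block still reaches the parser with its JSON body intact.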

nmdra commented Apr 21, 2026

@copilot apply changes based on the comments in this thread

Copilot stopped work on behalf of nmdra due to an error April 21, 2026 17:40
@nmdra nmdra merged commit c3d660e into main Apr 21, 2026
0 of 2 checks passed