Align extraction agent with structured Modelfile JSON contract #5
Agent-Logs-Url: https://github.com/nmdra/AgentHire/sessions/3d45db37-3bcc-450a-a084-49377e4795c8 Co-authored-by: nmdra <73674803+nmdra@users.noreply.github.com>
Agent-Logs-Url: https://github.com/nmdra/AgentHire/sessions/88460f99-0014-43cc-bdb8-7d47c16a02db Co-authored-by: nmdra <73674803+nmdra@users.noreply.github.com>
…am agents Agent-Logs-Url: https://github.com/nmdra/AgentHire/sessions/09a8ad7f-3dd2-49a1-8c0d-d0de3f8c4748 Co-authored-by: nmdra <73674803+nmdra@users.noreply.github.com>
Pull request overview
Aligns the extraction agent’s output with a more structured JSON contract (nested experience/education, added website/other_details), while keeping downstream agents mocked for the Phase 1 scaffold.
Changes:
- Updates the extraction validation schema to structured `experience`/`education` entries plus `website` and `other_details`.
- Refactors extraction prompting to use a persona-driven, sectioned prompt layout.
- Adjusts extraction + workflow tests to assert the new extraction JSON shape.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_workflow.py | Updates stubbed extraction payload in workflow test to match the new extraction schema. |
| tests/test_extraction_agent.py | Updates extraction agent tests to expect structured experience/education and new optional fields. |
| app/tools/validate_extraction.py | Introduces structured Pydantic models for experience/education and extends extraction schema. |
| app/tools/ollama.py | Adds optional sampling/stop controls to JSON generation requests via Ollama options. |
| app/agents/personas.py | Adds persona spec + helper to build structured prompts for the extraction agent. |
| app/agents/extraction_agent.py | Refactors prompt generation to persona-based format and updates Ollama generation parameters. |
| README.md | Updates workflow documentation to remove human review and model settings that no longer exist. |
     class CandidateExtraction(BaseModel):
         """Structured extraction output from extraction agent."""

         name: str | None = Field(default=None)
         email: str | None = Field(default=None)
         phone: str | None = Field(default=None)
    +    website: str | None = Field(default=None)
         skills: list[str] = Field(default_factory=list)
    -    experience: str | None = Field(default=None)
    -    education: str | None = Field(default=None)
    +    experience: list[ExperienceEntry] = Field(default_factory=list)
    +    education: list[EducationEntry] = Field(default_factory=list)
    +    other_details: list[str] = Field(default_factory=list)
CandidateExtraction currently accepts and silently ignores unexpected/hallucinated keys because no Pydantic extra policy is set. Since the prompt/persona contract says the JSON must match the schema exactly, consider forbidding extra keys (and similarly for nested ExperienceEntry/EducationEntry) so validation fails when the model returns additional fields instead of dropping them unnoticed.
        prompt=_build_extraction_prompt(raw_text, correction_error=error),
        temperature=0.0,
        top_p=0.1,
        stop=["```"],
        timeout_seconds=timeout_seconds,
Passing stop=["```"] can truncate responses that contain fenced JSON (e.g., starting with ```json), producing an empty/partial string that will always fail json.loads and consume retries. Consider removing the stop token and instead stripping markdown fences in post-processing, or choosing stop tokens that only match trailing fences so the JSON body is preserved.
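One way to do the post-processing variant, as a rough sketch: strip a Markdown fence only when it wraps the whole response, then parse. The names `strip_markdown_fences` and `parse_model_json` are hypothetical and not taken from the repository. The pattern is written with a three-backtick quantifier rather than literal backticks purely for readability.

```python
import json
import re

# Matches a response that is entirely wrapped in one Markdown code fence,
# with an optional language tag (e.g. "json") after the opening fence.
_FENCE_RE = re.compile(r"^\s*`{3}[\w-]*\s*\n(.*?)\n?\s*`{3}\s*$", re.DOTALL)


def strip_markdown_fences(text: str) -> str:
    """Return the fenced body if the text is fenced, else the trimmed text."""
    match = _FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()


def parse_model_json(text: str) -> dict:
    """Parse model output as JSON after removing any wrapping fence."""
    return json.loads(strip_markdown_fences(text))
```

This keeps the JSON body intact whether or not the model adds a fence. It does not handle prose before or after the fence, in which case `json.loads` still raises and the existing retry path applies.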
Agent-Logs-Url: https://github.com/nmdra/AgentHire/sessions/9aa47908-a01a-4b1e-a3f4-ec74a9327676 Co-authored-by: nmdra <73674803+nmdra@users.noreply.github.com>
@copilot apply changes based on the comments in this thread
Agent-Logs-Url: https://github.com/nmdra/AgentHire/sessions/db48781c-bc90-4b92-85c8-17294dc3126b Co-authored-by: nmdra <73674803+nmdra@users.noreply.github.com>