Resolve Issue #158: Page-Memory PR4 modules, Excel table pathing, evidence renderer optimization (#159) by suguanYang · Pull Request #160 · Ontos-AI/knowhere

suguanYang · 2026-06-17T08:30:26Z

Summary

describe the change
describe any API, worker, deployment, or migration impact
link the related issue or task

Verification

list the commands you ran
list any manual API, worker, or local-dev checks you performed
note anything intentionally not tested

Deployment Notes

note new or changed environment variables
note database migrations, queue changes, storage changes, or release-order requirements
note backwards compatibility or rollback concerns

Checklist

Tests were added or updated when behavior changed
Public docs, examples, or OpenAPI contracts were updated when needed
Database migrations are idempotent and safe to deploy
Logs, errors, and validation paths avoid leaking secrets or user data
The pull request description explains any breaking or user-visible change

…dence renderer optimization (#159)

* Enhance document agent with new page_locate stage and lazy loading of ProfileAgent.
  - Added "page_locate" to BudgetStage for improved budget management.
  - Introduced lazy loading for ProfileAgent to optimize imports in document_agent.
  - Updated tools initialization to include page_locate functionality.

* feat: PR4 page-memory page mode — C1-C7 modules + GAP-1/2/3 fixes

  New page_memory modules:
  - C1 page_renderer: PNG + thumbnail + raw_text per page
  - C2 page_plan: rule-based vlm_lite/text_only/skip_tagging strategy
  - C3 page_tagger: VLM per-page annotation with JSON retry + blurry degradation
  - C6 page_section_mapper: skeleton × tagger → section_path (primary/spans/inherited)
  - C7 memory_service: unified page/shard_page builder via full C1-C7 pipeline

  shared-python GAP fixes:
  - GAP-1: zip_chunk_schema recognizes 'page' chunk type (no collapse to text)
  - GAP-2: zip_result_resources collects pages/ directory
  - GAP-3: zip_doc_navigation counts page_chunks in stats

* refactor: align page chunk fields with V2 spec

  Field changes:
  - content = raw PyMuPDF text only (no [SUMMARY]/[RAW] markers)
  - summary = VLM or LLM-generated summary (metadata only)
  - keywords = VLM or summary-full LLM extracted (semicolon-separated)
  - kind = PageLabel.kind from Profile Agent (not plan.reason)
  - observed_titles = from C4 skeleton primary titles (not VLM)
  - Remove thumb_uri (only page_image_uri kept)
  - Remove status field (strategy_used covers quality info)

  Strategy changes:
  - text_only: calls existing summary-full LLM for summary+keywords
  - skip_tagging: preserves raw text content, marks EMPTY if blank
  - vlm_lite: outputs summary+keywords (no observed_titles)
  - Mapper no longer depends on PageTagResult

* feat: enhance Excel table parsing with hierarchical pathing, subtable support, and optimized evidence rendering with configurable character limits

* fix: resolve lint and type errors (unused imports, unused variable, optional ctx/last_verify narrowing)
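For readers skimming the field changes, here is a minimal sketch of what a page chunk could look like after the V2 alignment. It is not the code from this PR. The field names come from the commit message, but the class name `PageChunk`, the helper `split_keywords`, every type annotation, and every default value are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageChunk:
    """Hypothetical page chunk; names follow the commit message, types are assumed."""

    content: str                      # raw PyMuPDF text only, no [SUMMARY]/[RAW] markers
    summary: str = ""                 # VLM- or LLM-generated summary (metadata only)
    keywords: str = ""                # semicolon-separated keywords
    kind: str = ""                    # PageLabel.kind from the Profile Agent
    observed_titles: List[str] = field(default_factory=list)  # from C4 skeleton primary titles
    page_image_uri: str = ""          # thumb_uri was removed; only this URI is kept
    strategy_used: str = ""           # vlm_lite, text_only, or skip_tagging


def split_keywords(raw: str) -> List[str]:
    """Split a semicolon-separated keyword string, dropping empty entries."""
    return [part.strip() for part in raw.split(";") if part.strip()]
```

A consumer that needs keywords as a list could call `split_keywords(chunk.keywords)`. Whether the real schema stores keywords as a string or a list is not stated beyond "semicolon-separated".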

Align page-memory and Excel parsing contract assertions with the behavior introduced by PR159 so the required CI test suite reflects the shipped schema. Co-authored-by: Cursor <cursoragent@cursor.com>

suguanYang · 2026-06-17T10:02:32Z

Opened follow-up PR #162 against main to address the two CodeQL comments on this PR. It removes the redundant selected_nodes assignment and the unread exception-path subtable_index assignment. PR #162 CI is green: Lint, Typecheck, Test, Gitleaks, and CodeQL all passed.
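For context, the snippet below illustrates the general pattern behind an "unread exception-path assignment". It is a generic sketch, not the code changed in PR #162: the function `parse_index` and its variable names are hypothetical.

```python
def parse_index(raw: str) -> int:
    """Return raw as an int, or -1 when it is not a valid integer."""
    try:
        value = int(raw)
    except ValueError:
        # A line such as `value = 0` placed here would be a dead store:
        # the function returns on the next line, so the value is never read.
        # Static analyzers report that kind of assignment as unused.
        return -1
    return value
```

Removing such an assignment does not change behavior, which is why a cleanup like this is safe to ship as a small follow-up.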

…l-comments Fix PR 160 CodeQL comments

github-advanced-security AI found potential problems Jun 17, 2026

Comment thread apps/worker/app/services/document_agent/structure/page_locate_agent.py Fixed

Comment thread apps/worker/app/services/document_parser/formats/excel/table_parser.py Fixed

EricNGOntos and others added 2 commits June 17, 2026 17:00

Fix PR159 contract tests. (#161)

ed55bd2

Fix CodeQL dead assignments

3853101

suguanYang mentioned this pull request Jun 17, 2026

Fix PR 160 CodeQL comments #162

Merged

Merge pull request #162 from Ontos-AI/fix/suguan/knowhere-pr160-codeq…

68d8fb5

suguanYang merged commit 90f9d45 into staging Jun 17, 2026
6 checks passed

suguanYang merged 4 commits into staging from main

3 participants
