Resolve Issue #158: Page-Memory PR4 modules, Excel table pathing, evidence renderer optimization (#159) #160
Merged
…dence renderer optimization (#159)

* Enhance document agent with new page_locate stage and lazy loading of ProfileAgent.
  - Added "page_locate" to BudgetStage for improved budget management.
  - Introduced lazy loading for ProfileAgent to optimize imports in document_agent.
  - Updated tools initialization to include page_locate functionality.

* feat: PR4 page-memory page mode — C1-C7 modules + GAP-1/2/3 fixes

  New page_memory modules:
  - C1 page_renderer: PNG + thumbnail + raw_text per page
  - C2 page_plan: rule-based vlm_lite/text_only/skip_tagging strategy
  - C3 page_tagger: VLM per-page annotation with JSON retry + blurry degradation
  - C6 page_section_mapper: skeleton × tagger → section_path (primary/spans/inherited)
  - C7 memory_service: unified page/shard_page builder via full C1-C7 pipeline

  shared-python GAP fixes:
  - GAP-1: zip_chunk_schema recognizes 'page' chunk type (no collapse to text)
  - GAP-2: zip_result_resources collects pages/ directory
  - GAP-3: zip_doc_navigation counts page_chunks in stats

* refactor: align page chunk fields with V2 spec

  Field changes:
  - content = raw PyMuPDF text only (no [SUMMARY]/[RAW] markers)
  - summary = VLM or LLM-generated summary (metadata only)
  - keywords = VLM or summary-full LLM extracted (semicolon-separated)
  - kind = PageLabel.kind from Profile Agent (not plan.reason)
  - observed_titles = from C4 skeleton primary titles (not VLM)
  - Remove thumb_uri (only page_image_uri kept)
  - Remove status field (strategy_used covers quality info)

  Strategy changes:
  - text_only: calls existing summary-full LLM for summary+keywords
  - skip_tagging: preserves raw text content, marks EMPTY if blank
  - vlm_lite: outputs summary+keywords (no observed_titles)
  - Mapper no longer depends on PageTagResult

* feat: enhance Excel table parsing with hierarchical pathing, subtable support, and optimized evidence rendering with configurable character limits

* fix: resolve lint and type errors (unused imports, unused variable, optional ctx/last_verify narrowing)
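For readers who want a concrete picture of the V2 page chunk fields listed in the refactor commit, here is a minimal sketch. It is not the code from this PR: `PageChunk`, `join_keywords`, and `build_page_chunk` are hypothetical names, and the commit message does not say where the EMPTY marker is stored, so writing it into `summary` is purely an assumption for illustration. Only the field names, the semicolon-separated keywords, and the three strategy names come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class PageChunk:
    """Hypothetical container mirroring the V2 field list in the commit message."""
    content: str                      # raw page text only, no [SUMMARY]/[RAW] markers
    summary: str                      # VLM- or LLM-generated summary (metadata only)
    keywords: str                     # semicolon-separated
    kind: str                         # taken from the page label, not from the plan reason
    observed_titles: List[str] = field(default_factory=list)
    page_image_uri: Optional[str] = None   # no thumb_uri in V2
    strategy_used: str = "skip_tagging"    # replaces the removed status field


def join_keywords(words: Iterable[str]) -> str:
    """Strip, de-duplicate (order-preserving), and join keywords with ';'."""
    seen: List[str] = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.append(word)
    return ";".join(seen)


def build_page_chunk(
    raw_text: str,
    strategy: str,
    kind: str,
    titles: List[str],
    summary: str = "",
    keywords: Iterable[str] = (),
    page_image_uri: Optional[str] = None,
) -> PageChunk:
    """Assemble a chunk; for skip_tagging, keep raw text and flag blank pages."""
    if strategy == "skip_tagging":
        # Assumption: the EMPTY marker lives in `summary`; the PR does not specify this.
        summary = "EMPTY" if not raw_text.strip() else ""
        keywords = ()
    return PageChunk(
        content=raw_text,
        summary=summary,
        keywords=join_keywords(keywords),
        kind=kind,
        observed_titles=list(titles),
        page_image_uri=page_image_uri,
        strategy_used=strategy,
    )
```

In this sketch a blank page passed with `strategy="skip_tagging"` keeps its original whitespace in `content` and gets the marker in `summary`, while a `vlm_lite` or `text_only` page carries whatever summary and keywords the caller supplies.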
Align page-memory and Excel parsing contract assertions with the behavior introduced by PR159 so the required CI test suite reflects the shipped schema.

Co-authored-by: Cursor <cursoragent@cursor.com>
…l-comments Fix PR 160 CodeQL comments
Summary
Verification
Deployment Notes
Checklist