Resolve Issue #158: Page-Memory PR4 modules, Excel table pathing, evidence renderer optimization (#159) #160

Merged
suguanYang merged 4 commits into staging from main
Jun 17, 2026
@suguanYang

Contributor

Summary

  • describe the change
  • describe any API, worker, deployment, or migration impact
  • link the related issue or task

Verification

  • list the commands you ran
  • list any manual API, worker, or local-dev checks you performed
  • note anything intentionally not tested

Deployment Notes

  • note new or changed environment variables
  • note database migrations, queue changes, storage changes, or release-order requirements
  • note backwards compatibility or rollback concerns

Checklist

  • Tests were added or updated when behavior changed
  • Public docs, examples, or OpenAPI contracts were updated when needed
  • Database migrations are idempotent and safe to deploy
  • Logs, errors, and validation paths avoid leaking secrets or user data
  • The pull request description explains any breaking or user-visible change

…dence renderer optimization (#159)

* Enhance document agent with new page_locate stage and lazy loading of ProfileAgent.

- Added "page_locate" to BudgetStage for improved budget management.
- Introduced lazy loading for ProfileAgent to optimize imports in document_agent.
- Updated tools initialization to include page_locate functionality.
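
The lazy loading mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `load_attr`, `get_profile_agent_cls`, and the module path `profile_agent` are hypothetical names, since the actual import location of ProfileAgent is not shown here.

```python
import importlib
from functools import lru_cache
from typing import Any


@lru_cache(maxsize=None)
def load_attr(module_name: str, attr_name: str) -> Any:
    """Import `module_name` on first use and return one attribute from it.

    The import cost is paid only when the attribute is first requested,
    and the result is cached for later calls.
    """
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def get_profile_agent_cls() -> Any:
    # Hypothetical module path: the real location of ProfileAgent
    # is an assumption made for this sketch.
    return load_attr("profile_agent", "ProfileAgent")
```

Calling `load_attr("math", "sqrt")` shows the same mechanism with a standard-library module; `get_profile_agent_cls()` would only trigger the import when the document agent actually needs ProfileAgent.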

* feat: PR4 page-memory page mode — C1-C7 modules + GAP-1/2/3 fixes

New page_memory modules:
- C1 page_renderer: PNG + thumbnail + raw_text per page
- C2 page_plan: rule-based vlm_lite/text_only/skip_tagging strategy
- C3 page_tagger: VLM per-page annotation with JSON retry + blurry degradation
- C6 page_section_mapper: skeleton × tagger → section_path (primary/spans/inherited)
- C7 memory_service: unified page/shard_page builder via full C1-C7 pipeline
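
To make the C2 rule-based planning concrete, here is a small sketch of how a per-page strategy might be chosen. Only the three strategy names come from the commit message; `PageStats`, `plan_page`, the inputs, and the threshold are assumptions for illustration, and the real rules may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(str, Enum):
    VLM_LITE = "vlm_lite"
    TEXT_ONLY = "text_only"
    SKIP_TAGGING = "skip_tagging"


@dataclass(frozen=True)
class PageStats:
    text_chars: int   # length of the raw text extracted from the page
    image_count: int  # number of embedded images on the page


def plan_page(stats: PageStats, min_text_chars: int = 200) -> Strategy:
    """Pick a tagging strategy from simple page statistics.

    The rules and the 200-character threshold are illustrative assumptions.
    """
    if stats.text_chars == 0 and stats.image_count == 0:
        # Nothing to annotate on a blank page.
        return Strategy.SKIP_TAGGING
    if stats.image_count == 0 and stats.text_chars >= min_text_chars:
        # Text-heavy page without images: a text-only path is enough.
        return Strategy.TEXT_ONLY
    # Everything else (images, sparse text) goes to the vision model.
    return Strategy.VLM_LITE
```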

shared-python GAP fixes:
- GAP-1: zip_chunk_schema recognizes 'page' chunk type (no collapse to text)
- GAP-2: zip_result_resources collects pages/ directory
- GAP-3: zip_doc_navigation counts page_chunks in stats
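
The intent of GAP-1 and GAP-3 can be illustrated with a short sketch. The function names, the `"type"` key, and every chunk type other than `page` and `text` are assumptions; the actual shared-python helpers are not shown in this PR.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional

# Assumed set of known chunk types; only "page" and "text" are named in the commit.
KNOWN_CHUNK_TYPES = frozenset({"text", "table", "image", "page"})


def normalize_chunk_type(raw: Optional[str]) -> str:
    """Keep recognized chunk types; fall back to "text" for anything else."""
    value = (raw or "").strip().lower()
    return value if value in KNOWN_CHUNK_TYPES else "text"


def chunk_stats(chunks: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count chunks per type so that page chunks show up in the stats."""
    counts = Counter(normalize_chunk_type(c.get("type")) for c in chunks)
    return {"page_chunks": counts["page"], "text_chunks": counts["text"]}
```

Before a fix of this kind, a `page` chunk would be treated as unknown and collapse into `text`, so `page_chunks` would always be zero.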

* refactor: align page chunk fields with V2 spec

Field changes:
- content = raw PyMuPDF text only (no [SUMMARY]/[RAW] markers)
- summary = VLM or LLM-generated summary (metadata only)
- keywords = VLM or summary-full LLM extracted (semicolon-separated)
- kind = PageLabel.kind from Profile Agent (not plan.reason)
- observed_titles = from C4 skeleton primary titles (not VLM)
- Remove thumb_uri (only page_image_uri kept)
- Remove status field (strategy_used covers quality info)
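
As a reading aid, the field list above could map onto a record like the one below. The field names come from the commit message; the types, defaults, the `page_number` field, and the `keyword_list` helper are assumptions rather than the V2 spec itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PageChunk:
    page_number: int                     # hypothetical field, not listed in the commit
    content: str                         # raw PyMuPDF text only, no markers
    summary: str = ""                    # VLM or LLM-generated, metadata only
    keywords: str = ""                   # semicolon-separated
    kind: Optional[str] = None           # PageLabel.kind from Profile Agent
    observed_titles: List[str] = field(default_factory=list)  # from C4 skeleton
    page_image_uri: Optional[str] = None
    strategy_used: str = "text_only"     # vlm_lite / text_only / skip_tagging

    def keyword_list(self) -> List[str]:
        """Split the semicolon-separated keywords, dropping empty entries."""
        return [k.strip() for k in self.keywords.split(";") if k.strip()]
```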

Strategy changes:
- text_only: calls existing summary-full LLM for summary+keywords
- skip_tagging: preserves raw text content, marks EMPTY if blank
- vlm_lite: outputs summary+keywords (no observed_titles)
- Mapper no longer depends on PageTagResult

* feat: enhance Excel table parsing with hierarchical pathing, subtable support, and optimized evidence rendering with configurable character limits
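
A rough sketch of what hierarchical table paths and a configurable character limit could look like. Both helpers, the separator, the default limit, and the truncation marker are hypothetical; the PR's actual Excel parser and evidence renderer are not reproduced here.

```python
from typing import Iterable, List


def table_path(parts: Iterable[str], sep: str = " / ") -> str:
    """Join non-empty path parts (e.g. sheet, table, subtable) into one path."""
    return sep.join(p.strip() for p in parts if p and p.strip())


def render_evidence(snippets: List[str], max_chars: int = 2000, marker: str = "...") -> str:
    """Join evidence snippets and cap the result at `max_chars` characters.

    When the text is cut, the marker is included within the limit.
    """
    text = "\n".join(s.strip() for s in snippets if s.strip())
    if len(text) <= max_chars:
        return text
    if max_chars <= len(marker):
        return marker[:max_chars]
    return text[: max_chars - len(marker)] + marker
```

Keeping the marker inside the limit means callers can rely on `len(result) <= max_chars` when budgeting prompt space.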

* fix: resolve lint and type errors (unused imports, unused variable, optional ctx/last_verify narrowing)
EricNGOntos and others added 2 commits June 17, 2026 17:00
Align the page-memory and Excel parsing contract assertions with the behavior introduced by PR #159, so that the required CI test suite reflects the shipped schema.

Co-authored-by: Cursor <cursoragent@cursor.com>
@suguanYang

Contributor Author

Opened follow-up PR #162 against main to address the two CodeQL comments on this PR. It removes the redundant selected_nodes assignment and the unread exception-path subtable_index assignment. PR #162 CI is green: Lint, Typecheck, Test, Gitleaks, and CodeQL all passed.

@suguanYang suguanYang merged commit 90f9d45 into staging Jun 17, 2026
6 checks passed

3 participants