
Conversation

@ww2283 (Contributor) commented Nov 3, 2025

Enhanced Token Limit Handling with Dynamic Discovery + AST Metadata

This PR builds on #152 by extending the metadata that is surfaced when AST chunking is used, and on #154 by adding dynamic token limit discovery, providing comprehensive token truncation with zero maintenance for Ollama deployments.

Key Enhancements

1. Dynamic Ollama Token Limit Discovery

  • Zero-maintenance: Auto-queries Ollama's /api/show endpoint for model context limits
  • Future-proof: Automatically adapts to new models without code changes when Ollama is used
  • Fallback safety: Registry-based limits for LM Studio, OpenAI, and unknown models served through an OpenAI-style API
# Detects limits automatically for Ollama
token_limit = get_model_token_limit("nomic-embed-text", base_url="http://localhost:11434")
# → Queries /api/show, returns actual context_length
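
As a rough sketch of how this lookup could work (not the PR's actual implementation): the helper name, the use of requests, and the assumption that /api/show returns a model_info mapping containing a *.context_length key are all illustrative.

```python
# Sketch only: queries Ollama's /api/show endpoint and extracts the context
# length, falling back to a conservative default when the call fails.
import requests

FALLBACK_LIMIT = 2048  # conservative default for unknown models


def discover_ollama_context_length(model: str, base_url: str = "http://localhost:11434") -> int:
    try:
        resp = requests.post(f"{base_url}/api/show", json={"name": model}, timeout=5)
        resp.raise_for_status()
        model_info = resp.json().get("model_info", {})
        # Keys typically look like "nomic-bert.context_length" or "llama.context_length"
        for key, value in model_info.items():
            if key.endswith(".context_length"):
                return int(value)
    except (requests.RequestException, ValueError):
        pass
    return FALLBACK_LIMIT
```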

2. Corrected Model Token Limits

  • Fixed nomic-embed-text: 2048 tokens (was incorrectly 512)
  • Verified via /api/show endpoint inspection
  • Added OpenAI embedding models (text-embedding-3-small/large: 8192); an illustrative registry is sketched below
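
For illustration, a static fallback registry reflecting the limits listed above might look like the following; the dict name is an assumption and the actual contents of embedding_compute.py may differ.

```python
# Illustrative registry used when dynamic discovery is unavailable.
MODEL_TOKEN_LIMITS = {
    "nomic-embed-text": 2048,         # corrected from 512
    "nomic-embed-text-v1.5": 2048,
    "text-embedding-3-small": 8192,
    "text-embedding-3-large": 8192,
}
DEFAULT_TOKEN_LIMIT = 2048  # conservative fallback for unknown models
```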

3. AST Metadata Preservation

  • Chunking functions now return a List[Dict[str, Any]] format (see the sketch after this list)
  • Preserves AST metadata (classes, functions, start_line, end_line)
  • Enables metadata display in search results
  • Backwards compatible with existing code
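
A hypothetical example of what one returned chunk dict could look like under this format; the exact key names used by chunking_utils.py may differ.

```python
# Illustrative chunk structure: content plus AST-derived metadata.
chunks = [
    {
        "content": "class Indexer:\n    def build(self): ...",
        "metadata": {
            "file_path": "src/indexer.py",
            "file_name": "indexer.py",
            "classes": ["Indexer"],
            "functions": ["build"],
            "start_line": 10,
            "end_line": 42,
        },
    },
]
```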

4. Comprehensive Test Coverage

  • 277 lines of token truncation unit tests
  • 553 lines of AST integration tests
  • Tests cover known/unknown models, single/batch truncation, and edge cases (an illustrative test is sketched after this list)
  • Current results: 84/89 tests passing (5 pre-existing RAG environment failures)
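
As a sketch of the style of test described above; the import path is a guess (adjust to wherever embedding_compute.py lives) and the assertions are illustrative, not copied from test_token_truncation.py.

```python
from embedding_compute import get_model_token_limit  # assumed import path


def test_known_model_uses_corrected_limit():
    # nomic-embed-text's limit was corrected from 512 to 2048 in this PR
    assert get_model_token_limit("nomic-embed-text") == 2048


def test_unknown_model_falls_back_to_conservative_default():
    # Unknown models should get the conservative 2048-token fallback
    assert get_model_token_limit("some-model-that-does-not-exist") == 2048
```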

Implementation Details

Files Changed:

  • embedding_compute.py: Dynamic discovery, enhanced truncation, improved logging
  • chunking_utils.py: Dict-based return format, AST metadata flow
  • cli.py: AST chunking parameters
  • test_token_truncation.py: Comprehensive unit tests (new)
  • test_astchunk_integration.py: AST metadata validation (new)

Testing

Exclude diskann tests (native C++ threading bugs on macOS):

uv run pytest -k "not (diskann or test_backend_options or test_large_index)"

Benefits

  1. Zero-maintenance for Ollama: No hardcoded limits to update
  2. Prevents silent truncation: Client-side truncation runs before the server can silently drop tokens (see the sketch after this list)
  3. Better debugging: Rich metadata for search result provenance
  4. Safer defaults: Conservative fallbacks (2048) for unknown models
  5. Comprehensive testing: Verified behavior across scenarios
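
A minimal sketch of what client-side truncation can look like, using tiktoken purely as a stand-in tokenizer; the project's actual truncate_to_token_limit may use a different tokenizer and signature.

```python
import tiktoken


def truncate_to_token_limit(text: str, limit: int, encoding: str = "cl100k_base") -> str:
    enc = tiktoken.get_encoding(encoding)
    tokens = enc.encode(text)
    if len(tokens) <= limit:
        return text
    # Truncate on the client so the embedding server never silently drops tokens.
    return enc.decode(tokens[:limit])
```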

Migration Notes

No breaking changes; the enhancement is additive. Existing code continues to work with the static registry, and Ollama users automatically benefit from dynamic discovery.

Checklist

  • Tests pass (uv run pytest)
  • Code formatted (ruff format and ruff check)
  • Pre-commit hooks pass (pre-commit run --all-files)

Improves upon upstream PR yichuan-w#154 with the following enhancements:

1. **Hybrid Token Limit Discovery**
   - Dynamic: Query Ollama /api/show for context limits
   - Fallback: Registry for LM Studio/OpenAI
   - Zero maintenance for Ollama users
   - Respects custom num_ctx settings

2. **AST Metadata Preservation**
   - create_ast_chunks() returns dict format with metadata
   - Preserves file_path, file_name, timestamps
   - Includes astchunk metadata (line numbers, node counts)
   - Fixes content extraction bug (checks "content" key)
   - Enables --show-metadata flag

3. **Better Token Limits**
   - nomic-embed-text: 2048 tokens (vs 512)
   - nomic-embed-text-v1.5: 2048 tokens
   - Added OpenAI models: 8192 tokens

4. **Comprehensive Tests**
   - 11 tests for token truncation
   - 545 new lines in test_astchunk_integration.py
   - All metadata preservation tests passing
- Merged upstream's model list with our corrected token limits
- Kept our corrected nomic-embed-text: 2048 (not 512)
- Removed post-chunking validation (redundant with embedding-time truncation)
- All tests passing except 2 pre-existing integration test failures
…dling

- Remove duplicate truncate_to_token_limit and get_model_token_limit functions
- Restore version handling logic (model:latest -> model) from PR yichuan-w#154
- Restore partial matching fallback for model name variations
- Apply ruff formatting to all modified files
- All 11 token truncation tests passing
@ww2283 (Contributor, Author) commented Nov 3, 2025

@ASuresh0524 and @yichuan-w: take a look at this PR and let me know. Btw, where's the Slack yichuan mentioned to me several days ago? I was not smart enough to find it to join...

- Add module-level flag to track if warning shown
- Prevents spam when processing multiple files
- Add clarifying note that auto-truncation happens at embedding time
- Addresses issue where warning appeared for every code file
@ASuresh0524 (Collaborator)

hey @ww2283 will look at this by tomorrow night, sorry for the delay

@yichuan-w (Owner)

Oh, the Slack link is here: [image]

- Track and report truncation statistics (count, tokens removed, max length)
- Show first 3 individual truncations with exact token counts
- Provide comprehensive summary when truncation occurs
- Use WARNING level for data loss visibility
- Silent (DEBUG level only) when no truncation needed

Replaces misleading "truncated where necessary" message that appeared
even when nothing was truncated.
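
A rough sketch of the reporting behavior described in this commit; the variable names and log wording are illustrative rather than taken from the diff.

```python
import logging

logger = logging.getLogger(__name__)


def report_truncation(stats: list[tuple[int, int]]) -> None:
    """stats: (original_tokens, kept_tokens) per truncated text."""
    if not stats:
        logger.debug("No truncation needed for this batch.")
        return
    removed = sum(orig - kept for orig, kept in stats)
    longest = max(orig for orig, _ in stats)
    for orig, kept in stats[:3]:  # show the first 3 individual truncations
        logger.warning("Truncated text from %d to %d tokens", orig, kept)
    logger.warning(
        "Truncated %d text(s), removed %d tokens total (longest input: %d tokens)",
        len(stats), removed, longest,
    )
```
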
@ww2283 force-pushed the pr154-conflict-resolution branch from 494637d to 24d2971 on November 4, 2025 at 14:23
@andylizf self-requested a review on November 5, 2025 at 04:02
@andylizf (Collaborator) left a comment

LGTM!

logger = logging.getLogger(__name__)

# Flag to ensure AST token warning only shown once per session
_ast_token_warning_shown = False
Collaborator

nit: let's use warnings.filterwarnings. it's thread-safe.
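
For reference, a minimal sketch of the suggested alternative using the standard library's "once" warning filter in place of a module-level flag; the warning category and message here are hypothetical.

```python
import warnings


class ASTTokenLimitWarning(UserWarning):
    pass


# With the "once" action, a matching warning is shown only the first time it fires.
warnings.filterwarnings("once", category=ASTTokenLimitWarning)


def warn_ast_token_limit(model: str, limit: int) -> None:
    warnings.warn(
        f"AST chunks may exceed the {limit}-token limit for {model}; "
        "they will be auto-truncated at embedding time.",
        ASTTokenLimitWarning,
    )
```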

@andylizf (Collaborator) commented Nov 5, 2025

Thanks @ww2283 for the awesome PR!

@ASuresh0524 (Collaborator)

LGTM as well! @yichuan-w @ww2283
