Document Processing

Automatic Document Processing

```python
# Context is automatically enabled during document processing
# (call from async code; rag_anything is an initialized RAGAnything instance)
await rag_anything.process_document_complete("document.pdf")
```

3. Manual Content Source Configuration

```python
# Set the content source (here a MinerU content list) used for context extraction
rag_anything.set_content_source_for_context(content_list, "minerU")

# Update context configuration at runtime
rag_anything.update_context_config(
    context_window=1,
    max_context_tokens=1500,
    include_captions=False
)
```
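
Presumably, a runtime update changes only the fields passed in and leaves the rest of the context configuration as it was. The minimal sketch below illustrates that behavior with a plain dataclass. `ContextConfigSketch` and `update_config` are hypothetical names, not part of RAG-Anything, and their default values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field, fields, replace
from typing import List


@dataclass(frozen=True)
class ContextConfigSketch:
    """Hypothetical stand-in for a context configuration; defaults are illustrative."""

    context_window: int = 1
    context_mode: str = "page"
    max_context_tokens: int = 2000
    include_headers: bool = True
    include_captions: bool = True
    filter_content_types: List[str] = field(default_factory=lambda: ["text"])


def update_config(config: ContextConfigSketch, **overrides) -> ContextConfigSketch:
    """Return a copy of `config` with only the named fields replaced."""
    known = {f.name for f in fields(config)}
    unknown = set(overrides) - known
    if unknown:
        # Fail loudly on typos instead of silently ignoring them
        raise ValueError(f"Unknown context config fields: {sorted(unknown)}")
    return replace(config, **overrides)


base = ContextConfigSketch()
updated = update_config(
    base, context_window=1, max_context_tokens=1500, include_captions=False
)
print(updated.max_context_tokens, updated.include_captions, updated.context_mode)
# → 1500 False page
```

This sketch returns a new object via `dataclasses.replace` and rejects unknown field names, so a typo such as `max_context_token` raises an error; the library's own method may instead update its configuration in place.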

4. Direct Modal Processor Usage

```python
from raganything.modalprocessors import (
    ContextExtractor,
    ContextConfig,
    ImageModalProcessor
)

# Configure context extraction
config = ContextConfig(
    context_window=1,
    context_mode="page",
    max_context_tokens=2000,
    include_headers=True,
    include_captions=True,
    filter_content_types=["text"]
)

# Initialize context extractor
context_extractor = ContextExtractor(config)

# Initialize modal processor with context support
# (lightrag: an initialized LightRAG instance; caption_func: the model function used for captions)
processor = ImageModalProcessor(lightrag, caption_func, context_extractor)

# Set content source (content_list: the parsed items produced by MinerU)
processor.set_content_source(content_list, "minerU")

# Process with context; item_info locates the image within content_list
item_info = {
    "page_idx": 2,
    "index": 5,
    "type": "image"
}

# Run inside an async function; image_data is the image item being processed
result = await processor.process_multimodal_content(
    modal_content=image_data,
    content_type="image",
    file_path="document.pdf",
    entity_name="Architecture Diagram",
    item_info=item_info
)
```
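
The `filter_content_types`, `include_captions`, and `max_context_tokens` settings suggest a pipeline of roughly this shape: keep neighbouring items of the allowed types, optionally add captions, then cap the result at a token budget. The sketch below is one way to do that, not RAG-Anything's implementation. `assemble_context` is a hypothetical helper, the `img_caption`/`table_caption` field names are assumed (modeled on MinerU-style content lists), and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
from typing import Any, Dict, List

# Assumed caption field names per content type (MinerU-style); adjust to your parser
CAPTION_KEYS = {"image": "img_caption", "table": "table_caption"}


def assemble_context(
    items: List[Dict[str, Any]],
    filter_content_types: List[str],
    include_captions: bool,
    max_context_tokens: int,
) -> str:
    """Join neighbouring items into one context string (hypothetical helper)."""
    pieces: List[str] = []
    for item in items:
        item_type = item.get("type")
        caption_key = CAPTION_KEYS.get(item_type)
        if item_type in filter_content_types and item.get("text"):
            # Keep the body text of allowed content types
            pieces.append(item["text"])
        elif include_captions and caption_key and item.get(caption_key):
            # Captions are stored as a list of strings; join them into one line
            pieces.append("Caption: " + " ".join(item[caption_key]))
    # Approximate tokens as whitespace-separated words and truncate to the budget
    words = " ".join(pieces).split()
    return " ".join(words[:max_context_tokens])


neighbours = [
    {"type": "text", "text": "The system has three layers.", "page_idx": 1},
    {"type": "image", "img_caption": ["Figure 2: Data flow"], "page_idx": 1},
    {"type": "equation", "text": "E = mc^2", "page_idx": 2},
]
print(assemble_context(neighbours, ["text"], include_captions=True, max_context_tokens=50))
# → The system has three layers. Caption: Figure 2: Data flow
```

A production version would count tokens with the model's own tokenizer so that the budget matches what the LLM actually receives.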

Context Modes

Page-Based Context (context_mode="page")

  • Extracts context based on page boundaries
  • Uses page_idx field from content items
  • Suitable for content organized by pages
  • Example: with context_window=2, include text from the 2 pages before and after the current image
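
A page-based window can be sketched as a filter on `page_idx`: keep the items whose page lies within `context_window` pages of the current item's page. This is an illustrative sketch, not the library's code. `page_context_items` is a hypothetical helper, and including other items from the current page (while excluding the current item itself) is an assumption.

```python
from typing import Any, Dict, List


def page_context_items(
    content_list: List[Dict[str, Any]], current_index: int, context_window: int
) -> List[Dict[str, Any]]:
    """Return items within `context_window` pages of the current item (hypothetical)."""
    current_page = content_list[current_index]["page_idx"]
    first_page = current_page - context_window
    last_page = current_page + context_window
    return [
        item
        for i, item in enumerate(content_list)
        # Skip the current item and any item without page information
        if i != current_index
        and "page_idx" in item
        and first_page <= item["page_idx"] <= last_page
    ]


content_list = [
    {"type": "text", "text": "Intro", "page_idx": 0},
    {"type": "text", "text": "Background", "page_idx": 1},
    {"type": "text", "text": "Design overview", "page_idx": 2},
    {"type": "image", "img_path": "images/arch.jpg", "page_idx": 2},
    {"type": "text", "text": "Evaluation", "page_idx": 3},
    {"type": "text", "text": "Appendix", "page_idx": 5},
]
# The image sits at index 3 on page 2; a window of 1 covers pages 1 through 3
nearby = page_context_items(content_list, current_index=3, context_window=1)
print([item["text"] for item in nearby])
# → ['Background', 'Design overview', 'Evaluation']
```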

Chunk-Based Context (context_mode="chunk")

  • Extracts context based on content item positions
  • Uses sequential position in content list
  • Suitable for fine-grained control
  • Example: with context_window=5, include the 5 content items before and after the current table
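
A chunk-based window, by contrast, ignores pages and slices the content list by position. The hypothetical `chunk_context_items` helper below sketches that, under the assumption that `context_window` counts items on each side of the current one and that the current item itself is excluded.

```python
from typing import Any, Dict, List


def chunk_context_items(
    content_list: List[Dict[str, Any]], current_index: int, context_window: int
) -> List[Dict[str, Any]]:
    """Return up to `context_window` items on each side of the current one (hypothetical)."""
    # Clamp at 0 so a negative start does not wrap around to the end of the list
    start = max(0, current_index - context_window)
    # Slicing past the end is safe in Python, so no upper clamp is needed
    end = current_index + context_window + 1
    return content_list[start:current_index] + content_list[current_index + 1:end]


content_list = [{"type": "text", "text": f"para {i}"} for i in range(7)]
content_list[3] = {"type": "table", "table_caption": ["Table 1: Results"]}
# The table is item 3; a window of 2 takes items 1-2 before it and 4-5 after it
nearby = chunk_context_items(content_list, current_index=3, context_window=2)
print([item["text"] for item in nearby])
# → ['para 1', 'para 2', 'para 4', 'para 5']
```

Near the start or end of the list the window simply shrinks on that side rather than borrowing extra items from the other side.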