Unified LLM Provider Infrastructure

This document describes the unified LLM provider infrastructure in LucidRAG, which provides a consistent interface for working with multiple LLM backends (OpenAI, Anthropic, Ollama, LMStudio) through YAML-based configuration.

Overview

The LucidRAG.LLM project provides:

  • Named Providers: Semantic aliases like fast-local, general, smart, vision
  • YAML Configuration: Define backends, models, and providers in llm-providers.yaml
  • Named Prompt Library: Reusable prompts with variable substitution in prompts.yaml
  • Polly Resilience: Automatic retry with exponential backoff and circuit breaker
  • OpenTelemetry Observability: Distributed tracing and metrics for monitoring

Quick Start

Basic Usage

// Inject the factory
public class MyService
{
    private readonly ILlmProviderFactory _llmFactory;

    public MyService(ILlmProviderFactory llmFactory)
    {
        _llmFactory = llmFactory;
    }

    public async Task<string> GenerateAsync(string prompt)
    {
        // Get provider by tier
        var provider = _llmFactory.GetProviderForTier("general");
        return await provider.GenerateAsync(prompt);
    }
}

Using Named Prompts

// Use a named prompt from prompts.yaml
var provider = _llmFactory.GetProviderForTier("general");
var result = await provider.GenerateWithPromptAsync(
    "rag_synthesis",
    new Dictionary<string, object>
    {
        ["segments"] = "Context from retrieved documents...",
        ["question"] = "What is the main topic?"
    });

Generating JSON

// Get structured JSON output
var provider = _llmFactory.GetProviderForTier("triage");
var result = await provider.GenerateJsonWithPromptAsync<QueryDecomposition>(
    "query_decomposition",
    new Dictionary<string, object>
    {
        ["query"] = "How does authentication work in the system?",
        ["context"] = "User is asking about security features"
    });

public class QueryDecomposition
{
    public List<SubQuery> SubQueries { get; set; } = new();
}

public class SubQuery
{
    public string Query { get; set; } = "";
    public string Purpose { get; set; } = "";
    public int Priority { get; set; }
}

Configuration

llm-providers.yaml

Located at src/LucidRAG/Config/llm-providers.yaml:

# Backend definitions - connection settings for each LLM service
backends:
  ollama-local:
    type: ollama
    base_url: http://localhost:11434
    timeout_seconds: 120
    enabled: true
    # Resilience settings
    max_retries: 3
    initial_retry_delay_ms: 500
    max_retry_delay_ms: 30000
    circuit_breaker_threshold: 5
    circuit_breaker_duration_seconds: 30

  anthropic:
    type: anthropic
    base_url: https://api.anthropic.com
    api_key: ${ANTHROPIC_API_KEY}  # Environment variable substitution
    api_version: "2023-06-01"
    timeout_seconds: 120
    enabled: true
    max_retries: 3
    initial_retry_delay_ms: 1000
    max_retry_delay_ms: 60000

  openai:
    type: openai
    base_url: https://api.openai.com/v1
    api_key: ${OPENAI_API_KEY}
    timeout_seconds: 120
    enabled: true

  lmstudio:
    type: openai  # LMStudio uses OpenAI-compatible API
    base_url: http://localhost:1234/v1
    api_key: ""
    timeout_seconds: 120
    enabled: false

# Model registry - define models once, reference by name
models:
  claude-opus:
    backend: anthropic
    model: claude-3-opus-latest
    context_window: 200000
    max_tokens: 8192
    temperature: 0.3

  claude-sonnet:
    backend: anthropic
    model: claude-3-5-sonnet-latest
    context_window: 200000
    max_tokens: 8192
    temperature: 0.3

  claude-haiku:
    backend: anthropic
    model: claude-3-5-haiku-latest
    context_window: 200000
    max_tokens: 4096
    temperature: 0.3

  gpt-4o:
    backend: openai
    model: gpt-4o
    context_window: 128000
    max_tokens: 4096
    temperature: 0.3
    capabilities:
      - vision

  gpt-4o-mini:
    backend: openai
    model: gpt-4o-mini
    context_window: 128000
    max_tokens: 4096
    temperature: 0.3

  qwen:
    backend: ollama-local
    model: qwen2.5:3b
    context_window: 8192
    max_tokens: 4096
    temperature: 0.3

  llama3:
    backend: ollama-local
    model: llama3.2:3b
    context_window: 8192
    max_tokens: 4096
    temperature: 0.3

  tinyllama:
    backend: ollama-local
    model: tinyllama
    context_window: 2048
    max_tokens: 1024
    temperature: 0.3

  minicpm-v:
    backend: ollama-local
    model: minicpm-v:8b
    context_window: 8192
    max_tokens: 4096
    capabilities:
      - vision

# Named providers - semantic aliases for specific use cases
providers:
  fast-local:
    model: tinyllama
    description: "Fast local model for triage and classification"

  local:
    model: qwen
    fallback: llama3
    description: "Local model for general tasks"

  general:
    model: claude-sonnet
    fallback: gpt-4o-mini
    description: "Balanced cost/quality for general tasks"

  smart:
    model: claude-opus
    fallback: gpt-4o
    description: "High quality for complex reasoning"

  vision:
    model: minicpm-v
    fallback: gpt-4o
    description: "Vision-capable model for image analysis"

  budget-cloud:
    model: claude-haiku
    fallback: gpt-4o-mini
    description: "Cost-effective cloud inference"

# Default provider assignments by task tier
defaults:
  triage: fast-local      # Quick classification, sentinel queries
  general: general        # Standard RAG queries, summarization
  synthesis: smart        # Complex synthesis, agentic tasks
  vision: vision          # Image analysis
  fallback: budget-cloud  # When primary fails

Environment Variables

API keys can be specified using ${ENV_VAR} syntax:

api_key: ${ANTHROPIC_API_KEY}

Set the environment variable before running:

# Windows
set ANTHROPIC_API_KEY=sk-ant-...

# Linux/macOS
export ANTHROPIC_API_KEY=sk-ant-...
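
Internally, values of the form ${VAR} are expanded against the process environment. Below is a minimal sketch of the kind of expansion logic involved; the ExpandEnvVars helper is hypothetical, for illustration only, and the library's actual implementation may differ:

using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replace ${VAR} placeholders with environment variable values.
static string ExpandEnvVars(string value) =>
    Regex.Replace(value, @"\$\{(\w+)\}", match =>
        Environment.GetEnvironmentVariable(match.Groups[1].Value)
        ?? throw new InvalidOperationException(
            $"Environment variable '{match.Groups[1].Value}' is not set"));

Console.WriteLine(ExpandEnvVars("api_key: ${ANTHROPIC_API_KEY}"));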

Named Prompt Library (prompts.yaml)

Located at src/LucidRAG/Config/prompts.yaml:

Available Prompts

| Prompt Name | Purpose | JSON Output | Provider |
| --- | --- | --- | --- |
| query_decomposition | Decompose complex queries into sub-queries | Yes | Any |
| rag_synthesis | Synthesize answers from retrieved segments | No | Any |
| entity_extraction | Extract entities and relationships for GraphRAG | Yes | Any |
| document_classification | Classify document type and characteristics | Yes | fast-local |
| image_caption | Generate WCAG-compliant alt-text captions | No | vision |
| query_clarification | Clarify ambiguous user queries | Yes | Any |
| summary | Generate document summaries | No | Any |

Prompt Structure

prompts:
  query_decomposition:
    name: query_decomposition
    description: "Decompose complex queries into sub-queries"
    version: 1
    json_output: true

    system: |
      You are a query analyst. Your task is to decompose complex queries into simpler,
      focused sub-queries that can be independently answered and then merged to form
      a comprehensive answer.

    template: |
      Query: {query}

      Additional context:
      {context}

      Decompose this query into 2-5 focused sub-queries.
      Return JSON in this format:
      {
        "sub_queries": [
          {"query": "...", "purpose": "...", "priority": 1}
        ]
      }

    overrides:
      anthropic:
        max_tokens: 2048
      ollama:
        temperature: 0.2

Template Variables

Use {variable_name} syntax in templates. Variables are substituted at runtime:

var result = await provider.GenerateWithPromptAsync(
    "rag_synthesis",
    new Dictionary<string, object>
    {
        ["segments"] = formattedSegments,
        ["question"] = userQuery
    });

For a complete reference of all available variables by task type, see PROMPT_TEMPLATE_VARIABLES.md.

Provider-Specific Overrides

Each prompt can override settings per backend:

overrides:
  anthropic:
    max_tokens: 4096
  openai:
    max_tokens: 4096
  ollama:
    temperature: 0.1
    max_tokens: 2048
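
The merged result can be resolved programmatically via IPromptService.GetOptionsForPrompt (defined under Interfaces below). A brief usage sketch; the MaxTokens property name on LlmOptions is an assumption:

// Resolve the effective options for a prompt on a given backend.
var options = _promptService.GetOptionsForPrompt("query_decomposition", LlmBackendType.Anthropic);

// With the override above, the Anthropic-specific max_tokens (4096) applies,
// e.g. via options.MaxTokens (property name assumed for illustration).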

Provider Tiers

The system supports tiered provider selection for different use cases:

| Tier | Use Case | Default Provider |
| --- | --- | --- |
| triage | Quick classification, sentinel queries | fast-local (tinyllama) |
| general | Standard RAG queries, summarization | general (claude-sonnet) |
| synthesis | Complex synthesis, agentic tasks | smart (claude-opus) |
| vision | Image analysis, OCR verification | vision (minicpm-v) |
| fallback | When primary provider fails | budget-cloud (claude-haiku) |

Using Tiers

// By tier enum
var provider = _llmFactory.GetProviderForTier(ProviderTier.General);

// By tier string
var provider = _llmFactory.GetProviderForTier("synthesis");

// Get specific named provider
var provider = _llmFactory.GetProvider("fast-local");

// Get default (general tier)
var provider = _llmFactory.GetDefault();

Resilience (Polly)

All providers include automatic resilience via Polly:

Retry Policy

  • Exponential backoff with jitter
  • Configurable per-backend in YAML:
    • max_retries: Maximum retry attempts (default: 3)
    • initial_retry_delay_ms: Initial delay before first retry (default: 500ms)
    • max_retry_delay_ms: Maximum delay between retries (default: 30000ms)

Circuit Breaker

Prevents overwhelming failing backends:

  • Opens after circuit_breaker_threshold failures (default: 5)
  • Stays open for circuit_breaker_duration_seconds (default: 30s)
  • Half-opens to probe the backend, then closes once it recovers

Handled Exceptions

  • HttpRequestException - Network failures
  • TaskCanceledException - Timeouts
  • TimeoutException - Request timeouts
  • InvalidOperationException - Service unavailable, rate limits

Example Configuration

backends:
  anthropic:
    # ... connection settings ...

    # Resilience for cloud APIs (longer delays for rate limits)
    max_retries: 3
    initial_retry_delay_ms: 1000
    max_retry_delay_ms: 60000
    circuit_breaker_threshold: 5
    circuit_breaker_duration_seconds: 30
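
For orientation, here is a minimal sketch of how these YAML settings could map onto a Polly v8 pipeline. This is illustrative, not the library's actual wiring; in particular, the consecutive-failure threshold is approximated with Polly's ratio-based circuit breaker, and the Ollama URL is just an example target:

using System;
using System.Net.Http;
using System.Threading;
using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;

var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,                         // max_retries
        Delay = TimeSpan.FromMilliseconds(1000),      // initial_retry_delay_ms
        MaxDelay = TimeSpan.FromMilliseconds(60000),  // max_retry_delay_ms
        BackoffType = DelayBackoffType.Exponential,
        UseJitter = true,
        ShouldHandle = new PredicateBuilder()
            .Handle<HttpRequestException>()
            .Handle<TimeoutException>()
    })
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions
    {
        FailureRatio = 1.0,                       // open only when recent calls all fail
        MinimumThroughput = 5,                    // ~circuit_breaker_threshold
        BreakDuration = TimeSpan.FromSeconds(30)  // circuit_breaker_duration_seconds
    })
    .Build();

// Execute a backend call through the retry + circuit breaker pipeline.
using var httpClient = new HttpClient();
var response = await pipeline.ExecuteAsync(
    async ct => await httpClient.GetAsync("http://localhost:11434/api/tags", ct),
    CancellationToken.None);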

Observability (OpenTelemetry)

The provider infrastructure emits OpenTelemetry signals for monitoring.

Activity Tracing

ActivitySource: LucidRAG.LLM

| Activity Name | Description | Tags |
| --- | --- | --- |
| LlmGenerate | Text generation | provider, model, backend, prompt_length, response_length |
| LlmGenerateJson | JSON generation | provider, model, return_type |
| LlmGenerateWithPrompt | Named prompt generation | provider, prompt_name |
| LlmGenerateJsonWithPrompt | Named prompt JSON generation | provider, prompt_name, return_type |
| LlmAvailabilityCheck | Provider health check | provider, available |
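
As a sketch, emitting one of these activities around a generation call might look like the following. The TracedGeneration wrapper is hypothetical, shown only to illustrate how the tag names above are attached:

using System.Diagnostics;
using System.Threading.Tasks;

public class TracedGeneration
{
    private static readonly ActivitySource Source = new("LucidRAG.LLM");

    public async Task<string> GenerateAsync(INamedLlmProvider provider, string prompt)
    {
        // Tag names mirror the LlmGenerate row in the table above.
        using var activity = Source.StartActivity("LlmGenerate");
        activity?.SetTag("provider", provider.Name);
        activity?.SetTag("model", provider.ModelId);
        activity?.SetTag("prompt_length", prompt.Length);

        var response = await provider.GenerateAsync(prompt);
        activity?.SetTag("response_length", response.Length);
        return response;
    }
}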

Metrics

Meter: LucidRAG.LLM

| Metric | Type | Description |
| --- | --- | --- |
| lucidrag.llm.requests | Counter | Total generation requests (by provider, status) |
| lucidrag.llm.json_requests | Counter | Total JSON generation requests |
| lucidrag.llm.named_prompts | Counter | Named prompt usage (by provider, prompt) |
| lucidrag.llm.errors | Counter | Errors by type (by provider, error_type) |
| lucidrag.llm.retries | Counter | Retry attempts (by provider, attempt) |
| lucidrag.llm.circuit_breaker | Counter | Circuit breaker state changes (by provider, state) |
| lucidrag.llm.duration | Histogram | Request duration in milliseconds |
| lucidrag.llm.prompt_tokens | Histogram | Estimated prompt tokens |
| lucidrag.llm.response_tokens | Histogram | Estimated response tokens |
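
These instruments hang off a System.Diagnostics.Metrics Meter named LucidRAG.LLM. A minimal emission sketch (illustrative only; the LlmMetrics class and tag values are assumptions):

using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Recording a successful request and its duration:
LlmMetrics.Requests.Add(1,
    new KeyValuePair<string, object?>("provider", "general"),
    new KeyValuePair<string, object?>("status", "success"));
LlmMetrics.Duration.Record(42.0 /* elapsed ms, e.g. from a Stopwatch */);

public static class LlmMetrics
{
    private static readonly Meter Meter = new("LucidRAG.LLM");

    public static readonly Counter<long> Requests =
        Meter.CreateCounter<long>("lucidrag.llm.requests");

    public static readonly Histogram<double> Duration =
        Meter.CreateHistogram<double>("lucidrag.llm.duration", unit: "ms");
}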

Integration with OpenTelemetry

// In Program.cs
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("LucidRAG.LLM")
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddMeter("LucidRAG.LLM")
        .AddOtlpExporter());

Service Registration

DI Registration

In Program.cs:

using LucidRAG.LLM.Extensions;

// Register with YAML file paths
builder.Services.AddLucidRagLlm(
    Path.Combine(AppContext.BaseDirectory, "Config", "llm-providers.yaml"),
    Path.Combine(AppContext.BaseDirectory, "Config", "prompts.yaml"));

// Or use default Config directory
builder.Services.AddLucidRagLlm("Config");

// Or bind from IConfiguration (appsettings.json)
builder.Services.AddLucidRagLlm(builder.Configuration);

Registered Services

| Interface | Implementation | Lifetime |
| --- | --- | --- |
| ILlmProviderFactory | LlmProviderFactory | Singleton |
| IPromptService | PromptService | Singleton |
| ILlmService | Default provider | Singleton |

Interfaces

INamedLlmProvider

Extends ILlmService with named provider features:

public interface INamedLlmProvider : ILlmService
{
    string Name { get; }
    LlmBackendType BackendType { get; }
    string ModelId { get; }
    bool SupportsVision { get; }

    Task<string> GenerateWithPromptAsync(
        string promptName,
        Dictionary<string, object> variables,
        LlmOptions? options = null,
        CancellationToken ct = default);

    Task<T?> GenerateJsonWithPromptAsync<T>(
        string promptName,
        Dictionary<string, object> variables,
        LlmOptions? options = null,
        CancellationToken ct = default) where T : class;
}

ILlmProviderFactory

Factory for resolving named providers:

public interface ILlmProviderFactory
{
    INamedLlmProvider GetProvider(string name);
    bool TryGetProvider(string name, out INamedLlmProvider? provider);
    INamedLlmProvider GetProviderForTier(ProviderTier tier);
    INamedLlmProvider GetProviderForTier(string tierName);
    IReadOnlyList<string> GetProviderNames();
    bool HasProvider(string name);
    INamedLlmProvider GetDefault();
}

IPromptService

Prompt resolution and rendering:

public interface IPromptService
{
    PromptDefinition? GetPrompt(string name);
    (string? systemPrompt, string template) RenderPrompt(
        string promptName,
        Dictionary<string, object> variables,
        LlmBackendType backend);
    LlmOptions GetOptionsForPrompt(string promptName, LlmBackendType backend);
    IReadOnlyList<string> GetPromptNames();
}

Adding Custom Prompts

Add new prompts to prompts.yaml:

prompts:
  my_custom_prompt:
    name: my_custom_prompt
    description: "My custom prompt for specific task"
    version: 1
    json_output: false

    system: |
      You are a helpful assistant specialized in...

    template: |
      Input: {input}

      Please analyze the above and provide...

    overrides:
      anthropic:
        max_tokens: 2048
        temperature: 0.5

Then use it in code:

var result = await provider.GenerateWithPromptAsync(
    "my_custom_prompt",
    new Dictionary<string, object>
    {
        ["input"] = myInputData
    });

Adding Custom Backends

Add new backends to llm-providers.yaml:

backends:
  my-custom-ollama:
    type: ollama
    base_url: http://my-gpu-server:11434
    timeout_seconds: 300
    enabled: true
    max_retries: 5

models:
  my-custom-model:
    backend: my-custom-ollama
    model: mixtral:8x7b
    context_window: 32000
    max_tokens: 4096
    temperature: 0.3

providers:
  my-custom-provider:
    model: my-custom-model
    description: "My custom Mixtral provider"

Backward Compatibility

The infrastructure maintains backward compatibility with existing ILlmService usage:

// This still works - returns the default provider
public class LegacyService
{
    private readonly ILlmService _llmService;

    public LegacyService(ILlmService llmService)
    {
        _llmService = llmService;
    }

    public async Task<string> Generate(string prompt)
    {
        return await _llmService.GenerateAsync(prompt);
    }
}

Related Documentation

  • PROMPT_TEMPLATE_VARIABLES.md - Complete reference of available template variables by task type

Troubleshooting

Provider Not Found

KeyNotFoundException: LLM provider 'xyz' not found. Available: fast-local, general, smart, vision

Solution: Check that the provider is defined in llm-providers.yaml and its backend is enabled.
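
At call sites where the provider name is dynamic, TryGetProvider (from ILlmProviderFactory above) avoids the exception and allows a graceful fallback:

if (!_llmFactory.TryGetProvider(providerName, out var provider) || provider is null)
{
    // Fall back to the default (general-tier) provider.
    provider = _llmFactory.GetDefault();
}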

Circuit Breaker Open

InvalidOperationException: Provider 'general' is temporarily unavailable (circuit breaker open)

Solution: The backend has failed consistently. Wait for the circuit breaker duration to expire, or check the backend's health.

Missing API Key

Warning: Anthropic service not registered in DI, skipping

Solution: Ensure the environment variable (e.g., ANTHROPIC_API_KEY) is set.

YAML Parse Errors

Failed to load YAML config from Config/llm-providers.yaml

Solution: Validate the YAML syntax (see the sketch after this list). Common issues:

  • Incorrect indentation (use spaces, not tabs)
  • Missing colons after keys
  • Unquoted special characters
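
A quick way to surface the exact line and column of a syntax error is to load the file through a YAML parser. For example, with YamlDotNet (an assumption about available tooling, not a statement about the project's dependencies):

using System;
using System.IO;
using YamlDotNet.RepresentationModel;

using var reader = new StreamReader("Config/llm-providers.yaml");
var stream = new YamlStream();
stream.Load(reader);  // throws YamlDotNet.Core.YamlException with line/column on syntax errors
Console.WriteLine("YAML parsed successfully.");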