Provider plugin for LangCore — access 100+ language models through a single, unified interface via LiteLLM.
langcore-litellm is a provider plugin for LangCore that adds support for 100+ language models through LiteLLM's unified API. Install it, prefix your model ID with litellm/, and every LangCore extraction call routes through LiteLLM transparently — auto-discovered via Python entry points.
- 100+ model support — OpenAI, Anthropic, Google, Azure, Mistral, Groq, Cohere, HuggingFace, Ollama, vLLM, and more through a single provider
- Native async — uses
litellm.acompletion()withasyncio.Semaphorefor true non-blocking concurrent I/O (no thread pool overhead) - Multi-pass cache bypass — automatic per-pass cache control ensures fresh LLM responses on repeat extraction passes while keeping the first pass cacheable
- Token usage tracking — captures prompt, completion, and total token counts (
UsageStats) from every inference call - Concurrency control — configurable
max_workerssemaphore limits parallel async requests to prevent rate-limit errors - Zero-config plugin — auto-registered via Python entry points; no manual wiring required
- Full parameter passthrough — forward any LiteLLM-supported parameter (temperature, top_p, timeout, etc.) through
provider_kwargs
pip install langcore-litellmOr install from source:
git clone https://github.com/JustStas/langcore-litellm
cd langcore-litellm
pip install -e .langcore-litellm integrates with LangCore through the provider plugin system. Create a model configuration, and LangCore handles the rest:
import langcore as lx
# Configure the LiteLLM provider
config = lx.factory.ModelConfig(
model_id="litellm/gpt-4o",
provider="LiteLLMLanguageModel",
)
model = lx.factory.create_model(config)
# Use with any LangCore extraction
result = lx.extract(
text_or_documents="Acme Corp agrees to pay $50,000 to Beta LLC by March 2025.",
model=model,
prompt_description="Extract parties, monetary amounts, and dates.",
examples=[
lx.data.ExampleData(
text="Alpha Inc will pay $10,000 to Omega Ltd by January 2024.",
extractions=[
lx.data.Extraction("party", "Alpha Inc", attributes={"role": "payer"}),
lx.data.Extraction("party", "Omega Ltd", attributes={"role": "payee"}),
lx.data.Extraction("monetary_amount", "$10,000"),
lx.data.Extraction("date", "January 2024", attributes={"type": "deadline"}),
],
)
],
)
print(result)Model IDs must be prefixed with litellm/ (or litellm-) to route through this provider:
| Provider | Example Model IDs |
|---|---|
| OpenAI | litellm/gpt-4o, litellm/gpt-4o-mini, litellm/gpt-4-turbo |
| Anthropic | litellm/claude-3-opus, litellm/claude-3.5-sonnet, litellm/claude-3-haiku |
litellm/gemini-2.5-pro, litellm/gemini-2.0-flash |
|
| Azure OpenAI | litellm/azure/your-deployment-name |
| Mistral | litellm/mistral-large-latest |
| Groq | litellm/groq/llama-3.1-70b |
| Ollama | litellm/ollama/llama3.1 |
| And 100+ more | See LiteLLM providers |
Set the appropriate API key for your provider:
# OpenAI
export OPENAI_API_KEY="sk-..."
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
# Google (Gemini)
export GEMINI_API_KEY="..."
# Azure OpenAI
export AZURE_API_KEY="..."
export AZURE_API_BASE="https://your-resource.openai.azure.com/"
export AZURE_API_VERSION="2024-02-01"
# Ollama (local)
export OLLAMA_API_BASE="http://localhost:11434"See the LiteLLM documentation for the full list of provider-specific variables.
The provider uses litellm.acompletion() for native async, avoiding thread overhead:
import asyncio
import langcore as lx
config = lx.factory.ModelConfig(
model_id="litellm/gpt-4o",
provider="LiteLLMLanguageModel",
)
model = lx.factory.create_model(config)
async def main():
result = await lx.async_extract(
text_or_documents="Agreement between Acme Corp and Beta LLC...",
model=model,
prompt_description="Extract parties and obligations.",
examples=[...],
)
print(result)
asyncio.run(main())When running multiple extraction passes (extraction_passes > 1), the provider automatically manages cache behaviour:
| Pass | Cache Behaviour |
|---|---|
| 1st pass | Normal — may be served from cache |
| 2nd pass | Bypass — forces a fresh LLM response |
| 3rd+ passes | Bypass — forces a fresh LLM response |
This is fully automatic. LangCore threads a pass_num argument to the provider, which injects cache={"no-cache": True} for passes ≥ 1:
result = lx.extract(
text_or_documents="Contract text...",
model=model,
prompt_description="Extract all entities.",
examples=[...],
extraction_passes=3, # 3 passes, only the first is cacheable
)Forward any LiteLLM parameter through provider_kwargs:
config = lx.factory.ModelConfig(
model_id="litellm/gpt-4o",
provider="LiteLLMLanguageModel",
provider_kwargs={
"temperature": 0.2,
"max_tokens": 2000,
"top_p": 0.9,
"timeout": 60,
"api_key": "sk-...", # override env var
},
)
model = lx.factory.create_model(config)These parameters are consumed internally and are not forwarded to LiteLLM:
| Parameter | Type | Default | Description |
|---|---|---|---|
max_workers |
int |
10 |
Maximum concurrent async requests (semaphore size) |
pass_num |
int |
0 |
Current extraction pass index — set automatically by LangCore during multi-pass extraction |
langcore-litellm serves as the base provider that other LangCore plugins wrap. Stack decorators to add audit logging, output validation, or hybrid rule-based extraction:
import langcore as lx
from langcore_audit import AuditLanguageModel, LoggingSink
from langcore_guardrails import GuardrailLanguageModel, SchemaValidator, OnFailAction
# Base LLM provider
config = lx.factory.ModelConfig(
model_id="litellm/gpt-4o",
provider="LiteLLMLanguageModel",
)
llm = lx.factory.create_model(config)
# Add output validation
guarded = GuardrailLanguageModel(
model_id="guardrails/gpt-4o",
inner=llm,
validators=[SchemaValidator(MySchema, on_fail=OnFailAction.REASK)],
max_retries=3,
)
# Add audit logging
audited = AuditLanguageModel(
model_id="audit/gpt-4o",
inner=guarded,
sinks=[LoggingSink()],
)
# Use the full stack with LangCore
result = lx.extract(
text_or_documents="Contract text...",
model=audited,
prompt_description="Extract entities.",
examples=[...],
)Extractions return LangCore's standard AnnotatedDocument with precise character intervals:
AnnotatedDocument(
extractions=[
Extraction(
extraction_class='party',
extraction_text='Acme Corp',
char_interval=CharInterval(start_pos=0, end_pos=9),
alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>,
attributes={'role': 'payer'}
),
Extraction(
extraction_class='monetary_amount',
extraction_text='$50,000',
char_interval=CharInterval(start_pos=24, end_pos=31),
alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>,
attributes={}
),
],
text='Acme Corp agrees to pay $50,000 to Beta LLC by March 2025.'
)API failures are captured gracefully — no unhandled exceptions:
ScoredOutput(score=0.0, output="LiteLLM API error: [error details]")pip install -e . # Install in development mode
python test_plugin.py # Run tests
python -m build # Build package
twine upload dist/* # Publish to PyPI- Python ≥ 3.12
langcorelitellm≥ 1.81.13
Apache License 2.0 — see LICENSE for details.