Contributors
This repository is developed in collaboration with the following early users, contributors, and reviewers:
Jared Quincy Davis (F, S), Marquita Ellis (I), Diana Arroyo (I), Pravein Govindan Kannan (I), Paul Castro (I), Siddharth Sharma (F, S), Lingjiao Chen (MS), Omar Khattab (D, MT), Alan Zhu (B), Parth Asawa (B), Connor Chow (B), Jason Lee (B), Jay Adityanag Tipirneni (B), Chad Ferguson (B), Kathleen Ge (B), Kunal Agrawal (B), Rishab Bhatia (B), Rohan Penmatcha (B), Sai Kolasani (B), Théo Jaffrelot Inizan (B), Deepak Narayanan (N), Long Fei (F), Aparajit Raghavan (F), Eyal Cidon (F), Jacob Schein (F), Prasanth Somasundar (F), Boris Hanin (F, P), James Zou (S), Alex Dimakis (B), Joey Gonzalez (B), Peter Bailis (G, S), Ion Stoica (A, B, D), Matei Zaharia (D, B)
Affiliations: F = Foundry (MLFoundry), D = Databricks, I = IBM Research, S = Stanford University, B = UC Berkeley, MT = MIT, N = NVIDIA, MS = Microsoft, A = Anyscale, G = Google, P = Princeton
Aspirationally, Ember is to Networks of Networks (NON) compound AI system development what PyTorch and XLA are to neural network (NN) development. It is a compositional framework with both eager-execution affordances and graph-execution optimization capabilities: users compose complex NONs, and Ember automatically parallelizes and optimizes their execution.
Ember's vision is to enable development of compound AI systems that will one day comprise millions or billions of inference calls, and beyond. Simple constructs such as best-of-N graphs, verifier-prover structures, and ensembles with voting-based aggregation work surprisingly well in many regimes.
from ember.api import non

# With Ember's compact notation, one line builds a parallel system of 101
# GPT-4o instances whose outputs are synthesized by a Claude judge
system = non.build_graph(["101:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"])  # Automatically parallelized
result = system(query="What's the most effective climate change solution?")
This led us to believe that there is a rich architecture space for constructing and optimizing what we call "networks of networks" graphs, or NONs. This is analogous to how neural network architecture research uncovered many emergent properties of systems composed of simple artificial neurons. Conducting NN research would be full of friction if every architecture had to be implemented from scratch via for-loops, or with bespoke libraries for vectorization and efficient execution. Similarly, it is challenging at present to compose NON architectures of many calls, despite the rapidly falling cost per token of intelligence.
Ember's goal is to help unlock research and practice along this new frontier.
- Architecture Overview
- Quick Start Guide
- LLM Specifications
- Model Registry Guide
- Operators Guide
- NON Patterns
- Data Processing
- Configuration
- Examples Directory
from ember.api import non
from ember.api.operators import EmberModel, Operator, Specification
from ember.xcs import jit

class QueryInput(EmberModel):
    query: str

class ConfidenceOutput(EmberModel):
    answer: str
    confidence: float

class ReasonerSpec(Specification):
    input_model = QueryInput
    structured_output = ConfidenceOutput

@jit  # Automatically optimize execution with JIT compilation (e.g., topological sort with parallel dispatch)
class EnsembleReasoner(Operator[QueryInput, ConfidenceOutput]):
    specification = ReasonerSpec()

    def __init__(self, width: int = 3):
        self.ensemble = non.UniformEnsemble(
            num_units=width,
            model_name="openai:gpt-4o",
            temperature=0.7
        )
        self.judge = non.JudgeSynthesis(
            model_name="anthropic:claude-3-5-sonnet",
        )

    def forward(self, *, inputs: QueryInput) -> ConfidenceOutput:
        # These operations are automatically parallelized by Ember's XCS system
        ensemble_result = self.ensemble(query=inputs.query)
        synthesis = self.judge(
            query=inputs.query,
            responses=ensemble_result["responses"]
        )
        return ConfidenceOutput(
            answer=synthesis["final_answer"],
            confidence=float(synthesis.get("confidence", 0.0))
        )
# Use it like any Python function
compound_system = EnsembleReasoner()
result = compound_system(query="What causes the northern lights?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2f}")
# Alternatively, build the same pipeline with compact notation
pipeline = non.build_graph(["3:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.2"])
result = pipeline(query="What causes the northern lights?")
Ember's compact notation lets you express complex AI architectures in minimal code; the anatomy of the spec string itself is unpacked after these examples.
# Compact notation: "count:type:model:temperature", where type is E (Ensemble), J (Judge), M (MostCommon voting), or V (Verifier)
# BASIC: Single-line systems with automatic parallelization
basic = non.build_graph(["7:E:gpt-4o:0.7"]) # 7-model ensemble
voting = non.build_graph(["7:E:gpt-4o:0.7", "1:M"]) # With majority voting
judged = non.build_graph(["7:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"]) # With judge synthesis
# STANDARD API: Equivalent to compact notation but with explicit objects
standard_system = non.Sequential(operators=[
non.UniformEnsemble(num_units=7, model_name="gpt-4o", temperature=0.7),
non.JudgeSynthesis(model_name="claude-3-5-sonnet", temperature=0.0)
])
# ADVANCED: Reusable components for complex architectures
components = {
# Define building blocks once, reuse everywhere
"reasoning": ["3:E:gpt-4o:0.7", "1:V:gpt-4o:0.0"], # Verification pipeline
"research": ["3:E:claude-3-5-sonnet:0.5", "1:V:claude-3-5-sonnet:0.0"] # Different models
}
# Build sophisticated multi-branch architecture in just 4 lines
advanced = non.build_graph([
"$reasoning", # First branch: reasoning with verification
"$research", # Second branch: research with verification
"1:J:claude-3-5-opus:0.0" # Final synthesis of both branches
], components=components) # Automatically optimized for parallel execution
# HORIZONTAL SCALING: Systematically explore scaling behavior
systems = {
# Scaling with MostCommon aggregation
"width_3_voting": non.build_graph(["3:E:gpt-4o:0.7", "1:M"]),
"width_7_voting": non.build_graph(["7:E:gpt-4o:0.7", "1:M"]),
"width_11_voting": non.build_graph(["11:E:gpt-4o:0.7", "1:M"]),
# Scaling with judge synthesis
"width_3_judge": non.build_graph(["3:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"]),
"width_7_judge": non.build_graph(["7:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"]),
"width_11_judge": non.build_graph(["11:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"]),
}
# Execute with full parallelism (XCS optimizes the execution graph automatically)
query = "What's the most effective climate change solution?"
results = {name: system(query=query) for name, system in systems.items()}
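For reference, each spec string follows the "count:type:model:temperature" format noted above, and aggregation-only stages such as "1:M" simply omit the model and temperature fields. The snippet below is purely illustrative of how such a string decomposes; non.build_graph performs the real parsing internally, and describe_spec is a hypothetical helper, not part of the API.
# Illustrative only: decomposing a compact-notation spec string
# (non.build_graph does the real parsing; describe_spec is hypothetical)
def describe_spec(spec: str) -> dict:
    parts = spec.split(":")
    info = {"count": int(parts[0]), "type": parts[1]}
    if len(parts) > 2:
        info["model"] = parts[2]
    if len(parts) > 3:
        info["temperature"] = float(parts[3])
    return info

print(describe_spec("7:E:gpt-4o:0.7"))  # {'count': 7, 'type': 'E', 'model': 'gpt-4o', 'temperature': 0.7}
print(describe_spec("1:M"))             # {'count': 1, 'type': 'M'}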
- Composable Operators with Rigorous Specification: Build reliable compound AI systems from type-safe, reusable components with validated inputs and outputs
- Automatic Parallelization: Independent operations are automatically executed concurrently across a full computational graph
- XCS Optimization Framework: "Accelerated Compound Systems" just-in-time tracing and execution optimization with multiple strategies (trace, structural, enhanced). XCS is inspired by XLA, but it targets the acceleration of compound systems rather than linear algebra: it is tuned for model calls and dicts rather than vectors and numerical computation
- Multi-Provider Support: Unified API across OpenAI, Anthropic, Google Gemini, and more, with standardized usage tracking
- Transformation System: Function transformations for vectorization (vmap), parallelization (pmap), and device sharding (mesh), with a composable interface for building complex transformations
The Accelerated Compound Systems (XCS) module provides a computational graph-based system for building, optimizing, and executing complex operator pipelines:
- Unified JIT System: Multiple compilation strategies under a consistent interface:
  - trace: Traditional execution tracing
  - structural: Structure-based analysis
  - enhanced: Improved parallelism detection and code analysis
- Scheduler Framework: Pluggable scheduler implementations for different execution patterns:
  - sequential: Serial execution for debugging and determinism
  - parallel: Thread-based parallel execution
  - wave: Execution wave scheduling for optimal parallelism
  - topological: Dependency-based execution ordering
- Transform System: High-level operations for data and computation transformations:
  - vmap: Vectorized mapping for batch processing
  - pmap: Parallel mapping across multiple workers
  - mesh: Device mesh-based sharding for multi-device execution
- Dependency Analysis: Automatic extraction of dependencies between operations (a minimal sketch of wave computation follows this list):
  - Transitive closure calculation for complete dependency mapping
  - Topological sorting with cycle detection
  - Execution wave computation for parallel scheduling
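To make the last bullet concrete, here is a minimal, self-contained sketch of execution-wave computation over a dependency graph. It is for intuition only and is not XCS's internal implementation: nodes with no unmet dependencies form the first wave, each subsequent wave contains the nodes unblocked by the previous one, and everything within a wave can run in parallel.
# Illustrative wave computation (Kahn-style topological leveling).
# Not Ember's internal implementation; for intuition only.
from collections import defaultdict

def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves; all nodes within a wave can run in parallel."""
    indegree = {node: len(d) for node, d in deps.items()}
    dependents = defaultdict(list)
    for node, d in deps.items():
        for dep in d:
            dependents[dep].append(node)
    wave = [n for n, deg in indegree.items() if deg == 0]
    waves = []
    while wave:
        waves.append(wave)
        next_wave = []
        for n in wave:
            for child in dependents[n]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_wave.append(child)
        wave = next_wave
    if sum(len(w) for w in waves) != len(deps):
        raise ValueError("Cycle detected in dependency graph")
    return waves

# Two ensemble branches feed a judge: the branches form one parallel wave.
print(execution_waves({"e1": set(), "e2": set(), "judge": {"e1", "e2"}}))
# -> [['e1', 'e2'], ['judge']]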
Ember uses uv as its recommended package manager for significantly faster installations and dependency resolution.
# First, install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh # macOS/Linux
# or
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" # Windows
# or
pip install uv # Any platform
# Quick install using uv (recommended)
uv pip install ember-ai
# Run examples directly with uv (no activation needed)
uv run python -c "import ember; print(ember.__version__)"
# Install from source for development
git clone https://github.com/pyember/ember.git
cd ember
uv pip install -e ".[dev]"
# Traditional pip installation (alternative, slower)
pip install ember-ai
For detailed installation instructions, troubleshooting, and environment management, see our Installation Guide.
Access models from any provider through a unified interface:
from ember import initialize_ember
from ember.api.models import ModelEnum
# Initialize the model service with usage tracking enabled
service = initialize_ember(usage_tracking=True)
# Access models from different providers with the same API
response = service(ModelEnum.gpt_4o, "What is quantum computing?")
print(response.data)
# Track usage across providers
usage = service.usage_service.get_total_usage()
print(f"Total cost: ${usage.cost:.4f}")
Build compound AI system architectures using the Network of Networks (NON) pattern with pre-built components:
from ember.api import non
# Standard API: Create a verification pipeline of ensemble→judge→verifier
pipeline = non.Sequential(operators=[
# 1. Ensemble of 5 model instances running in parallel
non.UniformEnsemble(
num_units=5,
model_name="openai:gpt-4o-mini",
temperature=0.7
),
# 2. Judge to synthesize the ensemble responses
non.JudgeSynthesis(
model_name="anthropic:claude-3-5-sonnet",
temperature=0.2
),
# 3. Verifier for quality control and fact-checking
non.Verifier(
model_name="anthropic:claude-3-5-haiku",
temperature=0.0
)
])
# Alternatively, create the same pipeline with compact notation
pipeline = non.build_graph([
"5:E:gpt-4o-mini:0.7", # Ensemble with 5 instances
"1:J:claude-3-5-sonnet:0.2", # Judge synthesis
"1:V:claude-3-5-haiku:0.0" # Verification
])
# Build advanced architectures like NestedNetwork from example_architectures.py
# Define reusable SubNetwork component
components = {
"sub": ["2:E:gpt-4o:0.0", "1:V:gpt-4o:0.0"] # Ensemble → Verifier
}
# Create a NestedNetwork with identical structure to the OOP implementation
nested = non.build_graph([
"$sub", # First SubNetwork branch
"$sub", # Second SubNetwork branch
"1:J:gpt-4o:0.0" # Judge to synthesize results
], components=components)
# Extend with custom operator types
custom_registry = non.OpRegistry.create_standard_registry()
custom_registry.register(
"CE", # Custom ensemble type
lambda count, model, temp: non.Sequential(operators=[
non.UniformEnsemble(num_units=count, model_name=model, temperature=temp),
non.MostCommon() # Auto-aggregation
])
)
# Use custom operators
advanced = non.build_graph(["3:CE:gpt-4o:0.7"], type_registry=custom_registry)
# Execute with a single call
result = pipeline(query="What causes tsunamis?")
Ember's XCS system provides JAX/XLA-inspired tracing, transformation, and automatic parallelization:
from ember.xcs import jit, execution_options, vmap, pmap, compose, explain_jit_selection
from ember.api.operators import Operator
# Basic JIT compilation with automatic strategy selection
@jit
class SimplePipeline(Operator):
    ...  # operator implementation elided
# JIT with explicit mode selection
@jit(mode="enhanced")
class ComplexPipeline(Operator):
def __init__(self):
self.op1 = SubOperator1()
self.op2 = SubOperator2()
self.op3 = SubOperator3()
def forward(self, *, inputs):
# These operations will be automatically parallelized
result1 = self.op1(inputs=inputs)
result2 = self.op2(inputs=inputs)
# Combine the parallel results
combined = self.op3(inputs={"r1": result1, "r2": result2})
return combined
# Configure execution parameters
with execution_options(scheduler="wave", max_workers=4):
result = pipeline(query="Complex question...")
# Get explanation for JIT strategy selection
explanation = explain_jit_selection(pipeline)
print(f"JIT strategy: {explanation['strategy']}")
print(f"Rationale: {explanation['rationale']}")
# Vectorized mapping for batch processing
batch_processor = vmap(my_operator)
batch_results = batch_processor(inputs={"data": [item1, item2, item3]})
# Parallel execution across multiple workers
parallel_processor = pmap(my_operator, num_workers=4)
parallel_results = parallel_processor(inputs=complex_data)
# Compose transformations (vectorization + parallelism)
pipeline = compose(vmap(batch_size=32), pmap(num_workers=4))(my_operator)
Ember provides a comprehensive data processing and evaluation framework with pre-built datasets and metrics:
from ember.api.data import DatasetBuilder
from ember.api.eval import EvaluationPipeline, Evaluator
# Load a dataset with the builder pattern
dataset = (DatasetBuilder()
.from_registry("mmlu") # Use a registered dataset
.subset("physics") # Select a specific subset
.split("test") # Choose the test split
.sample(100) # Random sample of 100 items
.transform( # Apply transformations
lambda x: {"query": f"Question: {x['question']}"}
)
.build())
# Create a comprehensive evaluation pipeline
eval_pipeline = EvaluationPipeline([
# Standard metrics
Evaluator.from_registry("accuracy"),
Evaluator.from_registry("response_quality"),
# Custom evaluation metrics
    Evaluator.from_function(
        # score_factual_content stands in for a user-supplied scoring function
        lambda prediction, reference: {
            "factual_accuracy": score_factual_content(prediction, reference)
        }
    )
])
# Evaluate a model or operator
results = eval_pipeline.evaluate(my_model, dataset)
print(f"Accuracy: {results['accuracy']:.2f}")
print(f"Response Quality: {results['response_quality']:.2f}")
print(f"Factual Accuracy: {results['factual_accuracy']:.2f}")
Ember is released under the MIT License.