Generate per-session LoRA adapters for inference tasks using hypernetwork synthesis.
Version: 1.3.9
- Metadata-to-LoRA: Generate adapters from structured user metadata (JSON)
- Text-to-LoRA: Generate adapters from natural language descriptions
- Doc-to-LoRA with SHINE: Generate adapters from document content using SHINE (ICML 2026) for long-context internalization
- Base Model Management: Download, cache, and serve base models with vLLM integration
- FastAPI: Modern async Python web framework
- OpenAI-compatible API: Easy integration with existing tooling
pip install tessera-hypernetworkFrom metadata (JSON string or file):
tessera generate \
--from-metadata '{"task": "classification", "domain": "medical"}' \
--base-model mistralai/Mistral-7B-Instruct-v0.2 \
--rank 16 \
--save ./adapter.safetensorsFrom text description:
tessera generate \
--from-text "Medical diagnosis assistant" \
--base-model mistralai/Mistral-7B-Instruct-v0.2 \
--rank 16 \
--save ./adapter.safetensorsFrom document:
tessera generate \
--from-doc ./document.txt \
--base-model mistralai/Mistral-7B-Instruct-v0.2 \
--rank 16 \
--save ./adapter.safetensorsDownload a base model from HuggingFace Hub:
tessera model pull mistralai/Mistral-7B-Instruct-v0.2
tessera model pull meta-llama/Llama-3.1-8B-Instruct
tessera model pull deepseek-ai/DeepSeek-R1-Distill-Qwen-7BStart vLLM with a base model:
tessera model serve-model mistralai/Mistral-7B-Instruct-v0.2 --port 8000
tessera model serve-model mistralai/Mistral-7B-Instruct-v0.2 --gpu-memory-utilization 0.9
tessera model serve-model mistralai/Mistral-7B-Instruct-v0.2 --quantization awqList cached base models:
tessera model list-modelsRemove a cached model:
tessera model remove mistralai/Mistral-7B-Instruct-v0.2Start the hypernetwork server (with auto vLLM):
tessera serve --port 8080 --base-model mistralai/Mistral-7B-Instruct-v0.2Start the hypernetwork server (standalone):
tessera serve --port 8080 --host 0.0.0.0tessera health --url http://localhost:8080tessera listGenerate LoRA adapters from metadata, text, or documents.
Options:
--from-metadata: JSON metadata string or file path--from-text: Natural language description--from-doc: Document content or file path--base-model: Base model identifier (default: mistralai/Mistral-7B-Instruct-v0.2)--rank: LoRA rank (default: 16)--save: Output path for safetensors file (required)--mode: Generation mode: doc, metadata, or text (auto-inferred if not specified)
Manage base models for vLLM serving.
tessera model pull <model_id>
Download a base model from HuggingFace Hub and cache locally.
tessera model serve-model <model_id>
Start vLLM with a specified base model.
Options:
--port: Port to serve on (default: 8000)--gpu-memory-utilization: GPU memory utilization fraction (e.g., 0.9)--tensor-parallel-size: Tensor parallel size (default: 1)--quantization: Quantization method (e.g., awq, gptq, bitsandbytes)--max-model-len: Maximum model length (default: 8192)
tessera model list-models List all locally cached base models.
tessera model remove <model_id>
Remove a cached base model to free disk space.
Start the Tessera hypernetwork server.
Options:
--port: Port to serve on (default: 8080)--host: Host to bind to (default: 0.0.0.0)--qdrant-url: Qdrant vector database URL (optional)--workers: Number of worker processes (default: 1)--base-model: Base model to auto-start vLLM with (e.g., mistralai/Mistral-7B-Instruct-v0.2)--vllm-port: Port for vLLM server (default: 8000)
Check server health status.
Options:
--url: Server URL (default: http://localhost:8080)
List available base models and their dimensions, plus cached models.
Import, list, and unload adapters:
Import an adapter:
tessera lorax import-adapter \
--path ./adapter.safetensors \
--name my-adapter \
--base-model mistralai/Mistral-7B-Instruct-v0.2 \
--server-url http://localhost:8080List loaded adapters:
tessera lorax list-adapters --server-url http://localhost:8080Unload an adapter:
tessera lorax unload --name my-adapter --server-url http://localhost:8080The hypernetwork service provides a FastAPI server with the following endpoints:
POST /v1/generate- Generate a LoRA adapter for a given promptGET /health- Health check endpointPOST /v1/adapters- Import adapter safetensorsGET /v1/adapters- List loaded adaptersDELETE /v1/adapters/{name}- Unload adapter
Apache-2.0