image-pool is a small FastAPI service for local image generation and image
editing backends. It exposes an OpenAI-style image API, keeps model lifecycle
state in one process, and lets a UI or another service load, unload, inspect,
and call local image models, as well as start local LoRA training runs.
The service is intentionally narrow: it owns local image model routing and runtime scheduling. It does not own UI workflows, persistent artifact storage, or long-term job persistence.
- What It Does
- Repository Role
- Related Repositories
- Code Map
- API Surface
- Runtime Model
- Configuration
- Model Directories
- Backends
- Development
- Tests
- Deployment Notes
- License
- Provides text-to-image requests through POST /v1/images/generations.
- Provides image-edit requests through POST /v1/images/edits.
- Reports currently loaded public models through GET /v1/models.
- Reports all configured models and runtime state through GET /v1/admin/models.
- Loads and unloads configured models with admin endpoints.
- Reports coarse GPU memory information from nvidia-smi.
- Runs one in-process scheduler per loaded model with a configurable target_inflight.
- Starts and monitors local LoRA training runs for FLUX.2-klein and Z-Image models.
This repo owns:
- The image-pool HTTP API.
- Model config loading and config/local.json overrides.
- In-process model lifecycle and scheduling.
- Local Diffusers-based image generation/editing runtimes.
- Local LoRA training workers for supported image backends.
- A stub runtime for API and lifecycle tests.
This repo deliberately does not own:
- Browser UI. The current UI lives in llm-workbench.
- Persistent image artifact storage.
- A queue that survives process restart.
- llm-workbench: browser UI and proxy endpoints for image-pool.
- llm-pool, tts-pool, asr-pool: sibling local pool services with similar lifecycle ideas, but different model domains.
app/main.py
FastAPI app, routes, lifespan, and error mapping.
app/config.py
Pydantic settings models and config/local.json merge logic.
app/schemas.py
Request and response schemas for image generation, image editing, and LoRA
training.
app/engine/router.py
Model registry, load/unload logic, public/admin payloads, GPU memory payloads.
app/engine/scheduler.py
Per-model in-process queue and target_inflight worker control.
app/engine/stub.py
Test backend that returns generated PNG payloads.
app/engine/diffusers_flux.py
FLUX.2-klein Diffusers runtime for text-to-image and image edit.
app/engine/flux_fp8.py
Helpers for loading FLUX.2-klein FP8 safetensor variants with a base pipeline.
app/engine/diffusers_sdxl.py
SDXL Diffusers runtime for text-to-image and img2img-style editing.
app/engine/diffusers_z_image.py
Z-Image Diffusers runtime for text-to-image, img2img, and LoRA adapter use.
app/engine/diffusers_firered_gguf.py
FireRed/Qwen image-edit runtime using a GGUF transformer and Diffusers.
app/training.py
In-process FLUX.2-klein and Z-Image LoRA training workers and status state.
config/settings.json
Base model and service configuration.
docs/runtime-admin-api.md
Detailed runtime admin API notes, including current payloads and proposed
parameter schema extensions.
tests/
Lightweight API and configuration tests.
For the full runtime/admin contract, see
docs/runtime-admin-api.md.
GET /healthz
Returns: {"status": "ok"}
GET /v1/models
Returns only loaded models. Each model includes its backend and capabilities.
GET /v1/admin/models
Returns all configured models, including unloaded models, scheduler state, configured model paths, capabilities, load errors, and VRAM estimates.
GET /v1/admin/gpu-memory
Returns GPU memory data from nvidia-smi plus configured model estimates.
POST /v1/admin/models/{model_name}/load
Loads a configured model into the process and registers its scheduler.
POST /v1/admin/models/{model_name}/unload
Unregisters the model scheduler and releases the runtime object. Depending on the backend, CUDA allocators may keep reserved memory until the process restarts.
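The GPU memory endpoint is described as reading from nvidia-smi. As a rough illustration of where such data can come from, the sketch below parses the CSV output of an nvidia-smi memory query. The function names and the returned keys are assumptions made for this example and are not the service's actual payload; some devices report `[N/A]` instead of numbers, which this sketch does not handle.

```python
import subprocess

# Query index, used memory and total memory as plain numbers in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[dict]:
    """Parse `index, used, total` CSV rows (MiB) into dictionaries."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(part.strip()) for part in line.split(","))
        gpus.append({
            "index": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,  # derived value, hypothetical key
        })
    return gpus


def read_gpu_memory() -> list[dict]:
    """Run nvidia-smi and parse its output; requires an NVIDIA driver."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Splitting parsing from the subprocess call keeps the parser usable on machines without a GPU.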
curl -s http://127.0.0.1:8013/v1/images/generations \
-H 'content-type: application/json' \
-d '{
"model": "flux2-klein-4b",
"prompt": "paint the Eiffel Tower by night",
"size": "512x512",
"n": 1,
"metadata": {
"steps": 4,
"guidance": 1.0
}
}'
The response contains base64 PNG data in data[].b64_json.
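A client usually wants the images as files rather than base64 strings. The following is a minimal sketch that decodes every data[].b64_json entry of an already parsed response body; the function name `save_images` and the output file naming are assumptions made for this example.

```python
import base64
from pathlib import Path


def save_images(response: dict, out_dir: str) -> list[Path]:
    """Decode every data[].b64_json entry and write it as a PNG file."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, item in enumerate(response.get("data", [])):
        # File naming is arbitrary here; the API does not prescribe one.
        path = target / f"image-{index}.png"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        written.append(path)
    return written
```

Pass it the JSON body returned by the curl call above, for example `save_images(json.loads(body), "out")`.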
curl -s http://127.0.0.1:8013/v1/images/edits \
-H 'content-type: application/json' \
-d '{
"model": "flux2-klein-4b",
"prompt": "remove all text from the package label",
"size": "512x512",
"n": 1,
"metadata": {
"steps": 4,
"guidance": 1.0
},
"images": [
{
"name": "input.png",
"data_url": "data:image/png;base64,..."
}
]
}'
images[].data_url must be an image data URL with a base64 payload.
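Building that data URL by hand is error-prone, so here is a small sketch that turns a local file into an images[] entry with the name and data_url fields shown in the request above. The helper names are assumptions for illustration, and falling back to image/png when the type cannot be guessed is a choice made for this example.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_image_entry(path: str) -> dict:
    """Build one images[] entry from a local image file."""
    file_path = Path(path)
    mime_type, _ = mimetypes.guess_type(file_path.name)
    return {
        "name": file_path.name,
        "data_url": to_data_url(file_path.read_bytes(), mime_type or "image/png"),
    }
```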
GET /v1/training/flux-lora
POST /v1/training/flux-lora
POST /v1/training/flux-lora/stop
GET /v1/training/z-image-lora
POST /v1/training/z-image-lora
POST /v1/training/z-image-lora/stop
Training requests point at an existing dataset directory and output directory.
The dataset directory must contain image files with matching .txt captions.
The service keeps one in-process training state, so only one training run is
active at a time.
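Before starting a run it can help to check the dataset directory locally. The sketch below lists images that have no caption file, assuming that "matching" means the same file stem with a .txt suffix and that the listed image suffixes are the relevant ones; both are assumptions, and the service's own validation may differ.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to what the training worker accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def find_missing_captions(dataset_dir: str) -> list[str]:
    """Return image file names that lack a same-stem .txt caption file."""
    missing = []
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if not path.with_suffix(".txt").is_file():
            missing.append(path.name)
    return missing
```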
image-pool is a single-process service. At startup it reads settings, creates
model state entries, and loads models where enabled is true.
Each loaded model gets a LoadedModelExecutor in the scheduler. Requests for a
model enter that model's queue and are processed up to target_inflight at a
time. Current real image backends are configured with target_inflight: 1,
which avoids concurrent GPU work inside one model runtime.
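The queueing behaviour can be pictured with a small asyncio sketch: one queue per model and target_inflight worker tasks draining it. This is an illustration only, not the LoadedModelExecutor from app/engine/scheduler.py; the class name and the `run_request` callable are hypothetical.

```python
import asyncio


class ModelExecutorSketch:
    """Illustrative per-model queue with a fixed number of workers."""

    def __init__(self, run_request, target_inflight: int = 1):
        # Must be constructed inside a running event loop, because
        # asyncio.create_task needs one.
        self._run_request = run_request
        self._queue: asyncio.Queue = asyncio.Queue()
        self._workers = [
            asyncio.create_task(self._worker()) for _ in range(target_inflight)
        ]

    async def submit(self, request):
        """Enqueue a request and wait for its result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((request, future))
        return await future

    async def _worker(self):
        while True:
            request, future = await self._queue.get()
            try:
                result = await self._run_request(request)
            except Exception as exc:
                if not future.done():
                    future.set_exception(exc)
            else:
                if not future.done():
                    future.set_result(result)
            finally:
                self._queue.task_done()

    async def close(self):
        """Cancel the workers; queued requests are not drained."""
        for worker in self._workers:
            worker.cancel()
        await asyncio.gather(*self._workers, return_exceptions=True)
```

With target_inflight set to 1 there is a single worker, so requests for that model run strictly one after another.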
Model load and unload are runtime actions. They do not rewrite config files.
The enabled field controls startup behavior only.
Training runs execute inside the service process in a worker thread and write their outputs to the requested output directory. Training status is runtime state; it is not a durable job queue and does not survive process restart.
Base settings live in:
config/settings.json
Machine-local overrides can be placed in:
config/local.json
config/local.json is ignored by git. It is the right place to change local
model paths, enable/disable models on one machine, or tune VRAM estimates.
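One common way to layer a local override file over base settings is a recursive dictionary merge. The sketch below shows that idea under assumptions: the real merge logic lives in app/config.py and may behave differently, and the helper names and `config_dir` parameter are made up for this example.

```python
import json
from pathlib import Path


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override values win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_settings(config_dir: str = "config") -> dict:
    """Read settings.json and apply local.json on top when it exists."""
    base = json.loads((Path(config_dir) / "settings.json").read_text())
    local_path = Path(config_dir) / "local.json"
    if local_path.is_file():
        return deep_merge(base, json.loads(local_path.read_text()))
    return base
```

With this approach a local file only needs to list the fields that differ on one machine, such as enabled or model_path for a single model.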
Important model fields:
| Field | Meaning |
|---|---|
| backend | Runtime backend key, such as stub, diffusers_flux2_klein, or diffusers_firered_gguf. |
| enabled | Load this model automatically at service startup. |
| target_inflight | Maximum concurrent requests for the loaded model executor. |
| model_path | Local model directory or file used by the backend. |
| base_model_path | Optional local base pipeline directory for backends that need a separate base model. |
| transformer_config_path | Optional transformer config path for backends that load a separate transformer artifact. |
| modalities | Input modalities, for example ["text", "image"]. |
| output_modalities | Output modalities, currently ["image"]. |
| tasks | Supported tasks, such as image_generation and image_edit. |
| max_images | Maximum input images accepted by image-edit requests. |
| max_output_images | Maximum output images per request. |
| vram_estimate_mib | Configured VRAM estimate shown by admin/UI surfaces. |
| recommended_steps | Model-specific default step count for UI/runtime callers. |
| recommended_guidance | Model-specific default guidance value for UI/runtime callers. |
Prefer readable local model directories and files over Hugging Face cache
blobs/refs/snapshots paths. A local Diffusers model directory should contain
files such as model_index.json, transformer/, vae/, text_encoder/, and
tokenizer or processor directories.
Example local layout:
/path/to/models/
FLUX.2-klein-4B/
model_index.json
transformer/
vae/
text_encoder/
tokenizer/
FireRed-Image-Edit-1.1/
model_index.json
transformer/config.json
text_encoder/
vae/
processor/
tokenizer/
FireRed-Image-Edit-1.1-Q4_K_M.gguf
stable-diffusion-xl-base-1.0/
model_index.json
unet/
vae/
text_encoder/
text_encoder_2/
tokenizer/
tokenizer_2/
Z-Image-Turbo/
model_index.json
transformer/
vae/
text_encoder/
tokenizer/
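A quick local check can catch an incomplete download before the admin load endpoint reports an error. The sketch below only tests for the presence of expected entries; the default list is taken from the FLUX.2-klein and Z-Image layouts above, and the function name is an assumption for this example.

```python
from pathlib import Path

# Entries taken from the layout above. Adjust per backend: for example,
# the SDXL layout uses unet/ instead of transformer/.
DEFAULT_ENTRIES = (
    "model_index.json",
    "transformer",
    "vae",
    "text_encoder",
    "tokenizer",
)


def missing_entries(model_dir: str, expected=DEFAULT_ENTRIES) -> list[str]:
    """Return the expected files or directories absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).exists()]
```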
Example download commands:
huggingface-cli download black-forest-labs/FLUX.2-klein-4B \
--local-dir /path/to/models/FLUX.2-klein-4B
huggingface-cli download FireRedTeam/FireRed-Image-Edit-1.1 \
--local-dir /path/to/models/FireRed-Image-Edit-1.1
huggingface-cli download vantagewithai/FireRed-Image-Edit-1.1-GGUF \
FireRed-Image-Edit-1.1-Q4_K_M.gguf \
--local-dir /path/to/models
Then point model_path and base_model_path at those local paths.
The stub backend is enabled by default. It validates request shape and returns small PNG payloads. It is used for API and scheduler tests and does not require CUDA.
The FLUX.2-klein backend uses Flux2KleinPipeline from Diffusers.
Capabilities:
- Text-to-image.
- Image edit.
- Up to 4 input images in config.
- One output image per request in config.
Defaults:
- steps: 4
- guidance: 1.0
- torch_dtype: bfloat16
- Device: CUDA
The current runtime loads the full pipeline onto GPU.
FP8 FLUX.2-klein safetensor variants can be configured with model_path
pointing at the safetensor file and base_model_path pointing at the matching
Diffusers base pipeline directory.
The SDXL backend uses StableDiffusionXLPipeline for text-to-image and
StableDiffusionXLImg2ImgPipeline when an input image is provided.
Capabilities:
- Text-to-image.
- Img2img-style image edit.
- One input image in config.
- One output image per request in config.
Defaults:
- steps: recommended_steps or 30
- guidance: recommended_guidance or 5.0
- strength: 0.35 for image edit requests
- Device: CUDA
The Z-Image backend uses the configured Z-Image Diffusers pipeline for text-to-image and image-to-image requests.
Capabilities:
- Text-to-image.
- Img2img-style image edit when the pipeline supports it.
- LoRA adapter loading through request metadata.
- One input image in config.
- One output image per request in config.
Defaults:
- steps: recommended_steps or 9
- guidance: recommended_guidance or 0.0
- strength: 0.35 for image edit requests
- Device: CUDA
The FireRed backend uses a Qwen-image edit pipeline with a GGUF transformer file and a separate local base pipeline directory.
Capabilities:
- Image edit only.
- One input image in config.
- One output image per request in config.
Defaults:
- steps: 40
- guidance: 4.0, mapped to true_cfg_scale
- negative_prompt: a single space
- torch_dtype: bfloat16
- Device: CUDA
The runtime enables VAE tiling and slicing. The current version works, but should be treated as experimental: in product/label editing tests it has shown weaker edit quality than the FLUX.2-klein backend.
Create an environment:
python -m venv .venv
. .venv/bin/activate
pip install -e '.[dev]'
For real Diffusers backends, install the optional runtime dependencies:
pip install -e '.[flux]'
The optional extra is currently named flux, but it contains the shared
Diffusers, Torch, PEFT, GGUF, and image runtime dependencies used by the real
backends.
Run the service:
uvicorn app.main:app --host 127.0.0.1 --port 8013
Or:
python -m uvicorn app.main:app --host 127.0.0.1 --port 8013
Run the lightweight test suite:
python -m pytest
Compile-check application and tests:
python -m compileall -q app tests
The unit tests do not load real Diffusers models. Real backend verification is manual and should use the admin load endpoint plus a small generation or edit request.
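A manual check of a real backend could be scripted along these lines, using only the standard library. This is a sketch under assumptions: the helper names are hypothetical, the load endpoint is assumed to accept an empty POST body, and the prompt is arbitrary. `verify` needs a running service with the model files in place.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8013"


def build_requests(model: str, base_url: str = BASE_URL) -> list:
    """Build the admin load request followed by a small generation request."""
    load = urllib.request.Request(
        f"{base_url}/v1/admin/models/{model}/load",
        data=b"",  # assumption: the load endpoint needs no request body
        method="POST",
    )
    body = {"model": model, "prompt": "a red cube on a table",
            "size": "512x512", "n": 1}
    generate = urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"content-type": "application/json"},
    )
    return [load, generate]


def verify(model: str) -> None:
    """Send both requests in order and print each HTTP status."""
    for request in build_requests(model):
        with urllib.request.urlopen(request) as response:
            print(request.full_url, response.status)
```

For example, `verify("flux2-klein-4b")` loads the model and then asks for one 512x512 image.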
There are no deployment scripts in this repo yet. Run it as a normal ASGI service behind the local tooling that owns process supervision.
Operational notes:
- Keep target_inflight at 1 for large local image models unless the backend has been measured under concurrency.
- Prefer local model directories over Hugging Face cache blob paths.
- Restarting the process is the most reliable way to release all CUDA allocator state after heavy model experiments.
No license file is currently present in this repository.