Analyze portrait images and generate photographer-grade natural-language prompts that faithfully recreate the shot in image-generation models (Nano Banana, GPT-Image, etc.).
A sibling to music-describer: it runs structured machine-vision analysis on an image, then optionally synthesizes the result into photographer-grade prose suitable for use as a prompt.
pip install -e .

To enable LLM-powered descriptions, install with your preferred provider:
pip install -e ".[claude]" # Anthropic Claude (vision)
pip install -e ".[openai]" # OpenAI (vision)
pip install -e ".[all]" # Both

Ollama requires no extra Python packages, just a running Ollama server with a vision-capable model (e.g. llama3.2-vision, llava, qwen2.5vl).
# Show all available flags
image-describer --help
# Structured analysis only (no LLM needed)
image-describer portrait.jpg --analysis-only
# Photographer-grade prompt (requires vision-capable LLM provider)
export ANTHROPIC_API_KEY="your-key"
image-describer portrait.jpg
# Full JSON output (analysis + prompt)
image-describer portrait.jpg --json
# Save prompt to a file
image-describer portrait.jpg -o prompt.txt
# Run only a subset of analyzers (comma-separated)
image-describer portrait.jpg --analysis-only --analyzers pose,lighting,wardrobe
# Omit the identity-preservation block (you have a reference subject already)
image-describer portrait.jpg --no-identity
# Use a specific config file
image-describer portrait.jpg --config ./my-config.yaml
# OpenAI provider
export OPENAI_API_KEY="your-key"
image-describer portrait.jpg --config ./openai.yaml
# Local Ollama with a vision model
image-describer portrait.jpg --config ./ollama.yaml

By default all seven analyzers run. Pass --analyzers (CLI) or analyzers=[...] (Python API) to run a subset. Valid names: subject, pose, composition, camera, lighting, wardrobe, background.
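A comma-separated analyzer list can be checked against the seven valid names before any analysis runs. The sketch below only illustrates that idea: `parse_analyzers` and `VALID_ANALYZERS` are hypothetical names, not part of image-describer's documented API, and the tool's own validation and error messages may differ.

```python
# Hypothetical helper: NOT part of image-describer's documented API.
VALID_ANALYZERS = ("subject", "pose", "composition", "camera",
                   "lighting", "wardrobe", "background")


def parse_analyzers(value=None):
    """Turn a comma-separated string into a validated list of analyzer names.

    None, an empty string, or a string with no names selects all seven
    analyzers, matching the documented default.
    """
    names = [part.strip().lower() for part in (value or "").split(",") if part.strip()]
    if not names:
        return list(VALID_ANALYZERS)
    unknown = [name for name in names if name not in VALID_ANALYZERS]
    if unknown:
        raise ValueError(f"unknown analyzer(s): {', '.join(unknown)}")
    # Drop duplicates while keeping the order the user gave.
    return list(dict.fromkeys(names))


print(parse_analyzers("pose, lighting,wardrobe"))
```

Failing fast on an unknown name keeps a typo such as `wardobe` from silently producing an analysis with a missing section.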
from image_describer import analyze, describe
# Structured analysis only
result = analyze("portrait.jpg")
print(result["pose"]["framing"]) # e.g. "full-body"
print(result["lighting"]["direction"]) # e.g. "soft front"
print(result["wardrobe"]["surface_quality"]) # e.g. "matte"
# With LLM prompt synthesis (set ANTHROPIC_API_KEY or configure provider)
result = describe("portrait.jpg")
print(result["prompt"])
# Dress the woman with the neck tattoo in the exact outfit without altering her...
# Omit subject/identity block
result = describe("portrait.jpg", include_identity=False)
# Subset of analyzers
result = describe("portrait.jpg", analyzers=["pose", "lighting", "wardrobe", "background"])

| Analyzer | Output Fields |
|---|---|
| subject | skin_tone, hair_color, hair_length, face_geometry, distinctive_features_present |
| pose | framing (head/half/three-quarter/full), body_angle_deg, weight_distribution, limb_descriptors, gaze_direction |
| composition | aspect_ratio, crop, headroom_ratio, subject_placement, negative_space_ratio |
| camera | exif_focal_length_mm, exif_aperture, dof_estimate (shallow/medium/deep), angle (eye/low/high), sensor_look |
| lighting | color_temp_est, direction (front/side/back/top), hardness (soft/hard), key_fill_contrast, catchlight_present |
| wardrobe | dominant_colors, coverage_zones, surface_quality (shiny/matte/sheer/...), texture_pattern (knit/smooth/ribbed/...), embellishment (sparkly/rhinestoned/sequined/plain) |
| background | complexity (simple/textured/environmental), dominant_colors, edge_density, mood_hint (warm/cool) |
The wardrobe analyzer deliberately does not classify garment types (dress vs blouse vs jumpsuit) or material types (silk vs cotton) from pixels — those are unreliable from a single still. Instead it emits fabric appearance tokens (shiny / matte / knit / sparkly / rhinestoned / sheer / ribbed / smooth / etc.) plus measurable color and coverage. Garment identification is delegated to the vision LLM at synthesis, which already sees the source image and can label garment nouns accurately.
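This README does not say how the measurable color fields are computed. One common approach, shown here purely as an assumption, is to quantize pixels into coarse RGB buckets and count them. The function name `dominant_colors` and its parameters are illustrative, and the input is a plain list of RGB tuples so the sketch runs without Pillow.

```python
from collections import Counter


def dominant_colors(pixels, top_n=3, step=32):
    """Return the top_n most common colors after coarse quantization.

    pixels: iterable of (r, g, b) tuples with 0-255 channels.
    step:   bucket width per channel; larger values merge more shades.
    """
    buckets = Counter()
    for r, g, b in pixels:
        # Snap each channel to the center of its bucket.
        key = tuple((c // step) * step + step // 2 for c in (r, g, b))
        buckets[key] += 1
    return [color for color, _ in buckets.most_common(top_n)]


# Toy input: mostly dark red fabric with a few near-white highlights.
sample = [(200, 30, 40)] * 8 + [(205, 28, 44)] * 4 + [(250, 250, 250)] * 2
print(dominant_colors(sample, top_n=2))
```

Bucketing merges the two slightly different reds into one entry, which is the behavior a color summary needs: it reports the fabric color rather than every shade produced by lighting falloff.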
The LLM is prompted to produce the user's hand-tuned seven-block structure:
- Identity preservation: anchored on a distinctive feature (optional, toggle via --no-identity)
- Body type / proportions: one-liner
- Wardrobe: ultra-realistic + fabric appearance + fit/silhouette tokens
- Pose: editorial framing + concrete limb positions
- Lighting: studio quality + light direction + shadow behavior
- Background: simplicity statement
- Aesthetic anchor: DSLR / magazine / editorial closing tokens
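The block order and the optional identity block can be pictured as a simple assembly step. In the tool itself the LLM writes the prose; the sketch below only mimics the ordering and the identity toggle. `BLOCK_ORDER`, `assemble_prompt`, and the sample sentences are hypothetical and not taken from image-describer's code.

```python
# Hypothetical sketch: block keys follow the seven-block list above; the
# function and sample text are illustrative, not the tool's implementation.
BLOCK_ORDER = ("identity", "body", "wardrobe", "pose",
               "lighting", "background", "aesthetic")


def assemble_prompt(blocks, include_identity=True):
    """Join the non-empty blocks, in the fixed order, into one prompt string."""
    parts = []
    for name in BLOCK_ORDER:
        if name == "identity" and not include_identity:
            continue  # mirrors --no-identity / include_identity=False
        text = blocks.get(name, "").strip()
        if text:
            parts.append(text)
    return " ".join(parts)


blocks = {
    "identity": "Keep the subject's face and neck tattoo unchanged.",
    "wardrobe": "Ultra-realistic matte knit outfit with a relaxed fit.",
    "pose": "Full-body editorial pose, weight on the left leg.",
    "lighting": "Soft frontal studio light with gentle shadows.",
    "background": "Plain, uncluttered backdrop.",
    "aesthetic": "Shot on a DSLR, magazine editorial quality.",
}
print(assemble_prompt(blocks, include_identity=False))
```

With `include_identity=False` the output starts at the wardrobe sentence, which is the case where a reference subject is supplied separately.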
Create a config.yaml in your working directory, or at ~/.image-describer/config.yaml:
llm:
  # Provider: claude, openai, or ollama
  provider: claude
  # Model name (provider-specific). Must be a vision-capable model.
  # model: claude-sonnet-4-6
  # Environment variable holding the API key
  # api_key_env: ANTHROPIC_API_KEY
  # Ollama only
  # base_url: http://localhost:11434

Config resolution order: --config flag > ./config.yaml > ~/.image-describer/config.yaml > defaults.
See config.example.yaml for the full template.
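The resolution order can be read as a first-match search. The sketch below is an assumption about how such a lookup might be written, not the tool's actual code: `resolve_config_path` is a hypothetical name, and the `cwd` and `home` parameters exist only so the demo can run in a temporary directory.

```python
import tempfile
from pathlib import Path


def resolve_config_path(cli_path=None, cwd=None, home=None):
    """Return the first config file that applies, or None to use defaults."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    if cli_path is not None:
        return Path(cli_path)  # an explicit --config always wins
    for candidate in (cwd / "config.yaml",
                      home / ".image-describer" / "config.yaml"):
        if candidate.is_file():
            return candidate
    return None  # caller falls back to built-in defaults


# Demo in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    cwd, home = Path(tmp, "project"), Path(tmp, "home")
    cwd.mkdir()
    (home / ".image-describer").mkdir(parents=True)
    found_none = resolve_config_path(cwd=cwd, home=home)
    (home / ".image-describer" / "config.yaml").write_text("llm:\n  provider: ollama\n")
    found_home = resolve_config_path(cwd=cwd, home=home)  # only the home file exists
    (cwd / "config.yaml").write_text("llm:\n  provider: claude\n")
    found_cwd = resolve_config_path(cwd=cwd, home=home)   # working directory wins
    found_cli = resolve_config_path("custom.yaml", cwd=cwd, home=home)

print(found_home.parent.name, found_cwd.parent.name, found_cli)
```

This version returns the first matching file without merging settings across levels; whether image-describer merges or replaces is not stated here.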
Supported image formats: JPEG, PNG, WEBP, TIFF, BMP, and any other format supported by Pillow.
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows (PowerShell)
pip install -e ".[dev,all]"
pytest -v

License: MIT