image-describer

Analyze portrait images and generate photographer-grade natural-language prompts that faithfully recreate the shot in image-generation models (Nano Banana, GPT-Image, etc.).

A sibling to music-describer: it runs structured machine-vision analysis on an image, then optionally synthesizes the result into photographer-grade prose suitable for use as a prompt.

Installation

pip install -e .

To enable LLM-powered descriptions, install with your preferred provider:

pip install -e ".[claude]"    # Anthropic Claude (vision)
pip install -e ".[openai]"    # OpenAI (vision)
pip install -e ".[all]"       # Both

Ollama requires no extra Python packages -- just a running Ollama server with a vision-capable model (e.g. llama3.2-vision, llava, qwen2.5vl).

Quick Start

CLI

# Show all available flags
image-describer --help

# Structured analysis only (no LLM needed)
image-describer portrait.jpg --analysis-only

# Photographer-grade prompt (requires vision-capable LLM provider)
export ANTHROPIC_API_KEY="your-key"
image-describer portrait.jpg

# Full JSON output (analysis + prompt)
image-describer portrait.jpg --json

# Save prompt to a file
image-describer portrait.jpg -o prompt.txt

# Run only a subset of analyzers (comma-separated)
image-describer portrait.jpg --analysis-only --analyzers pose,lighting,wardrobe

# Omit the identity-preservation block (use when you already have a reference subject)
image-describer portrait.jpg --no-identity

# Use a specific config file
image-describer portrait.jpg --config ./my-config.yaml

# OpenAI provider
export OPENAI_API_KEY="your-key"
image-describer portrait.jpg --config ./openai.yaml

# Local Ollama with a vision model
image-describer portrait.jpg --config ./ollama.yaml

By default all seven analyzers run. Pass --analyzers (CLI) or analyzers=[...] (Python API) to run a subset. Valid names: subject, pose, composition, camera, lighting, wardrobe, background.
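
As an illustration of the subset rule above, here is a minimal sketch of how a comma-separated `--analyzers` value could be validated against the seven valid names. `parse_analyzers` and `VALID_ANALYZERS` are hypothetical names used for this example only; they are not confirmed parts of image-describer, and the real CLI may handle whitespace, case, and duplicates differently.

```python
# Hypothetical helper, not part of image-describer's documented API.
VALID_ANALYZERS = (
    "subject", "pose", "composition", "camera",
    "lighting", "wardrobe", "background",
)


def parse_analyzers(value=None):
    """Turn a comma-separated string into a validated list of analyzer names.

    None or an empty string selects all seven analyzers.
    """
    if not value:
        return list(VALID_ANALYZERS)
    # Assumption: surrounding whitespace and letter case are ignored.
    names = [part.strip().lower() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in VALID_ANALYZERS]
    if unknown:
        raise ValueError(f"unknown analyzer(s): {', '.join(unknown)}")
    # Drop duplicates while keeping the order the caller gave.
    return list(dict.fromkeys(names))
```

The returned list has the same shape as the `analyzers=[...]` argument shown for the Python API.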

Python API

from image_describer import analyze, describe

# Structured analysis only
result = analyze("portrait.jpg")
print(result["pose"]["framing"])           # e.g. "full-body"
print(result["lighting"]["direction"])     # e.g. "soft front"
print(result["wardrobe"]["surface_quality"])  # e.g. "matte"

# With LLM prompt synthesis (set ANTHROPIC_API_KEY or configure provider)
result = describe("portrait.jpg")
print(result["prompt"])
# Dress the woman with the neck tattoo in the exact outfit without altering her...

# Omit subject/identity block
result = describe("portrait.jpg", include_identity=False)

# Subset of analyzers
result = describe("portrait.jpg", analyzers=["pose", "lighting", "wardrobe", "background"])

Analyzers

| Analyzer | Output Fields |
| --- | --- |
| subject | `skin_tone`, `hair_color`, `hair_length`, `face_geometry`, `distinctive_features_present` |
| pose | `framing` (head/half/three-quarter/full), `body_angle_deg`, `weight_distribution`, `limb_descriptors`, `gaze_direction` |
| composition | `aspect_ratio`, `crop`, `headroom_ratio`, `subject_placement`, `negative_space_ratio` |
| camera | `exif_focal_length_mm`, `exif_aperture`, `dof_estimate` (shallow/medium/deep), `angle` (eye/low/high), `sensor_look` |
| lighting | `color_temp_est`, `direction` (front/side/back/top), `hardness` (soft/hard), `key_fill_contrast`, `catchlight_present` |
| wardrobe | `dominant_colors`, `coverage_zones`, `surface_quality` (shiny/matte/sheer/...), `texture_pattern` (knit/smooth/ribbed/...), `embellishment` (sparkly/rhinestoned/sequined/plain) |
| background | `complexity` (simple/textured/environmental), `dominant_colors`, `edge_density`, `mood_hint` (warm/cool) |

A note on wardrobe

The wardrobe analyzer deliberately does not classify garment types (dress vs blouse vs jumpsuit) or material types (silk vs cotton) from pixels — those are unreliable from a single still. Instead it emits fabric appearance tokens (shiny / matte / knit / sparkly / rhinestoned / sheer / ribbed / smooth / etc.) plus measurable color and coverage. Garment identification is delegated to the vision LLM at synthesis, which already sees the source image and can label garment nouns accurately.
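
To make the split concrete, the sketch below turns wardrobe analysis fields into an appearance-only hint and leaves the garment noun for the vision LLM to supply. It is illustrative only: `wardrobe_hint` is a hypothetical helper, and it assumes `dominant_colors` is a list of color names and the other fields are single string tokens, which the project may represent differently.

```python
# Hypothetical helper; the value types of the wardrobe fields are assumptions.
def wardrobe_hint(wardrobe):
    """Build a short appearance-only hint; garment nouns are left to the LLM."""
    tokens = [
        wardrobe.get("surface_quality"),
        wardrobe.get("texture_pattern"),
        wardrobe.get("embellishment"),
    ]
    # Keep only the tokens that are actually present.
    appearance = ", ".join(token for token in tokens if token)
    colors = " and ".join(wardrobe.get("dominant_colors", []))
    hint = f"{appearance} fabric" if appearance else "fabric"
    if colors:
        hint += f" in {colors}"
    return hint


example = {
    "surface_quality": "matte",
    "texture_pattern": "knit",
    "embellishment": "plain",
    "dominant_colors": ["black", "ivory"],
}
print(wardrobe_hint(example))  # matte, knit, plain fabric in black and ivory
```

Note that the hint never says "dress" or "silk"; it only carries what the analyzer can measure.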

Output prompt format

The LLM is prompted to produce a hand-tuned seven-block structure, in this order:

Identity preservation — anchored on a distinctive feature (optional, toggle via --no-identity)
Body type / proportions — one-liner
Wardrobe — ultra-realistic + fabric appearance + fit/silhouette tokens
Pose — editorial framing + concrete limb positions
Lighting — studio quality + light direction + shadow behavior
Background — simplicity statement
Aesthetic anchor — DSLR / magazine / editorial closing tokens
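
The block order and the identity toggle can be sketched as a simple join. This is not how image-describer builds its prompt (the LLM writes the prose); it only shows the ordering and what `--no-identity` / `include_identity=False` removes. `BLOCK_ORDER`, `assemble_prompt`, and the dictionary keys are hypothetical names chosen for this example.

```python
# Hypothetical names; they mirror the seven blocks listed above.
BLOCK_ORDER = (
    "identity", "body", "wardrobe", "pose",
    "lighting", "background", "aesthetic",
)


def assemble_prompt(blocks, include_identity=True):
    """Join the available blocks in the fixed order, skipping missing ones."""
    parts = []
    for name in BLOCK_ORDER:
        if name == "identity" and not include_identity:
            continue  # mirrors --no-identity / include_identity=False
        text = blocks.get(name, "").strip()
        if text:
            parts.append(text)
    return " ".join(parts)


blocks = {
    "pose": "Full-body editorial pose.",
    "identity": "Keep her face unchanged.",
    "wardrobe": "Matte knit top.",
}
print(assemble_prompt(blocks))
print(assemble_prompt(blocks, include_identity=False))
```

The input dictionary order does not matter; output order always follows `BLOCK_ORDER`.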

Configuration

Create a config.yaml in your working directory, or at ~/.image-describer/config.yaml:

llm:
  # Provider: claude, openai, or ollama
  provider: claude

  # Model name (provider-specific). Must be a vision-capable model.
  # model: claude-sonnet-4-6

  # Environment variable holding the API key
  # api_key_env: ANTHROPIC_API_KEY

  # Ollama only
  # base_url: http://localhost:11434

Config resolution order: --config flag > ./config.yaml > ~/.image-describer/config.yaml > defaults.
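
The resolution order above could be implemented along these lines. This is a sketch under stated assumptions, not the project's actual loader: `resolve_config_path` is a hypothetical function, and the `cwd` and `home` parameters exist only to make the example testable.

```python
from pathlib import Path


def resolve_config_path(cli_path=None, cwd=None, home=None):
    """Return the first config file that applies, or None to use defaults."""
    if cli_path is not None:
        # An explicit --config always wins over the implicit locations.
        return Path(cli_path)
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    candidates = (
        cwd / "config.yaml",
        home / ".image-describer" / "config.yaml",
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # Nothing found: the caller falls back to built-in defaults.
    return None
```

Returning `None` rather than raising keeps the no-config case cheap, which matches the "defaults" step at the end of the chain.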

See config.example.yaml for the full template.

Supported Formats

JPEG, PNG, WEBP, TIFF, BMP -- any format supported by Pillow.

Development

python -m venv venv
source venv/bin/activate      # Linux/macOS
venv\Scripts\activate         # Windows (PowerShell)

pip install -e ".[dev,all]"
pytest -v

License

MIT
