lizergic/image-describer

image-describer

Analyze portrait images and generate photographer-grade natural-language prompts that faithfully recreate the shot in image-generation models (Nano Banana, GPT-Image, etc.).

A sibling to music-describer: it runs structured machine-vision analysis on an image, then optionally synthesizes the result into photographer-grade prose suitable for use as a prompt.

Installation

pip install -e .

To enable LLM-powered descriptions, install with your preferred provider:

pip install -e ".[claude]"    # Anthropic Claude (vision)
pip install -e ".[openai]"    # OpenAI (vision)
pip install -e ".[all]"       # Both

Ollama requires no extra Python packages -- just a running Ollama server with a vision-capable model (e.g. llama3.2-vision, llava, qwen2.5vl).

Quick Start

CLI

# Show all available flags
image-describer --help

# Structured analysis only (no LLM needed)
image-describer portrait.jpg --analysis-only

# Photographer-grade prompt (requires vision-capable LLM provider)
export ANTHROPIC_API_KEY="your-key"
image-describer portrait.jpg

# Full JSON output (analysis + prompt)
image-describer portrait.jpg --json

# Save prompt to a file
image-describer portrait.jpg -o prompt.txt

# Run only a subset of analyzers (comma-separated)
image-describer portrait.jpg --analysis-only --analyzers pose,lighting,wardrobe

# Omit the identity-preservation block (use when you already have a reference subject)
image-describer portrait.jpg --no-identity

# Use a specific config file
image-describer portrait.jpg --config ./my-config.yaml

# OpenAI provider
export OPENAI_API_KEY="your-key"
image-describer portrait.jpg --config ./openai.yaml

# Local Ollama with a vision model
image-describer portrait.jpg --config ./ollama.yaml

By default all seven analyzers run. Pass --analyzers (CLI) or analyzers=[...] (Python API) to run a subset. Valid names: subject, pose, composition, camera, lighting, wardrobe, background.
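
As an illustration of the subset rule above, here is a minimal sketch of how a comma-separated --analyzers value could be validated against the seven valid names. The helper parse_analyzers and the constant VALID_ANALYZERS are hypothetical names made up for this example; they are not part of the package's documented API, and the real CLI may handle bad input differently.

```python
# Analyzer names as listed in this README; the helper itself is hypothetical.
VALID_ANALYZERS = (
    "subject", "pose", "composition", "camera",
    "lighting", "wardrobe", "background",
)


def parse_analyzers(value=None):
    """Turn a comma-separated string into a validated list of analyzer names."""
    if value is None or not value.strip():
        # No selection given: fall back to running all seven analyzers.
        return list(VALID_ANALYZERS)
    names = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in VALID_ANALYZERS]
    if unknown:
        raise ValueError(f"Unknown analyzer(s): {', '.join(unknown)}")
    # Keep the caller's order but drop duplicates.
    return list(dict.fromkeys(names))


print(parse_analyzers("pose,lighting,wardrobe"))
```

The returned list has the same shape as the analyzers=[...] argument shown in the Python API section.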

Python API

from image_describer import analyze, describe

# Structured analysis only
result = analyze("portrait.jpg")
print(result["pose"]["framing"])           # e.g. "full-body"
print(result["lighting"]["direction"])     # e.g. "soft front"
print(result["wardrobe"]["surface_quality"])  # e.g. "matte"

# With LLM prompt synthesis (set ANTHROPIC_API_KEY or configure provider)
result = describe("portrait.jpg")
print(result["prompt"])
# Dress the woman with the neck tattoo in the exact outfit without altering her...

# Omit subject/identity block
result = describe("portrait.jpg", include_identity=False)

# Subset of analyzers
result = describe("portrait.jpg", analyzers=["pose", "lighting", "wardrobe", "background"])
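
For more than one image, the calls above can be wrapped in a small loop. The sketch below is only an illustration: analyze_folder and IMAGE_SUFFIXES are hypothetical names, and the analysis function is passed in as a parameter so the example does not assume anything about the package beyond the analyze(path) call shown above. Forwarding analyzers=[...] to analyze is an assumption based on the note that the Python API accepts it.

```python
from pathlib import Path

# File suffixes for the formats this README lists as supported (assumed mapping).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".tif", ".tiff", ".bmp"}


def analyze_folder(folder, analyze_fn, analyzers=None):
    """Run analyze_fn on every image in folder and return {filename: result}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        # Only forward the analyzers keyword when a subset was requested.
        kwargs = {"analyzers": analyzers} if analyzers else {}
        results[path.name] = analyze_fn(str(path), **kwargs)
    return results


# Intended usage (requires the package to be installed):
#   from image_describer import analyze
#   results = analyze_folder("shots", analyze, analyzers=["pose", "lighting"])
```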

Analyzers

| Analyzer | Output fields |
|---|---|
| subject | skin_tone, hair_color, hair_length, face_geometry, distinctive_features_present |
| pose | framing (head/half/three-quarter/full), body_angle_deg, weight_distribution, limb_descriptors, gaze_direction |
| composition | aspect_ratio, crop, headroom_ratio, subject_placement, negative_space_ratio |
| camera | exif_focal_length_mm, exif_aperture, dof_estimate (shallow/medium/deep), angle (eye/low/high), sensor_look |
| lighting | color_temp_est, direction (front/side/back/top), hardness (soft/hard), key_fill_contrast, catchlight_present |
| wardrobe | dominant_colors, coverage_zones, surface_quality (shiny/matte/sheer/...), texture_pattern (knit/smooth/ribbed/...), embellishment (sparkly/rhinestoned/sequined/plain) |
| background | complexity (simple/textured/environmental), dominant_colors, edge_density, mood_hint (warm/cool) |

A note on wardrobe

The wardrobe analyzer deliberately does not classify garment types (dress vs blouse vs jumpsuit) or material types (silk vs cotton) from pixels, because those cannot be inferred reliably from a single still. Instead it emits fabric appearance tokens (shiny / matte / knit / sparkly / rhinestoned / sheer / ribbed / smooth / etc.) plus measurable color and coverage. Garment identification is delegated to the vision LLM at synthesis time, which already sees the source image and can label garment nouns accurately.
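
To make the split concrete, here is a rough sketch of how appearance tokens could be combined with a garment noun supplied later. The function fabric_phrase is hypothetical, and the sample dict only assumes that surface_quality, texture_pattern, and embellishment each hold a single string, as the surface_quality example in the Python API section suggests.

```python
def fabric_phrase(wardrobe, garment="garment"):
    """Join fabric appearance tokens, then append a garment noun.

    The garment noun is a placeholder here; per this README it comes from
    the vision LLM, not from the pixel-level wardrobe analyzer.
    """
    keys = ("surface_quality", "texture_pattern", "embellishment")
    # "plain" means no embellishment, so it adds nothing to the phrase.
    tokens = [wardrobe[k] for k in keys if wardrobe.get(k) and wardrobe[k] != "plain"]
    return " ".join(tokens + [garment])


# Hypothetical analyzer output, for illustration only.
sample = {"surface_quality": "matte", "texture_pattern": "ribbed", "embellishment": "plain"}
print(fabric_phrase(sample))  # matte ribbed garment
```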

Output prompt format

The LLM is prompted to produce a hand-tuned seven-block structure:

  1. Identity preservation — anchored on a distinctive feature (optional, toggle via --no-identity)
  2. Body type / proportions — one-liner
  3. Wardrobe — ultra-realistic + fabric appearance + fit/silhouette tokens
  4. Pose — editorial framing + concrete limb positions
  5. Lighting — studio quality + light direction + shadow behavior
  6. Background — simplicity statement
  7. Aesthetic anchor — DSLR / magazine / editorial closing tokens
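
The ordering and the optional first block can be pictured with a short sketch. In the tool itself the LLM writes the prompt, so the code below is not the package's implementation; assemble_prompt, BLOCK_ORDER, and the block keys are hypothetical names used only to show how the seven blocks line up and what omitting the identity block changes.

```python
# Block keys in the order listed above (the key names are made up for this sketch).
BLOCK_ORDER = (
    "identity", "body", "wardrobe", "pose",
    "lighting", "background", "aesthetic",
)


def assemble_prompt(blocks, include_identity=True):
    """Join the non-empty blocks in the fixed seven-block order."""
    parts = []
    for name in BLOCK_ORDER:
        if name == "identity" and not include_identity:
            continue  # mirrors --no-identity / include_identity=False
        text = (blocks.get(name) or "").strip()
        if text:
            parts.append(text)
    return " ".join(parts)


blocks = {
    "identity": "Keep her face unchanged.",
    "wardrobe": "Ultra-realistic matte ribbed outfit.",
    "pose": "Full-body editorial pose.",
}
print(assemble_prompt(blocks, include_identity=False))
```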

Configuration

Create a config.yaml in your working directory, or at ~/.image-describer/config.yaml:

llm:
  # Provider: claude, openai, or ollama
  provider: claude

  # Model name (provider-specific). Must be a vision-capable model.
  # model: claude-sonnet-4-6

  # Environment variable holding the API key
  # api_key_env: ANTHROPIC_API_KEY

  # Ollama only
  # base_url: http://localhost:11434

Config resolution order: --config flag > ./config.yaml > ~/.image-describer/config.yaml > defaults.
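
The resolution order can be sketched as a first-match lookup. This is a minimal illustration under stated assumptions, not the package's actual loader: resolve_config_path is a hypothetical helper, and returning None is used here to mean "fall back to built-in defaults".

```python
from pathlib import Path


def resolve_config_path(cli_path=None, cwd=None, home=None):
    """Return the config file that applies, or None to use defaults."""
    if cli_path is not None:
        # An explicit --config value takes precedence over everything else.
        return Path(cli_path)
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    candidates = (
        cwd / "config.yaml",                       # ./config.yaml
        home / ".image-describer" / "config.yaml",  # per-user config
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


print(resolve_config_path("./my-config.yaml"))
```

The cwd and home parameters exist only so the lookup can be exercised against temporary directories.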

See config.example.yaml for the full template.

Supported Formats

JPEG, PNG, WebP, TIFF, BMP, and any other format supported by Pillow.

Development

python -m venv venv
source venv/bin/activate      # Linux/macOS
venv\Scripts\activate         # Windows (PowerShell)

pip install -e ".[dev,all]"
pytest -v

License

MIT
