Skip to content

feat: auto-detect image aspect ratio from prompt#125

Open
TriTue2011 wants to merge 384 commits into
basketikun:mainfrom
TriTue2011:main
Open

feat: auto-detect image aspect ratio from prompt#125
TriTue2011 wants to merge 384 commits into
basketikun:mainfrom
TriTue2011:main

Conversation

@TriTue2011
Copy link
Copy Markdown

Cho phép tự động nhận diện tỷ lệ khung hình (ví dụ 16:9, 1:1, 4:3) từ nội dung của prompt để ghi đè lên cấu hình mặc định (khi kích thước bị Home Assistant hoặc các client khác gửi lên mặc định). Điều này cho phép ghi đè tham số \size\ bằng những keyword tìm thấy trong yêu cầu gốc của người dùng.

@TriTue2011
Copy link
Copy Markdown
Author

I have pushed two additional commits to this PR:

  1. Fix: Changed _build_tool_prompt\ to check for \properties\ instead of
    equired. This fixes a critical bug where tools with only optional arguments (like Home Assistant's \HassTurnOn) were forced to be called with {}\ because they lacked required parameters. Now the model can see and use optional arguments correctly.
  2. Update: Translated the hardcoded image size hints from Chinese to Vietnamese to better support Vietnamese text-to-speech and users. (Note: Feel free to ask if you'd prefer this part to remain in Chinese or be changed to English for internationalization!)

TriTue2011 and others added 28 commits May 12, 2026 11:08
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…"text",...}])

Critical bug: HA messages use list content format, but inject_search_results
only handled string content. Search results were never actually injected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- HA addon config.yaml, Dockerfile, run.sh
- Full Vietnamese README in homeassistant-addon/
- Updated root README.md with quick start

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix: auto model now respects user config + list models uses intersection across all tokens

- gemini_free/auto now reads model from config.data.providers.gemini_free.model
- backend_router resolves auto by checking user config before hardcoded defaults
- /v1/models fetches models from all active ChatGPT tokens in parallel
- with multi-account, only shows models common to all tokens (intersection)
- falls back to anon if no tokens available

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@
TriTue2011 and others added 30 commits May 17, 2026 00:09
Already present in conversation.py. Fallback to len/4 if import fails.
Completion tokens use stream content length / 4 estimate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Both Add New and Edit mode pickers now 420px wide
- Larger row height (py-2.5) and bigger font (13px)
- Better visual separation with borders between rows
- z-50 elevation for proper layering

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Popup width: 640px, max-height: 70vh
- Search input to filter models by name
- Backdrop overlay (click to close)
- ModelSearch state for text filtering

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fixed centered modal 720px, max-height 80vh
- Click outside does NOT close (no backdrop handler)
- Close button (X) + Done button to dismiss
- Selected models show green checkmark + highlight
- Button shows count of selected models
- Edit mode also uses same modal style

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Unused providers shown in gray (#9ca3af) with no connection line
- Only providers with actual usage get colored nodes + animated edges
- Provider status determined by usage log, not config
- Falls back to showing all (without edges) if no usage data yet

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Changed prefix matching from exact "geminiapi/" to startsWith("geminiapi")
Now geminiapi1, geminiapi2, geminiapi3, geminiapi4 all get:
- image (image generation)
- vision (image analysis)
- video (video analysis)
capability labels in model list and combo picker.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pre-configured for ports 8000-8003 with geminiapi1-4 prefixes.
Appears at top of provider presets for easy access.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_is_model_enabled now checks: if the model's provider has no explicit
filtering configured, treat all its models as enabled. Fixes issue
where enabling model filter for chatgpt would hide all custom provider models.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hars

- Now detects relative paths like /media/img_xxx.png?token=yyy
- Prepends base_url for relative image URLs
- Fixed regex to exclude ) ] from URL capture
- Converts relative URLs to absolute before downloading

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
stream_image_outputs_with_pool now checks if model is a combo
and resolves to the first route with an IMAGE_MODELS entry.
Fixes "unsupported image model" error for image combos.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Gemini API Server returns images as markdown links in chat text.
Adapter now extracts URL from ![alt](url) syntax + downloads to base64.
Verified: image access works from within Docker container (200 OK).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- build_body now detects image data from edit endpoint
- Edit mode: sends original image as base64 + editing instruction
- Generate mode: existing behavior unchanged
- Adapted to OpenAI vision format (image_url type)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ImageComposer now accepts model + imageModels props.
Parent page.tsx needs wiring to fetch image models and pass down.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Added imageModel state (default gpt-image-2)
- Fetch image-capable models from /api/v1/models-with-capabilities
- Pass model + imageModels + onModelChange to ImageComposer
- Use selected model instead of hardcoded "gpt-image-2"

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Gemini returns content as array when prompt uses multi-part format.
parse_response now converts array content to string before parsing.
Also handles image_url type in content array.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
format_image_result now strips data:image/...;base64, prefix
before calling base64.b64decode to save image bytes.
Fixes Gemini custom provider image generation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Removed "Hạn mức còn lại" display (frees space)
- Removed "Tải lên" button (frees space)
- Textarea: 16px font-semibold, min-h 120px, no absolute overlay
- Controls bar: flat layout below textarea (not floating on top)
- Removed sm:absolute positioning that caused text compression

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Button opens library image picker for selecting reference images
from previously generated images. Parent page.tsx wiring pending.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Button "Thư viện" opens modal with all generated images
- Grid view, click to select and add as reference image
- Downloads image, converts to File, adds via handleReferenceImageChange
- Modal closes on backdrop click or after selection

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…int)

build_body now accepts 'images' list from _handle_adapter_edit,
encodes raw bytes to base64 for Gemini chat API.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant