ComfyUI-MiniCPM

A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.

🎉 Now supports MiniCPM-V-4.5! The latest model with enhanced capabilities.


News & Updates

  • 2025/08/28: Update ComfyUI-MiniCPM to v1.1.1 (update.md)
  • 2025/08/27: Update ComfyUI-MiniCPM to v1.1.0 (update.md, MiniCPM v4 vs v4.5)
  • Added support for MiniCPM-V-4.5 models (Transformers)

Features

  • MiniCPM-V-4 GGUF

  • MiniCPM-V-4 batch images

  • MiniCPM-V-4 video

  • Supports MiniCPM-V-4.5 (Transformers) and MiniCPM-V-4.0 (GGUF) models

  • Latest MiniCPM-V-4.5 with enhanced capabilities via Transformers

  • Multiple caption types to suit different use cases (Describe, Caption, Analyze, etc.)

  • Memory management options to balance VRAM usage and speed

  • Auto-downloads model files on first use for easy setup

  • Customizable parameters: max tokens, temperature, top-p/k sampling, repetition penalty

  • Advanced node with full parameter control

  • Legacy node for backward compatibility

  • Comprehensive GGUF quantization options for V4.0 models


Installation

Clone the repo into your ComfyUI custom nodes folder:

cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/comfyui-minicpm.git

Install required dependencies:

cd ComfyUI/custom_nodes/comfyui-minicpm
ComfyUI\python_embeded\python.exe -m pip install -r requirements.txt
ComfyUI\python_embeded\python.exe llama_cpp_install.py
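
Optionally, you can check that llama-cpp-python imports with the same embedded interpreter (adjust the python_embeded path to your install; the command below is an illustrative check, not a required step):

ComfyUI\python_embeded\python.exe -c "import llama_cpp; print(llama_cpp.__version__)"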

Note: llama-cpp-python CUDA Installation for ComfyUI Portable


Supported Models

Transformers Models

Model | Description
MiniCPM-V-4.5 🌟 | Latest V4.5 version with enhanced capabilities
MiniCPM-V-4.5-int4 🌟 | V4.5 4-bit quantized version, smaller memory footprint
MiniCPM-V-4 | V4.0 full-precision version, higher quality
MiniCPM-V-4-int4 | V4.0 4-bit quantized version, smaller memory footprint

https://huggingface.co/openbmb/MiniCPM-V-4_5
https://huggingface.co/openbmb/MiniCPM-V-4_5-int4
https://huggingface.co/openbmb/MiniCPM-V-4
https://huggingface.co/openbmb/MiniCPM-V-4-int4

GGUF Models

Note: MiniCPM-V-4.5 GGUF models are temporarily unavailable due to llama-cpp-python compatibility issues. Please use MiniCPM-V-4.5 Transformers models or MiniCPM-V-4.0 GGUF models.

MiniCPM-V-4.0 (Fully Supported)

Model | Size | Description
MiniCPM-V-4 (Q4_K_M) | ~2.19 GB | Recommended balance of quality and size
MiniCPM-V-4 (Q4_0) | ~2.08 GB | Standard 4-bit quantization
MiniCPM-V-4 (Q4_1) | ~2.29 GB | Improved 4-bit quantization
MiniCPM-V-4 (Q4_K_S) | ~2.09 GB | 4-bit K-quants, small
MiniCPM-V-4 (Q5_0) | ~2.51 GB | 5-bit quantization
MiniCPM-V-4 (Q5_1) | ~2.72 GB | Improved 5-bit quantization
MiniCPM-V-4 (Q5_K_M) | ~2.56 GB | 5-bit K-quants, medium
MiniCPM-V-4 (Q5_K_S) | ~2.51 GB | 5-bit K-quants, small
MiniCPM-V-4 (Q6_K) | ~2.96 GB | Very high quality
MiniCPM-V-4 (Q8_0) | ~3.83 GB | Highest-quality quantization

https://huggingface.co/openbmb/MiniCPM-V-4-gguf

Models are downloaded automatically on first run. You can also download them manually and place them in models/LLM (Transformers) or models/LLM/GGUF (GGUF).
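
For manual downloads, a layout along these lines is expected (folder and file names below are illustrative placeholders; the exact names depend on the model you download):

ComfyUI/models/LLM/
  MiniCPM-V-4-int4/           # Transformers model folder (config, tokenizer, weights)
  GGUF/
    minicpm-v-4-q4_k_m.gguf   # GGUF model file; exact filename depends on the quantization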


Available Nodes

1. MiniCPM-4-V-Transformers

  • Basic transformers-based node with essential parameters
  • Supports image and video input
  • Memory management options
  • Preset prompt types

2. MiniCPM-4-V-Transformers Advanced

  • Full-featured transformers-based node
  • All parameters customizable
  • System prompt support
  • Advanced video processing options

3. MiniCPM-4-V-GGUF

  • GGUF-based node with essential parameters
  • Optimized for performance

4. MiniCPM-4-V-GGUF Advanced

  • Full-featured GGUF-based node
  • All parameters customizable

5. MiniCPM (Legacy)

  • Original node for backward compatibility
  • Basic functionality

Usage

  1. Add the MiniCPM node from the 🧪AILab category in ComfyUI.
  2. Connect an image or video input node to the MiniCPM node.
  3. Select the model variant (default is MiniCPM-V-4-int4 for transformers).
  4. Choose caption type and adjust parameters as needed.
  5. Execute your workflow to generate captions or analysis.
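
Under the hood, the Transformers nodes call the model much as the MiniCPM-V model cards describe. A minimal standalone sketch (the model ID, image path, and prompt are illustrative; the actual nodes add prompt presets, video frame handling, and the memory management described below):

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-4"  # or another supported variant
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype=torch.bfloat16).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": [image, "Describe this image in detail."]}]

# MiniCPM-V checkpoints expose a chat() helper through trust_remote_code.
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)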

Configuration Defaults

{
  "context_window": 4096,
  "gpu_layers": -1,
  "cpu_threads": 4,
  "default_max_tokens": 1024,
  "default_temperature": 0.7,
  "default_top_p": 0.9,
  "default_top_k": 100,
  "default_repetition_penalty": 1.10,
  "default_system_prompt": "You are MiniCPM-V, a helpful, concise and knowledgeable vision-language assistant. Answer directly and stay on task."
}
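
For the GGUF nodes, these defaults map roughly onto llama-cpp-python parameters as sketched below (the model path is a placeholder, and the vision-projector/image wiring the node performs is omitted for brevity):

from llama_cpp import Llama

llm = Llama(
    model_path="models/LLM/GGUF/minicpm-v-4.gguf",  # placeholder path
    n_ctx=4096,        # context_window
    n_gpu_layers=-1,   # gpu_layers: -1 offloads all layers to the GPU
    n_threads=4,       # cpu_threads
)

out = llm.create_completion(
    "Describe this image in detail.",
    max_tokens=1024,       # default_max_tokens
    temperature=0.7,       # default_temperature
    top_p=0.9,             # default_top_p
    top_k=100,             # default_top_k
    repeat_penalty=1.10,   # default_repetition_penalty
)
print(out["choices"][0]["text"])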

Caption Types

  • Describe: Describe this image in detail.
  • Caption: Write a concise caption for this image.
  • Analyze: Analyze the main elements and scene in this image.
  • Identify: What objects and subjects do you see in this image?
  • Explain: Explain what's happening in this image.
  • List: List the main objects visible in this image.
  • Scene: Describe the scene and setting of this image.
  • Details: What are the key details in this image?
  • Summarize: Summarize the key content of this image in 1-2 sentences.
  • Emotion: Describe the emotions or mood conveyed by this image.
  • Style: Describe the artistic or visual style of this image.
  • Location: Where might this image be taken? Analyze the setting or location.
  • Question: What question could be asked based on this image?
  • Creative: Describe this image as if writing the beginning of a short story.

Memory Management Options

  • Keep in Memory: Model stays loaded for faster subsequent runs
  • Clear After Run: Model is unloaded after each run to save memory
  • Global Cache: Model is cached globally and shared between nodes
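
In practice, an option like Clear After Run amounts to dropping the model reference and reclaiming GPU memory. A minimal sketch of that general pattern (not the node's exact implementation):

import gc
import torch

def unload_model(model):
    # Drop the last reference, collect garbage, then release cached CUDA blocks.
    del model
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()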

Tips

VRAM Requirements

  • 4-6GB VRAM: Use MiniCPM-V-4-int4 or GGUF Q4 models
  • 8GB VRAM: Use MiniCPM-V-4.5-int4 (recommended)
  • 12GB+ VRAM: Can use full MiniCPM-V-4.5
  • CUDA OOM Error: Try int4 quantized models or CPU mode

General Tips

  • 🌟 Try MiniCPM-V-4.5 Transformers first - enhanced capabilities over V4.0
  • For best balance: use MiniCPM-V-4 (Q4_K_M) GGUF model
  • For highest quality: use MiniCPM-V-4.5 Transformers
  • For low VRAM: use MiniCPM-V-4.5-int4 or MiniCPM-V-4 (Q4_0) GGUF
  • Adjust temperature (0.6–0.8) for balancing creativity and coherence.
  • Use top-p (0.9) and top-k (80) sampling for natural output diversity.
  • Lower max tokens or precision (bf16/fp16) for faster generation on less powerful GPUs.
  • Memory management options (Keep in Memory, Clear After Run, Global Cache) help balance VRAM usage and speed.
  • Transformers models offer better quality but use more memory.
  • GGUF models are more memory-efficient but may have slightly lower quality.

License

GPL-3.0 License
