[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
✨✨Latest Advances on Multimodal Large Language Models
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Algorithms and Publications on 3D Object Tracking
Parsing-free RAG supported by VLMs
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
[CVPR 2025] MINIMA: Modality Invariant Image Matching
An open-source implementation of Gemini, the Google model that will "eclipse ChatGPT"
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
[CVPR 2023] Collaborative Diffusion
An open-source implementation for training LLaVA-NeXT.
[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"
Effortless plug-and-play optimizer to cut model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.
Official repository for VisionZip (CVPR 2025)
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
An official PyTorch implementation of the CRIS paper