Awesome-Multimodal-Reasoning

This is a repository for organizing papers related to Multimodal Reasoning in Multimodal Large Language Models (Image, Video).

With the rapid progress of multimodal large language models (MLLMs/LVLMs/LSLMs) in visual (and audio) perception and in RL-powered reasoning, researchers have high expectations for their multimodal reasoning capabilities.

This repo also collects papers on visual generation (image generation/video generation) with RL/CoT.

⭐ If you find this list useful, feel free to star it!

Paper List (Updating...)

Survey

(8 May 2025) Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

(30 Apr 2025) Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models

(4 Apr 2025) Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning

(18 Mar 2025) Aligning Multimodal LLM with Human Preference: A Survey

(16 Mar 2025) Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Image Reasoning

(29 Oct 2025) PairUni: Pairwise Training for Unified Multimodal Language Models

(27 Oct 2025) VOLD: Reasoning Transfer from LLMs to Vision-Language Models via On-Policy Distillation

(23 Oct 2025) Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning

(23 Oct 2025) Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation

(18 Oct 2025) RL makes MLLMs see better than SFT

(16 Oct 2025) MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning

(15 Oct 2025) Generative Universal Verifier as Multimodal Meta-Reasoner

(14 Oct 2025) HoneyBee: Data Recipes for Vision-Language Reasoners

(14 Oct 2025) DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search

(10 Oct 2025) Unleashing Perception-Time Scaling to Multimodal Reasoning Models

(10 Oct 2025) Spotlight on Token Perception for Multimodal Reinforcement Learning

(10 Oct 2025) Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging

(13 Oct 2025) CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images

(9 Oct 2025) ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping

(9 Oct 2025) SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models

(7 Oct 2025) Context Matters: Learning Global Semantics via Object-Centric Representation

(6 Oct 2025) Beyond Monolithic Rewards: A Hybrid and Multi-Aspect Reward Optimization for MLLM Alignment

(3 Oct 2025) Efficient Test-Time Scaling for Small Vision-Language Models

(27 Sep 2025) Decoupling Reasoning and Perception: An LLM-LMM Framework for Faithful Visual Reasoning

(29 Sep 2025) Latent Visual Reasoning

(29 Sep 2025) GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning

(28 Sep 2025) Poivre: Self-Refining Visual Pointing with Reinforcement Learning

(29 Sep 2025) VTPerception-R1: Enhancing Multimodal Reasoning via Explicit Visual and Textual Perceptual Grounding

(29 Sep 2025) Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks

(25 Sep 2025) MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

(12 Sep 2025) LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA

(9 Sep 2025) Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

(28 Aug 2025) R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

(27 Aug 2025) Self-Rewarding Vision-Language Model via Reasoning Decomposition

(18 Aug 2025) M3PO: Multimodal-Model-Guided Preference Optimization for Visual Instruction Following

(18 Aug 2025) Vision-G1: Towards General Vision Language Reasoning with Multi-Domain Data Curation

(18 Aug 2025) Ovis2.5 Technical Report

(18 Aug 2025) MDPO: Overcoming the Training-Inference Divide of Masked Diffusion Language Models

(8 Aug 2025) SIFThinker: Spatially-Aware Image Focus for Visual Reasoning

(7 Aug 2025) Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

(7 Aug 2025) StructVRM: Aligning Multimodal Reasoning with Structured and Verifiable Reward Models

(5 Aug 2025) Geoint-R1: Formalizing Multimodal Geometric Reasoning with Dynamic Auxiliary Constructions

(30 Jul 2025) MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

(28 Jul 2025) Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback

(24 Jul 2025) MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning

(24 Jul 2025) SafeWork-R1: Coevolving Safety and Intelligence under the AI-45 Law

(22 Jul 2025) C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

(22 Jul 2025) Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

(11 Jul 2025) M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning

(3 Jul 2025) Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

(1 Jul 2025) GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

(20 Jun 2025) GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning

(16 Jun 2025) Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning

(11 Jun 2025) ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

(5 Jun 2025) Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning

(5 Jun 2025) Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

(5 Jun 2025) MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

(16 May 2025) Visual Planning: Let's Think Only with Images

(15 May 2025) MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

(13 May 2025) OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

(12 May 2025) Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning

(8 May 2025) Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging

(8 May 2025) SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models

(6 May 2025) X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

(6 May 2025) Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

(6 May 2025) ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant

(5 May 2025) R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

(28 Apr 2025) SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning

(25 Apr 2025) Fast-Slow Thinking for Large Vision-Language Model Reasoning

(25 Apr 2025) Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization

(25 Apr 2025) Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

(21 Apr 2025) A Call for New Recipes to Enhance Spatial Reasoning in MLLMs

(20 Apr 2025) Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension

(12 Apr 2025) VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search

(10 Apr 2025) VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

(10 Apr 2025) SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

(10 Apr 2025) Perception-R1: Pioneering Perception Policy with Reinforcement Learning

(10 Apr 2025) Kimi-VL Technical Report

(8 Apr 2025) On the Suitability of Reinforcement Fine-Tuning to Visual Tasks

(8 Apr 2025) Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought

(1 Apr 2025) Improved Visual-Spatial Reasoning via R1-Zero-Like Training

(17 Mar 2025) R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

(13 Mar 2025) VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

(9 Mar 2025) Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

(7 Mar 2025) R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning

(7 Mar 2025) Unified Reward Model for Multimodal Understanding and Generation

(7 Mar 2025) R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model

(3 Mar 2025) Visual-RFT: Visual Reinforcement Fine-Tuning

(4 Feb 2025) Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

(3 Jan 2025) Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

(13 Jan 2025) Imagine while Reasoning in Space: Multimodal Visualization-of-Thought

(10 Jan 2025) LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

(9 Jan 2025) Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark

(30 Dec 2024) Slow Perception: Let's Perceive Geometric Figures Step-by-step

(19 Dec 2024) Progressive Multimodal Reasoning via Active Retrieval

(29 Nov 2024) Interleaved-Modal Chain-of-Thought

(15 Nov 2024) Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination

(15 Nov 2024) LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

(30 Oct 2024) Vision-Language Models Can Self-Improve Reasoning via Reflection

(23 Oct 2024) R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

(21 Oct 2024) Improve Vision Language Model Chain-of-thought Reasoning

(11 Oct 2024) M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop Chain-of-Thought

(6 Oct 2024) MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration

(4 Oct 2024) Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

(29 Sep 2024) CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

(13 Jun 2024) Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

(28 Dec 2023) Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos

(14 Dec 2023) Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

(27 Nov 2023) Compositional Chain-of-Thought Prompting for Large Multimodal Models

(15 Nov 2023) The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task

(3 May 2023) Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings

(16 Apr 2023) Chain of Thought Prompt Tuning in Vision Language Models

(2 Feb 2023) Multimodal Chain-of-Thought Reasoning in Language Models

Video

(23 Oct 2025) Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence

(9 Oct 2025) SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

(6 Oct 2025) Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

(5 Oct 2025) Video-in-the-Loop: Span-Grounded Long Video QA with Interleaved Reasoning

(29 Sep 2025) FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting

(29 Sep 2025) LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning

(28 Sep 2025) FrameMind: Frame-Interleaved Chain-of-Thought for Video Reasoning via Reinforcement Learning

(12 Jun 2025) CogStream: Context-guided Streaming Video Question Answering

(6 Jun 2025) VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning

(27 Mar 2025) Video-R1: Reinforcing Video Reasoning in MLLMs

(17 Feb 2025) video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

(10 Feb 2025) CoS: Chain-of-Shot Prompting for Long Video Understanding

(8 Jan 2025) Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

(3 Dec 2024) VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation

(2 Dec 2024) Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

(29 Nov 2024) STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

(21 Oct 2024) Improve Vision Language Model Chain-of-thought Reasoning

(12 Oct 2024) Interpretable Video based Stress Detection with Self-Refine Chain-of-thought Reasoning

(27 Sep 2024) Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks

(28 Aug 2024) Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

(24 May 2024) Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models

(7 May 2024) Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

(8 Oct 2024) Temporal Reasoning Transfer from Text to Video

DLLM

(9 Oct 2025) Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization

(9 Oct 2025) Enhancing Reasoning for Diffusion LLMs via Distribution Matching Policy Optimization

Audio

(23 Oct 2025) Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards

(10 Oct 2025) Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models

(8 Oct 2025) Can Speech LLMs Think while Listening?

(5 Oct 2025) Principled and Tractable RL for Reasoning with Diffusion Language Models

(22 Jul 2025) Step-Audio 2 Technical Report

(14 Mar 2025) Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

Image/Video Generation

(24 Oct 2025) Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

(15 Oct 2025) Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation

(9 Oct 2025) Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing

(9 Oct 2025) Reinforcing Diffusion Models by Direct Group Preference Optimization

(9 Oct 2025) Real-Time Motion-Controllable Autoregressive Video Diffusion

(29 Sep 2025) STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation

(28 Aug 2025) Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance

(28 Aug 2025) OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

(28 Aug 2025) Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

(27 Aug 2025) CVBench: Evaluating Cross-Video Synergies for Complex Multimodal Understanding and Reasoning

(9 Aug 2025) AR-GRPO: Training Autoregressive Image Generation Models via Reinforcement Learning

(28 Jul 2025) Multimodal LLMs as Customized Reward Models for Text-to-Image Generation

(20 Jun 2025) RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought

(17 Jun 2025) SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks

(16 May 2025) Towards Self-Improvement of Diffusion Models via Group Preference Optimization

(16 May 2025) Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models

(15 May 2025) Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

(12 May 2025) DanceGRPO: Unleashing GRPO on Visual Generation

(8 May 2025) Flow-GRPO: Training Flow Matching Models via Online RL

(1 May 2025) T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

(22 Apr 2025) From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

(22 Apr 2025) Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning

(26 Mar 2025) MMGen: Unified Multi-modal Image Generation and Understanding in One Go

(13 Mar 2025) GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

(3 Mar 2025) MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation

(23 Jan 2025) Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Bench/Dataset

(15 Oct 2025) Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models

(14 Oct 2025) Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning

(10 Oct 2025) BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception

(10 Oct 2025) SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

(9 Sep 2025) Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

(27 Aug 2025) 11Plus-Bench: Demystifying Multimodal LLM Spatial Reasoning with Cognitive-Inspired Analysis

(8 Aug 2025) MathReal: We Keep It Real! A Real Scene Benchmark for Evaluating Math Reasoning in Multimodal Large Language Models

(8 Aug 2025) InfoCausalQA: Can Models Perform Non-explicit Causal Reasoning Based on Infographic?

(22 Jul 2025) ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering

(22 Jul 2025) Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

(12 Jun 2025) VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

(12 Jun 2025) MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning

(6 Jun 2025) PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts

(5 Jun 2025) VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

(5 Jun 2025) MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark

(15 May 2025) StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation

(13 May 2025) VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models

(1 May 2025) MINERVA: Evaluating Complex Video Reasoning

(30 Apr 2025) GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling

(21 Apr 2025) IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs

(21 Apr 2025) VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

(17 Apr 2025) Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark

(16 Apr 2025) FLIP Reasoning Challenge

(14 Apr 2025) VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge

(8 Apr 2025) ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering

(8 Apr 2025) V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models

(8 Apr 2025) MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

(4 Apr 2025) Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

(15 Feb 2025) SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding

(14 Feb 2025) MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

(13 Feb 2025) MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

(18 Dec 2024) Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

(22 Nov 2024) VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

(18 Oct 2024) MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps

(7 Jul 2024) VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool

(20 Jun 2024) MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

(12 Jun 2024) LVBench: An Extreme Long Video Understanding Benchmark

(24 Apr 2024) Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

(16 Apr 2024) OpenEQA: Embodied Question Answering in the Era of Foundation Models

(17 Aug 2023) EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

(23 May 2023) Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought

(18 May 2021) NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions

Latent

(29 Sep 2025) Latent Visual Reasoning

(12 Feb 2025) Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning

(7 Feb 2025) Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

(9 Dec 2024) Training Large Language Models to Reason in a Continuous Latent Space

Open Source Project

https://github.com/Hui-design/Open-LLaVA-Video-R1

https://github.com/SkyworkAI/Skywork-R1V

https://huggingface.co/papers/2503.05379

https://github.com/Osilly/Vision-R1

https://github.com/ModalMinds/MM-EUREKA

https://github.com/OpenRLHF/OpenRLHF-M

https://github.com/Fancy-MLLM/R1-Onevision

https://github.com/om-ai-lab/VLM-R1

https://github.com/EvolvingLMMs-Lab/open-r1-multimodal

https://github.com/Deep-Agent/R1-V

https://github.com/TideDra/lmm-r1

https://github.com/tulerfeng/Video-R1

https://github.com/Wang-Xiaodong1899/Open-R1-Video

The-Martyr/Awesome-Multimodal-Reasoning

Folders and files

Latest commit

History

Repository files navigation

Awesome-Multimodal-Reasoning

⭐ If you find this list useful, welcome to star it!

Paper List (Updating...)

Survey

Image Reasoning

Video

DLLM

Audio

Image/Video Generation

Bench/Dataset

Latent

Open Source Project

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages