This repository collects papers on multimodal recommender systems and annotates the key techniques used in some of them.
Keywords: Multi-Modal Recommendation, Multimodal Recommendation, Micro-video Recommendation, Multimedia Recommendation
arXiv (2025) - Joint Modeling in Recommendations: A Survey. [PDF]
Springer (2025) - Multimodal Learning toward Recommendation. [Book]
KDD (2024) - Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey. [Survey] [PDF]
TORS (2024) - Multimodal Pre-training for Sequential Recommendation via Contrastive Learning. [Survey] [PDF]
CSUR (2024) - Multimodal Recommender Systems: A Survey. [Survey] [PDF]
arXiv (2025) - A Survey on Multimodal Recommender Systems: Recent Advances and Future Directions. [Survey] [PDF]
TORS (2025) - Formalizing Multimedia Recommendation through Multimodal Deep Learning. [Survey] [PDF]

arXiv (2025) - Transferable Sequential Recommendation with Vanilla Cross-Entropy Loss. [Mamba] [PDF]
TCSS (2025) - DDRec: Dual Denoising Multimodal Graph Recommendation. [GNN+CL] [PDF] [CODE]
Information Fusion (2025) - PEARL: A dual-layer graph learning for multimodal recommendation. [GNN+CL] [PDF] [CODE]
arXiv (2025) - Training-Free Graph Filtering via Multimodal Feature Refinement for Extremely Fast Multimodal Recommendation. [Graph filtering] [PDF]
arXiv (2025) - Bridging Domain Gaps between Pretrained Multimodal Models and Recommendations. [Knowledge Transfer] [PDF]
arXiv (2025) - Explainable Multi-Modality Alignment for Transferable Recommendation. [Multi-modality alignment] [PDF]
arXiv (2025) - NoteLLM-2: Multimodal Large Representation Models for Recommendation. [Multimodal Large Language Model] [PDF] [CODE]
MM (2024) - Modality-Balanced Learning for Multimedia Recommendation. [Knowledge Distillation] [PDF] [CODE]
MM (2024) - DiffMM: Multi-Modal Diffusion Model for Recommendation. [Multi-modal+Diffusion model] [PDF] [CODE]
CIKM (2024) - AlignRec: Aligning and Training in Multimodal Recommendations. [Multi-modal Alignment] [PDF] [CODE]
CIKM (2024) - Spectral and Geometric Spaces Representation Regularization for Multi-Modal Sequential Recommendation. [Representation Optimization] [CODE]
CIKM (2024) - GUME: Graphs and User Modalities Enhancement for Long-Tail Multimodal Recommendation. [GNN+Multimodality Correlation] [PDF] [CODE]
CIKM (2024) - Do We Really Need to Drop Items with Missing Modalities in Multimodal Recommendation? [GNN+Missing Modalities] [PDF] [CODE]
TMM (2024) - Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation with Interpretability. [GNN+VAE] [PDF] [CODE]
TORS (2024) - Formalizing Multimedia Recommendation through Multimodal Deep Learning. [Multimodal Deep Learning] [PDF] [CODE]
SIGIR (2024) - Multimodality Invariant Learning for Multimedia-Based New Item Recommendation. [Invariant Learning] [CODE]
SIGIR (2024) - Disentangling ID and Modality Effects for Session-based Recommendation. [Disentangled Representation] [PDF] [CODE]
SIGIR (2024) - Who To Align With: Feedback-Oriented Multi-Modal Alignment in Recommendation Systems. [Multi-modal Alignment] [CODE]
SIGIR (2024) - IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT. [PEFT] [PDF] [CODE]
SIGIR (2024) - EEG-SVRec: An EEG Dataset with User Multidimensional Affective Engagement Labels in Short Video Recommendation. [Dataset+Multimodal] [PDF] [CODE]
WWW (2024) - Dataset and Models for Item Recommendation Using Multi-Modal User Interactions. [Multi-modal Interaction] [PDF] [CODE]
WWW (2024) - PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning. [Knowledge Distillation+Prompt-Tuning] [PDF] [CODE]
WWW (2024) - Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima. [Robust Optimization] [PDF] [CODE]
WWW (2024) - MMPOI: A Multi-Modal Content-Aware Framework for POI Recommendations. [Content-Aware Framework] [CODE]
KDD (2024) - Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback. [Denoising+Alignment] [PDF] [CODE]
KDD (2024) - MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion. [Multi-modal Fusion+Behavior Expansion] [PDF]
RecSys (2024) - A Multi-modal Modeling Framework for Cold-start Short-video Recommendation. [Cold-start+Multi-modal]
AAAI (2024) - LGMRec: Local and Global Graph Learning for Multimodal Recommendation. [GNN] [PDF] [CODE]
KBS (2023) - A holistic view on positive and negative implicit feedback for micro-video recommendation. [RNN+GNN] [CODE]
IJCNN (2023) - A Multi-modal Multi-task based Approach for Movie Recommendation. [Multi-task Learning]
MM (2023) - A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation. [GNN] [PDF] [CODE]
CIKM (2023) - Adaptive Multi-Modalities Fusion in Sequential Recommendation Systems. [GNN] [PDF] [CODE]
SIGIR (2023) - Attention-guided Multi-step Fusion: A Hierarchical Fusion Network for Multimodal Recommendation. [CNN+Transformer+GCN] [PDF]
TKDE (2023) - Beyond Co-occurrence: Multi-modal Session-based Recommendation. [CL+Transformer] [PDF] [CODE]
WWW (2023) - Bootstrap Latent Representations for Multi-modal Recommendation. [CL+GNN] [PDF] [CODE]
Applied Soft Computing (2023) - Collaborative Recommendation Model Based on Multi-modal Multi-view Attention Network: Movie and literature cases. [KG+BERT+GAT] [PDF]
MM (2023) - Contrastive Intra- and Inter-Modality Generation for Enhancing Incomplete Multimedia Recommendation. [CL]
ICMR (2023) - Cross-View Sample-Enriched Graph Contrastive Learning Network for Personalized Micro-video Recommendation. [GCL]
TMM (2023) - Disentangled Multimodal Representation Learning for Recommendation. [Disentangled Representation Learning] [PDF]
TOIS (2023) - Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation. [Meta-Learning] [CODE]
Multimedia Systems (2023) - Empowering neural collaborative filtering with contextual features for multimedia recommendation.
MM (2023) - Enhancing Adversarial Robustness of Multi-modal Recommendation via Modality Balancing.
ECAI (2023) - Enhancing Dyadic Relations with Homogeneous Graphs for Multimodal Recommendation. [GNN] [PDF] [CODE]
SAC (2023) - Graph Convolutional Neural Network for Multimodal Movie Recommendation. [GCN]
SIGIR (2023) - Learning fine-grained user interests for micro-video recommendation. [CODE]
TOMM (2023) - Learning the User's Deeper Preferences for Multi-modal Recommendation Systems. [GCN]
SIGIR (2023) - LightGT: A Light Graph Transformer for Multimedia Recommendation. [GCN+Transformer] [CODE]
TOIS (2023) - MEGCF: Multimodal Entity Graph Collaborative Filtering for Personalized Recommendation. [GCN] [CODE]
WWW (2023) - MEMER - Multimodal Encoder for Multi-signal Early-stage Recommendations.
ESWA (2023) - Meta-path based graph contrastive learning for micro-video recommendation. [GNN+CL]
Multimedia Tools and Applications (2023) - Micro video recommendation in multimodality using dual-perception and gated recurrent graph neural network. [GRU+GNN]
SIGIR (2023) - Mining Stable Preferences: Adaptive Modality Decorrelation for Multimedia Recommendation. [Stable Learning]
MM (2023) - MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation. [Transformer] [PDF]
Electronics (2023) - MKGCN: Multi-Modal Knowledge Graph Convolutional Network for Music Recommender Systems. [KG+GCN] [CODE]
TKDE (2023) - MM-FRec: Multi-Modal Enhanced Fashion Item Recommendation. [GNN]
WWW (2023) - MMMLP: Multi-modal Multilayer Perceptron for Sequential Recommendations. [MLP-Mixer] [CODE]
MM (2023) - Modal-aware Bias Constrained Contrastive Learning for Multimodal Recommendation. [CL]
KBS (2023) - Multi-Head multimodal deep interest recommendation network. [DIN] [PDF]
Applied Intelligence (2023) - Multimodal collaborative graph for image recommendation. [GNN]
CSS (2023) - Multimodal Contrastive Transformer for Explainable Recommendation. [Transformer]
TMM (2023) - Multimodal Graph Contrastive Learning for Multimedia-Based Recommendation. [GCL] [CODE]
ISDA (2023) - Multi-modal Knowledge Graph Convolutional Network for Recommendation. [KG+GNN]
CIKM (2023) - Multi-modal Mixture of Experts Representation Learning for Sequential Recommendation. [MoE+SA] [CODE]
Mathematics (2023) - Multimodal movie recommendation system using deep learning. [DNN]
arXiv (2023) - Multimodal Pre-training Framework for Sequential Recommendation via Contrastive Learning. [CNN]
PLOS ONE (2023) - Multi-modal recommendation algorithm fusing visual and textual features. [Attention Mechanism]
WWW (2023) - Multi-Modal Self-Supervised Learning for Recommendation. [SSL] [PDF] [CODE]
MM (2023) - Multi-View Graph Convolutional Network for Multimedia Recommendation. [GCN] [PDF] [CODE]
MM (2023) - Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation. [Transformer+Distillation] [PDF] [CODE]
MM (2023) - Pareto Invariant Representation Learning for Multimedia Recommendation. [Invariant Learning] [PDF]
MM (2023) - Prior-Guided Accuracy-Bias Tradeoff Learning for CTR Prediction in Multimedia Recommendation.
Information Fusion (2023) - Prompt-based and weak-modality enhanced multimodal recommendation. [CODE]
MM (2023) - Semantic-Guided Feature Distillation for Multimodal Recommendation. [Distillation Learning] [PDF] [CODE]
MM (2023) - Task-Adversarial Adaptation for Multi-modal Recommendation. [Multi-task learning] [PDF] [CODE]
EMNLP (2023) - VIP5: Towards Multimodal Foundation Models for Recommendation. [P5 recommendation paradigm] [PDF] [CODE]
arXiv (2023) - A Content-Driven Micro-Video Recommendation Dataset at Scale. [PDF] [CODE]
MM (2022) - Adaptive Anti-Bottleneck Multi-Modal Graph Learning Network for Personalized Micro-video Recommendation. [GNN]
Neurocomputing (2022) - Binary multi-modal matrix factorization for fast item cold-start recommendation. [CODE]
MM (2022) - Breaking Isolation: Multimodal Graph Fusion for Multimedia Recommendation by Edge-wise Modulation. [GNN] [CODE]
MM (2022) - EliMRec: Eliminating Single-modal Bias in Multimedia Recommendation. [GNN] [CODE]
MM (2022) - From Abstract to Details: A Generative Multimodal Fusion Framework for Recommendation. [Generative Learning]
SMC (2022) - Graph Network based Approaches for Multi-modal Movie Recommendation System. [GNN]
TMM (2022) - Heterogeneous Graph Contrastive Learning Network for Personalized Micro-Video Recommendation. [GCL]
arXiv (2022) - Implicit semantic-based personalized micro-videos recommendation.
MM (2022) - Invariant Representation Learning for Multimedia Recommendation. [Invariant Learning] [CODE]
TKDE (2022) - Latent Structure Mining With Contrastive Modality Fusion for Multimedia Recommendation. [GNN] [PDF] [CODE]
MM (2022) - Learning Hybrid Behavior Patterns for Multimedia Recommendation. [GNN]
DASFAA (2022) - M-3-IB: A Memory-Augment Multi-modal Information Bottleneck Model for Next-Item Recommendation. [CODE]
ICME (2022) - M3Rec: Cross-Modal Context Enhanced Micro-Video Recommendation with Mutual Information Maximization. [GNN]
CIKM (2022) - MARIO: Modality-Aware Attention and Modality-Preserving Decoders for Multimedia Recommendation.
SIGIR (2022) - MM-rec: Visiolinguistic model empowered multimodal news recommendation. [Transformer]
ICMR (2022) - Multi-modal contrastive pre-training for recommendation. [GNN+CL]
ISMIS (2022) - Multimodal Deep Learning and Fast Retrieval for Recommendation.
CCET (2022) - Multi-modal graph attention network for video recommendation. [GNN]
SIGIR (2022) - Multi-modal Graph Contrastive Learning for Micro-video Recommendation. [GCL] [PDF]
TCSS (2022) - Multimodal Hierarchical Graph Collaborative Filtering for Multimedia-Based Recommendation. [GNN] [CODE]
CIKM (2022) - Multimodal meta-learning for cold-start sequential recommendation. [Meta Learning] [CODE]
SACAIR (2022) - Multi-modal Recommendation System with Auxiliary Information. [Transformer]
SIGIR (2022) - Next point-of-interest recommendation with auto-correlation enhanced multi-modal transformer network. [Transformer]
Applied Intelligence (2022) - Preference-corrected multimodal graph convolutional recommendation network. [GCN]
Applied Intelligence (2022) - Robust multi-objective visual bayesian personalized ranking for multimedia recommendation.
TMM (2022) - Self-supervised Learning for Multimedia Recommendation. [SSL] [CODE]
IJCNN (2022) - Towards Developing a Multi-Modal Video Recommendation System. [KG] [CODE]
AI-HCI (2021) - A deep learning based multi-modal approach for images and texts recommendation. [DBN]
Information Sciences (2021) - A two-stage embedding model for recommendation with multimodal auxiliary information. [GCN]
TMM (2021) - DualGNN: Dual Graph Neural Network for Multimedia Recommendation. [GNN] [CODE]
ISM (2021) - Enhancing Personalised Recommendations with the Use of Multimodal Information.
TMM (2021) - Heterogeneous Hierarchical Feature Aggregation Network for Personalized Micro-Video Recommendation. [GNN]
TMM (2021) - Hierarchical User Intent Graph Network for Multimedia Recommendation. [GCN] [CODE]
Internet Research (2021) - Interpretable video tag recommendation with multimedia deep learning framework. [CNN]
MM (2021) - Mining Latent Structures for Multimedia Recommendation. [GCN] [PDF] [CODE]
arXiv (2021) - MM-Rec: Multimodal News Recommendation. [Transformer]
MTA (2021) - Multimedia recommendation using Word2Vec-based social relationship mining. [Word2Vec]
TKDE (2021) - Multi-Modal Discrete Collaborative Filtering for Efficient Cold-Start Recommendation.
ICME (2021) - Multimodal Disentangled Representation for Recommendation.
BigMM (2021) - Multimodal Fusion Based Attentive Networks for Sequential Music Recommendation. [Disentangled Representation Learning]
KBS (2021) - Multi-modal knowledge-aware reinforcement learning network for explainable recommendation. [KG]
International Journal of Information Technology (2021) - Multimodal trust based recommender system with machine learning approaches for movie recommendation. [BPNN+SVM+DNN]
TMM (2021) - Multi-Modal Variational Graph Auto-Encoder for Recommendation Systems. [GCN+VAE]
Information Sciences (2021) - Multi-modal visual adversarial Bayesian personalized ranking model for recommendation. [Adversarial Learning]
PRL (2021) - Multimodal-adaptive hierarchical network for multimedia sequential recommendation. [LSTM]
IEEE BigData (2021) - Object Interaction Recommendation with Multi-Modal Attention-based Hierarchical Graph Neural Network. [Attention Mechanism+Transformer+GNN] [CODE]
Applied Sciences (2021) - Predicting Implicit User Preferences with Multimodal Feature Fusion for Similar User Recommendation in Social Media. [Autoencoder+CNN]
MM (2021) - Pre-training Graph Transformer with Multimodal Side Information for Recommendation. [Graph Transformer] [PDF]
KDD (2021) - SEMI: A Sequential Multi-Modal Information Transfer Network for E-Commerce Micro-Video Recommendations. [CL]
ESWA (2021) - Session-based news recommendations using SimRank on multi-modal graphs. [GNN]
GLOBECOM (2021) - Social Recommendation System with Multimodal Collaborative Filtering. [GNN+LSTM]
arXiv (2021) - Transformers with multi-modal features and post-fusion context for e-commerce session-based recommendation. [Transformer]
Chaos, Solitons & Fractals (2021) - BERTERS: Multimodal Representation Learning for Expert Recommendation System with Transformer. [Transformer] [PDF]
TCSS (2020) - AMNN: Attention-Based Multimodal Neural Network Model for Hashtag Recommendation. [Attention Mechanism] [CODE]
IEEE Access (2020) - An Enhanced Multi-Modal Recommendation Based on Alternate Training With Knowledge Graph Representation. [KG]
IEEE BigData (2020) - Complementary Recommendations Using Deep Multi-modal Embeddings For Online Retail. [LSTM]
TMM (2020) - Context-Dependent Propagating-Based Video Recommendation in Multimodal Heterogeneous Information Networks. [GNN]
IJCNN (2020) - Enhancing Music Recommendation with Social Media Content: an Attentive Multimodal Autoencoder Approach. [Attention Mechanism+Autoencoder]
MM (2020) - Graph-Refined Convolutional Network for Multimedia Recommendation with Implicit Feedback. [GNN] [PDF] [CODE]
KBS (2020) - Hashtag our stories: Hashtag recommendation for micro-videos via harnessing multiple modalities. [DNN]
TMM (2020) - Learning and fusing multiple user interest representations for micro-video and movie recommendations. [Attention Mechanism] [CODE]
IPM (2020) - MGAT: Multimodal Graph Attention Network for Recommendation. [GAT] [CODE]
FGCS (2020) - Multi-modal Bayesian embedding for point-of-interest recommendation on location-based cyber-physical–social networks.
CIKM (2020) - Multi-modal Knowledge Graphs for Recommender Systems. [KG] [PDF]
arXiv (2020) - Multimodal Topic Learning for Video Recommendation. [CNN+Transformer+GNN]
KDD (2020) - PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest. [PDF]
TII (2020) - Recommendation by Users’ Multimodal Preferences for Smart City Applications.
Applied Sciences-Basel (2020) - Recommendations for Different Tasks Based on the Uniform Multimodal Joint Representation.
IEEE Access (2020) - Sentiment Enhanced Multi-Modal Hashtag Recommendation for Micro-Videos. [SA]
IEEE (2020) - Sequential Recommendation with a Pre-trained Module Learning Multi-modal Information. [SA]
DSN-W (2020) - TAaMR: Targeted Adversarial Attack against Multimedia Recommender Systems. [Adversarial Learning] [CODE]
ICME (2020) - User Conditional Hashtag Recommendation for Micro-Videos. [Attention Network]
MM (2020) - What Aspect Do You Like: Multi-scale Time-aware User Interest Modeling for Micro-video Recommendation. [Attention Network]
arXiv (2019) - Interest-Related Item Similarity Model Based on Multimodal Data for Top-N Recommendation. [PDF]
CIKM (2019) - Long-tail hashtag recommendation for micro-videos with graph convolutional network. [GCN]
MM (2019) - MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video. [GCN] [CODE]
DSE (2019) - MMM: Multi-source Multi-net Micro-video Recommendation with Clustered Hidden Item Representation Learning. [CNN+PMF]
IOTJ (2019) - Multimodal Representation Learning for Recommendation in Internet of Things. [CNN]
ICMEW (2019) - Multi-modal Representation Learning for Short Video Understanding and Recommendation. [MLP]
DASFAA (2019) - Multi-source Multi-net Micro-video Recommendation with Hidden Item Category Discovery. [CNN+PMF]
SIGIR (2019) - Personalized Fashion Recommendation with Visual Explanations based on Multimodal Attention Network. [Attention Network]
MM (2019) - Personalized Hashtag Recommendation for Micro-videos. [GNN] [PDF] [CODE]
ICIG (2019) - Personalized Micro-video Recommendation Based on Multi-modal Features and User Interest Evolution. [GRU+Attention Mechanism+MLP]
ICME (2019) - Sequential Behavior Modeling for Next Micro-Video Recommendation with Collaborative Transformer. [Transformer]
MM (2019) - User Diverse Preference Modeling by Multimodal Attentive Metric Learning. [Attention Mechanism+Metric Learning] [PDF] [CODE]
WWW (2019) - User-Video Co-Attention Network for Personalized Micro-video Recommendation. [Attention Mechanism]
MM (2019) - Vision-Language Recommendation via Attribute Augmented Multimodal Reinforcement Learning. [Reinforcement Learning]
MTA (2018) - LGA: latent genre aware micro-video recommendation on social media. [CNN+BRNN]
UMAP (2018) - Multi-modal adversarial autoencoders for recommendations of citations and subject labels. [Adversarial Autoencoder] [PDF] [CODE]
IJCAI (2017) - Hashtag Recommendation for Multimodal Microblog Using Co-Attention Network. [Attention Mechanism+Metric Learning]
DLRS (2017) - A deep multimodal approach for cold-start music recommendation. [CNN] [PDF] [CODE]
SIGIR (2017) - Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. [Attention Mechanism+Metric Learning] [PDF]
ESWA (2017) - Preference dynamics with multimodal user-item interactions in social media recommendation. [MF]
TMM (2017) - Social-aware movie recommendation via multimodal network learning. [Metric Learning]
NIPS (2013) - Deep content-based music recommendation. [CNN] [PDF]

KDD (2019) - DAML: Dual Attention Mutual Learning between Ratings and Reviews for Item Recommendation. [Neural Network+Attention Mechanism]
WSDM (2019) - Gated Attentive-Autoencoder for Content-Aware Recommendation. [Gated Attentive-Autoencoder] [PDF] [CODE]
WWW (2018) - Neural Attentional Rating Regression with Review-level Explanations. [Attention Mechanism] [PDF]
KDD (2018) - Multi-Pointer Co-Attention Networks for Recommendation. [Co-Attention Networks] [PDF]
WSDM (2017) - Joint Deep Modeling of Users and Items Using Reviews for Recommendation. [Neural Network] [PDF]
RecSys (2016) - Convolutional Matrix Factorization for Document Context-Aware Recommendation. [CNN] [PDF] [CODE]
IJCAI (2016) - Collaborative Multi-Level Embedding Learning from Reviews for Rating Prediction. [Topic Model] [PDF]
RecSys (2013) - Hidden factors and hidden topics: understanding rating dimensions with review text. [Topic Model] [PDF]
KDD (2011) - Collaborative topic modeling for recommending scientific articles. [Topic Model] [PDF]

TKDE (2022) - Modeling Product’s Visual and Functional Characteristics for Recommender Systems. [MF] [PDF] [CODE]
TKDE (2020) - Adversarial Training Towards Robust Multimedia Recommender System. [Adversarial Learning] [CODE]
Information Sciences (2019) - Visual appearance or functional complementarity: Which aspect affects your decision making? [MF] [CODE]
AAAI (2016) - VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. [BPR] [PDF]