Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
13 June 2022
Peng Xu
Xiatian Zhu
David Clifton
    ViT

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 305 papers shown
Handwritten Text Recognition for Low Resource Languages
Sayantan Dey
Alireza Alaei
P. Roy
VLM
104
0
0
01 Dec 2025
Sigma: The Key for Vision-Language-Action Models toward Telepathic Alignment
Libo Wang
104
0
0
30 Nov 2025
Enhancing Quranic Learning: A Multimodal Deep Learning Approach for Arabic Phoneme Recognition
Ayhan Kucukmanisa
Derya Gelmez
Sükrü Selim Çalik
Zeynep Hilal Kilimci
118
0
0
21 Nov 2025
LMM-IR: Large-Scale Netlist-Aware Multimodal Framework for Static IR-Drop Prediction
Design Automation Conference (DAC), 2025
Kai Ma
Zhen Wang
Hongquan He
Qi Xu
Tinghuan Chen
Hao Geng
64
0
0
16 Nov 2025
Learning Time in Static Classifiers
Xi Ding
Lei Wang
Piotr Koniusz
Yongsheng Gao
124
0
0
15 Nov 2025
Point Cloud Quantization through Multimodal Prompting for 3D Understanding
Hongxuan Li
Wencheng Zhu
Huiying Xu
Xinzhong Zhu
Q. Hu
MQ, 3DPC
429
0
0
15 Nov 2025
MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains
Leyan Xue
Zongbo Han
Kecheng Xue
Xiaohong Liu
Guangyu Wang
C. Zhang
132
0
0
09 Nov 2025
Towards Scalable Meta-Learning of near-optimal Interpretable Models via Synthetic Model Generations
Kyaw Hpone Myint
Zhe Wu
Alexandre G.R. Day
Giri Iyengar
SyDa
417
0
0
06 Nov 2025
Caption Injection for Optimization in Generative Search Engine
Xiaolu Chen
Yong Liao
DiffM
132
0
0
06 Nov 2025
Enhancing Multimodal Reasoning via Latent Refocusing
Jizheng Ma
Xiaofei Zhou
Yanlong Song
Han Yan
VLM, LRM
178
1
0
04 Nov 2025
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Hossein R. Nowdeh
Jie Ji
Xiaolong Ma
Fatemeh Afghah
136
0
0
28 Oct 2025
MILES: Modality-Informed Learning Rate Scheduler for Balancing Multimodal Learning
Alejandro Guerra-Manzanares
Farah E. Shamout
128
0
0
20 Oct 2025
Joint Modeling of Big Five and HEXACO for Multimodal Apparent Personality-trait Recognition
Ryo Masumura
Shota Orihashi
Mana Ihori
Tomohiro Tanaka
Naoki Makishima
Taiga Yamane
Naotaka Kawata
Satoshi Suzuki
Taichi Katayama
84
0
0
16 Oct 2025
FedMMKT:Co-Enhancing a Server Text-to-Image Model and Client Task Models in Multi-Modal Federated Learning
Ningxin He
Yang Liu
Wei Sun
Xiaozhou Ye
Ye Ouyang
Tiegang Gao
Z. Zhang
92
0
0
14 Oct 2025
ReSSFormer: A Recursive Sparse Structured Transformer for Scalable and Long-Context Reasoning
Haochen You
Baojing Liu
152
0
0
02 Oct 2025
MAESTRO : Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series
Payal Mohapatra
Yueyuan Sui
Akash Pandey
Stephen Xia
Qi Zhu
AI4TS
86
1
0
29 Sep 2025
InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions
Liangjian Wen
Qun Dai
Jianzhuang Liu
Jiangtao Zheng
Yong Dai
Dongkai Wang
Zhao Kang
Jun Wang
Z. Xu
Jiang Duan
246
0
0
28 Sep 2025
PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction
M. Raza
A. Azam
Talha Qaiser
Nasir M. Rajpoot
124
1
0
24 Sep 2025
SeMob: Semantic Synthesis for Dynamic Urban Mobility Prediction
Runfei Chen
Shuyang Jiang
Wei Huang
93
0
0
24 Sep 2025
Single-Branch Network Architectures to Close the Modality Gap in Multimodal Recommendation
Christian Ganhor
Marta Moscati
Anna Hausberger
Shah Nawaz
Markus Schedl
HAI, OffRL
124
0
0
23 Sep 2025
Orchestrate, Generate, Reflect: A VLM-Based Multi-Agent Collaboration Framework for Automated Driving Policy Learning
Zengqi Peng
Yusen Xie
Yubin Wang
Rui Yang
Qifeng Chen
Jun Ma
116
0
0
21 Sep 2025
DAFTED: Decoupled Asymmetric Fusion of Tabular and Echocardiographic Data for Cardiac Hypertension Diagnosis
Jérémie Stym-Popper
Nathan Painchaud
Clément Rambour
P. Courand
Nicolas Thome
Olivier Bernard
143
1
0
19 Sep 2025
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Saeed Amizadeh
Sara Abdali
Yinheng Li
K. Koishida
172
0
0
18 Sep 2025
Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks
Jonas Geiger
Marta Moscati
Shah Nawaz
Markus Schedl
VLM
92
0
0
18 Sep 2025
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
Peng Xu
Shengwu Xiong
Jiajun Zhang
Yaxiong Chen
Bowen Zhou
...
Yang Yang
Yanglin Deng
Yashu Kang
Ye Yuan
Y. Wen
LRM
127
1
0
17 Sep 2025
From Embeddings to Equations: Genetic-Programming Surrogates for Interpretable Transformer Classification
M. S. Khorshidi
Navid Yazdanjue
Hassan Gharoun
M. Nikoo
Fang Chen
Amir H. Gandomi
124
1
0
16 Sep 2025
Video Understanding by Design: How Datasets Shape Architectures and Insights
Lei Wang
Piotr Koniusz
Yongsheng Gao
3DV, VGen, AI4TS
237
0
0
11 Sep 2025
IMDMR: An Intelligent Multi-Dimensional Memory Retrieval System for Enhanced Conversational AI
Tejas Pawar
Sarika Patil
Om Tilekar
Rushikesh Janwade
Vaibhav Helambe
56
0
0
10 Sep 2025
XSRD-Net: EXplainable Stroke Relapse Detection
Christian Gapp
Elias Tappeiner
M. Welk
Karl Fritscher
Stephanie Mangesius
...
Philipp Deisl
Michael Knoflach
Astrid E. Grams
Elke Ruth Gizewski
R. Schubert
56
0
0
09 Sep 2025
Testing chatbots on the creation of encoders for audio conditioned image generation
Jorge E. León
Miguel Carrasco
152
0
0
09 Sep 2025
Effectively obtaining acoustic, visual and textual data from videos
Jorge E. León
Miguel Carrasco
VGen
135
1
0
06 Sep 2025
AIVA: An AI-based Virtual Companion for Emotion-aware Interaction
Chenxi Li
44
0
0
03 Sep 2025
On Transferring, Merging, and Splitting Task-Oriented Network Digital Twins
Zifan Zhang
Minghong Fang
Mingzhe Chen
Yuchen Liu
81
0
0
02 Sep 2025
LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
Lianyu Hu
Fanhua Shang
Wei Feng
Liang Wan
MLLM, VLM
132
0
0
30 Aug 2025
A Multimodal-Multitask Framework with Cross-modal Relation and Hierarchical Interactive Attention for Semantic Comprehension
Information Fusion (Inf. Fusion), 2025
Mohammad Zia Ur Rehman
Devraj Raghuvanshi
Umang Jain
Shubhi Bansal
Nagendra Kumar
108
5
0
22 Aug 2025
MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs
Ruyi Ding
Tianhong Xu
Xinyi Shen
A. A. Ding
Yunsi Fei
MoE, AAML
140
2
0
20 Aug 2025
Separating Shared and Domain-Specific LoRAs for Multi-Domain Learning
Yusaku Takama
Ning Ding
Tatsuya Yokota
Toru Tamaki
154
0
0
05 Aug 2025
Parameter-Efficient Single Collaborative Branch for Recommendation
ACM Conference on Recommender Systems (RecSys), 2025
Marta Moscati
Shah Nawaz
Markus Schedl
BDL
157
0
0
05 Aug 2025
Explainability Through Systematicity: The Hard Systematicity Challenge for Artificial Intelligence
Matthieu Queloz
138
2
0
29 Jul 2025
T$^\text{3}$SVFND: Towards an Evolving Fake News Detector for Emergencies with Test-time Training on Short Video Platforms
Liyuan Zhang
Zeyun Cheng
Yan Yang
Yong Liu
Jinke Ma
135
0
0
27 Jul 2025
Principled Multimodal Representation Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
219
6
0
23 Jul 2025
Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics
Asifullah Khan
Muhammad Zaeem Khan
Saleha Jamshed
Sadia Ahmad
Aleesha Zainab
Kaynat Khatib
Faria Bibi
Abdul Rehman
OffRL, LRM
269
3
0
14 Jun 2025
DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs
Bo-Cheng Chiu
Jen-Jee Chen
Yu-Chee Tseng
Feng-Chi Chen
317
0
0
13 Jun 2025
RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer
Haotian Ni
Yake Wei
Hang Liu
Gong Chen
Chong Peng
Hao Lin
Di Hu
OffRL
294
1
0
13 Jun 2025
Position Prediction Self-Supervised Learning for Multimodal Satellite Imagery Semantic Segmentation
John Waithaka
Moise Busogi
SSL
160
0
0
07 Jun 2025
SatelliteFormula: Multi-Modal Symbolic Regression from Remote Sensing Imagery for Physics Discovery
Zhenyu Yu
Mohd Yamani Idna Idris
Pei Wang
Yuelong Xia
Fei Ma
Rizwan Qureshi
174
6
0
06 Jun 2025
CogniAlign: Word-Level Multimodal Speech Alignment with Gated Cross-Attention for Alzheimer's Detection
Knowledge-Based Systems (KBS), 2025
David Ortiz-Perez
Manuel Benavent-Lledo
Javier Rodriguez-Juan
José García Rodríguez
David Tomás
307
5
0
02 Jun 2025
AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting
Yuyuan Liu
Yuanhong Chen
Chong Wang
Junlin Han
Junde Wu
Can Peng
Jingkun Chen
Yu Tian
Gustavo Carneiro
VLM
299
0
0
01 Jun 2025
Revisiting Self-attention for Cross-domain Sequential Recommendation
Knowledge Discovery and Data Mining (KDD), 2025
Clark Mingxuan Ju
Leonardo Neves
Bhuvesh Kumar
Liam Collins
Tong Zhao
Yuwei Qiu
Qing Dou
Sohail Nizam
Sen Yang
Neil Shah
LRM
184
5
0
27 May 2025
Residual Cross-Attention Transformer-Based Multi-User CSI Feedback with Deep Joint Source-Channel Coding
IEEE Wireless Communications Letters (WCL), 2025
Hengwei Zhang
Minghui Wu
Li Qiao
Ling Liu
Ziqi Han
Zhen Gao
135
2
0
26 May 2025