Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
13 June 2022
Peng Xu
Xiatian Zhu
David Clifton
    ViT

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 305 papers shown
Residual Cross-Attention Transformer-Based Multi-User CSI Feedback with Deep Joint Source-Channel Coding
IEEE Wireless Communications Letters (WCL), 2025
Hengwei Zhang
Minghui Wu
Li Qiao
Ling Liu
Ziqi Han
Zhen Gao
135
2
0
26 May 2025
MLLMs are Deeply Affected by Modality Bias
Xu Zheng
Chenfei Liao
Yuqian Fu
Kaiyu Lei
Yuanhuiyi Lyu
...
Yu Jiang
Andrii Zadaianchuk
Dacheng Tao
Luc Van Gool
Xuming Hu
312
11
0
24 May 2025
Learning Generalized and Flexible Trajectory Models from Omni-Semantic Supervision
Yuanshao Zhu
James Jianqiao Yu
Xiangyu Zhao
Xiao Han
Qidong Liu
Xuetao Wei
Yuxuan Liang
262
0
0
23 May 2025
DUAL: Dynamic Uncertainty-Aware Learning
Jiahao Qin
Bei Peng
Feng Liu
Guangliang Cheng
Lu Zong
107
0
0
21 May 2025
Multi-Modal Artificial Intelligence of Embryo Grading and Pregnancy Prediction in Assisted Reproductive Technology: A Review
Xueqiang Ouyang
Jia Wei
419
0
0
19 May 2025
Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables
Yu Gui
Cong Ma
Zongming Ma
SSL
309
2
0
18 May 2025
Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
SungHeon Jeong
Jihong Park
Mohsen Imani
411
0
0
05 May 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
IEEE Access, 2025
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP, VLM
444
1
0
30 Apr 2025
A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025
Wenwen Li
Chia-Yu Hsu
Sizhe Wang
Zhining Gu
Yili Yang
Brendan M. Rogers
A. Liljedahl
263
4
0
23 Apr 2025
OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning
Anirudhan Badrinath
Alex Yang
Kousik Rajesh
Prabhat Agarwal
Jaewon Yang
Haoyu Chen
Jiajing Xu
Charles R. Rosenberg
AI4TS
701
3
0
22 Apr 2025
DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis
Efthymios Georgiou
Vassilis Katsouros
Yannis Avrithis
Alexandros Potamianos
389
1
0
15 Apr 2025
HAVT-IVD: Heterogeneity-Aware Cross-Modal Network for Audio-Visual Surveillance: Idling Vehicles Detection With Multichannel Audio and Multiscale Visual Cues
Xiwen Li
Ross T. Whitaker
Tolga Tasdizen
270
0
0
15 Apr 2025
Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging
International Journal of Machine Learning and Cybernetics (IJMLC), 2025
Siyuan Dai
Kai Ye
Guodong Liu
Haoteng Tang
Chen Tang
MedIm
215
4
0
09 Apr 2025
Foundation Models for Environmental Science: A Survey of Emerging Frontiers
Runlong Yu
Shengyu Chen
Yiqun Xie
Huaxiu Yao
J. Willard
X. Jia
AI4CE
563
7
0
05 Apr 2025
ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving
Sheng Yang
Tong Zhan
Shichen Qiao
Jicheng Gong
Qing Yang
Jian Wang
Yanfeng Lu
3DPC
356
4
0
04 Apr 2025
FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention
Huangliang Dai
Shixun Wu
Hairui Zhao
Zizhe Jian
Yue Zhu
Haiyang Hu
Haiyang Hu
214
8
0
03 Apr 2025
Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics
Jing Zhu
Mingxuan Ju
Yozen Liu
Danai Koutra
Neil Shah
Tong Zhao
201
3
0
30 Mar 2025
Quantum Complex-Valued Self-Attention Model
Fu Chen
Qinglin Zhao
Li Feng
Longfei Tang
Yangbin Lin
Haitao Huang
MQ
314
2
0
24 Mar 2025
Continual Multimodal Contrastive Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
CLL
703
8
0
19 Mar 2025
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu
Siyuan Meng
Yanting Gao
Song Mao
Pinlong Cai
Guohang Yan
Yirong Chen
Zilin Bian
Ding Wang
Botian Shi
364
12
0
17 Mar 2025
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Chengxuan Qian
Shuo Xing
Shawn Li
Yue Zhao
Zhengzhong Tu
327
11
0
14 Mar 2025
Beam Selection in ISAC using Contextual Bandit with Multi-modal Transformer and Transfer Learning
Mohammad Farzanullah
Han Zhang
A. B. Sediq
Ali Afana
Melike Erol-Kantarci
143
2
0
13 Mar 2025
FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification
IEEE Transactions on Aerospace and Electronic Systems (IEEE Trans. Aerosp. Electron. Syst.), 2025
S. Sami
Md Golam Moula Mehedi Hasan
Nasser M. Nasrabadi
Raghuveer Rao
307
1
0
12 Mar 2025
DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
Chengxuan Qian
Kai Han
Jing Wang
Chongwen Lyu
Rui Qian
Chongwen Lyu
Zhenlong Yuan
Zhe Liu
Zhe-Yu Liu
424
17
0
09 Mar 2025
Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
Jie Xu
Na Zhao
Gang Niu
Masashi Sugiyama
Xiaofeng Zhu
535
1
0
06 Mar 2025
A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery
Yiheng Zhu
Mingyang Li
Junlong Liu
Kun Fu
Jian Wu
Yue Liu
Mingze Yin
Jieping Ye
Jian Wu
Xiping Hu
333
0
0
06 Mar 2025
A Survey of Foundation Models for Environmental Science
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025
Runlong Yu
Shengyu Chen
Yiqun Xie
X. Jia
AI4CE
389
4
0
05 Mar 2025
Deep Causal Behavioral Policy Learning: Applications to Healthcare
Jonas Knecht
Anna Zink
Jonathan Kolstad
Maya Petersen
CML
264
0
0
05 Mar 2025
Attention Bootstrapping for Multi-Modal Test-Time Adaptation
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yusheng Zhao
Junyu Luo
Xiao Luo
Jinsheng Huang
Jingyang Yuan
Zhiping Xiao
Min Zhang
TTA
293
2
0
04 Mar 2025
Split Adaptation for Pre-trained Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2025
Lixu Wang
Bingqi Shang
Yuchen Ren
Payal Mohapatra
Wei Dong
Xiao-Xu Wang
Qi Zhu
ViT
361
2
0
01 Mar 2025
Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems
International Conference on Big Data and Smart Computing (BigComp), 2025
Faisal Mohammad
Duksan Ryu
224
0
0
28 Feb 2025
What are You Looking at? Modality Contribution in Multimodal Medical Deep Learning
International Journal of Computer Assisted Radiology and Surgery (IJCARS), 2025
Christian Gapp
Elias Tappeiner
M. Welk
Karl Fritscher
Elke Ruth Gizewski
R. Schubert
231
1
0
28 Feb 2025
Integrating Biological and Machine Intelligence: Attention Mechanisms in Brain-Computer Interfaces
Information Fusion (Inf. Fusion), 2025
Jing Wang
Weishan Ye
Jialin He
Li Zhang
G. Huang
Zhuliang Yu
Zhen Liang
308
3
0
26 Feb 2025
GeoAggregator: An Efficient Transformer Model for Geo-Spatial Tabular Data
AAAI Conference on Artificial Intelligence (AAAI), 2025
Rui Deng
Ziqi Li
Mingshu Wang
346
1
0
24 Feb 2025
Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
467
1
0
20 Feb 2025
A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions
Elisa Negrini
Yuxuan Liu
Liu Yang
Stanley Osher
Hayden Schaeffer
AI4CE
323
2
0
09 Feb 2025
Fine-grained Graph Rationalization
Zhe Xu
Menghai Pan
Yuzhong Chen
Huiyuan Chen
Yuchen Yan
Mahashweta Das
Hanghang Tong
171
0
0
28 Jan 2025
High-dimensional multimodal uncertainty estimation by manifold alignment: Application to 3D right ventricular strain computations
Maxime Di Folco
Gabriel Bernardino
Patrick Clarysse
Nicolas Duchateau
219
1
0
21 Jan 2025
Balance-aware Sequence Sampling Makes Multi-modal Learning Better
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Zhi-Hao Guan
142
0
0
01 Jan 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hai Yu
Chong Deng
Qinglin Zhang
Jiaqing Liu
Qian Chen
Wen Wang
430
0
0
31 Dec 2024
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
955
31
0
28 Dec 2024
When SAM2 Meets Video Shadow and Mirror Detection
Leiping Jie
VLM
204
1
0
26 Dec 2024
Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data
Zhiqiang Tang
Zihan Zhong
Tong He
Gerald Friedland
379
4
0
19 Dec 2024
Deep Learning-Based Noninvasive Screening of Type 2 Diabetes with Chest X-ray Images and Electronic Health Records
Sanjana Gundapaneni
Zhuo Zhi
Miguel R. D. Rodrigues
293
1
0
14 Dec 2024
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
Shijie Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
421
48
0
03 Dec 2024
Graph-to-SFILES: Control structure prediction from process topologies using generative artificial intelligence
Lukas Schulze Balhorn
Kevin Degens
Artur M. Schweidtmann
AI4CE
358
4
0
30 Nov 2024
Multimodal Integration of Longitudinal Noninvasive Diagnostics for Survival Prediction in Immunotherapy Using Deep Learning
Melda Yeghaian
Zuhir Bodalal
Daan van den Broek
John B A G Haanen
Regina G H Beets-Tan
Stefano Trebeschi
Marcel A J van Gerven
308
2
0
27 Nov 2024
FLEX-CLIP: Feature-Level GEneration Network Enhanced CLIP for X-shot Cross-modal Retrieval
Jingyou Xie
Jiayi Kuang
Zhenzhou Lin
Jiarui Ouyang
Zishuo Zhao
Ying Shen
VLM, CLIP
300
0
0
26 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
ACM Computing Surveys (ACM CSUR), 2024
Luis Vilaca
Yi Yu
Paula Vinan
472
3
0
24 Nov 2024
Silver medal Solution for Image Matching Challenge 2024
Yian Wang
3DV, 3DPC
178
0
0
04 Nov 2024