Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

13 June 2022

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 305 papers shown

Residual Cross-Attention Transformer-Based Multi-User CSI Feedback with Deep Joint Source-Channel Coding

IEEE Wireless Communications Letters (WCL), 2025

135

26 May 2025

MLLMs are Deeply Affected by Modality Bias

...

312

24 May 2025

Learning Generalized and Flexible Trajectory Models from Omni-Semantic Supervision

262

23 May 2025

DUAL: Dynamic Uncertainty-Aware Learning

107

21 May 2025

Multi-Modal Artificial Intelligence of Embryo Grading and Pregnancy Prediction in Assisted Reproductive Technology: A Review

Xueqiang Ouyang

Jia Wei

419

19 May 2025

Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables

309

18 May 2025

Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection

SungHeon Jeong

Jihong Park

Mohsen Imani

411

05 May 2025

Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning

IEEE Access (IEEE Access), 2025

444

30 Apr 2025

A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025

263

23 Apr 2025

OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning

701

22 Apr 2025

DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis

Efthymios Georgiou

Vassilis Katsouros

Yannis Avrithis

Alexandros Potamianos

389

15 Apr 2025

HAVT-IVD: Heterogeneity-Aware Cross-Modal Network for Audio-Visual Surveillance: Idling Vehicles Detection With Multichannel Audio and Multiscale Visual Cues

Xiwen Li

Ross T. Whitaker

Tolga Tasdizen

270

15 Apr 2025

Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging

International Journal of Machine Learning and Cybernetics (IJMLC), 2025

215

09 Apr 2025

Foundation Models for Environmental Science: A Survey of Emerging Frontiers

563

05 Apr 2025

ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving

356

04 Apr 2025

FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention

214

03 Apr 2025

Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics

201

30 Mar 2025

Quantum Complex-Valued Self-Attention Model

314

24 Mar 2025

Continual Multimodal Contrastive Learning

703

19 Mar 2025

Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning

364

17 Mar 2025

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning

327

14 Mar 2025

Beam Selection in ISAC using Contextual Bandit with Multi-modal Transformer and Transfer Learning

143

13 Mar 2025

FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification

IEEE Transactions on Aerospace and Electronic Systems (IEEE Trans. Aerosp. Electron. Syst.), 2025

S. Sami

Md Golam Moula Mehedi Hasan

Nasser M. Nasrabadi

Raghuveer Rao

307

12 Mar 2025

DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning

424

09 Mar 2025

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation

535

06 Mar 2025

A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery

333

06 Mar 2025

A Survey of Foundation Models for Environmental Science

Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025

389

05 Mar 2025

Deep Causal Behavioral Policy Learning: Applications to Healthcare

264

05 Mar 2025

Attention Bootstrapping for Multi-Modal Test-Time Adaptation

AAAI Conference on Artificial Intelligence (AAAI), 2025

293

04 Mar 2025

Split Adaptation for Pre-trained Vision Transformers

Computer Vision and Pattern Recognition (CVPR), 2025

361

01 Mar 2025

Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems

International Conference on Big Data and Smart Computing (BigComp), 2025

Faisal Mohammad

Duksan Ryu

224

28 Feb 2025

What are You Looking at? Modality Contribution in Multimodal Medical Deep Learning

International Journal of Computer Assisted Radiology and Surgery (IJCARS), 2025

231

28 Feb 2025

Integrating Biological and Machine Intelligence: Attention Mechanisms in Brain-Computer Interfaces

Information Fusion (Inf. Fusion), 2025

308

26 Feb 2025

GeoAggregator: An Efficient Transformer Model for Geo-Spatial Tabular Data

AAAI Conference on Artificial Intelligence (AAAI), 2025

Rui Deng

Ziqi Li

Mingshu Wang

346

24 Feb 2025

Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers

467

20 Feb 2025

A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions

323

09 Feb 2025

Fine-grained Graph Rationalization

171

28 Jan 2025

High-dimensional multimodal uncertainty estimation by manifold alignment: Application to 3D right ventricular strain computations

219

21 Jan 2025

Balance-aware Sequence Sampling Makes Multi-modal Learning Better

International Joint Conference on Artificial Intelligence (IJCAI), 2025

Zhi-Hao Guan

142

01 Jan 2025

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

430

31 Dec 2024

Towards Visual Grounding: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

955

28 Dec 2024

When SAM2 Meets Video Shadow and Mirror Detection

Leiping Jie

VLM

204

26 Dec 2024

Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data

379

19 Dec 2024

Deep Learning-Based Noninvasive Screening of Type 2 Diabetes with Chest X-ray Images and Electronic Health Records

Sanjana Gundapaneni

Zhuo Zhi

Miguel R. D. Rodrigues

293

14 Dec 2024

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

...

421

03 Dec 2024

Graph-to-SFILES: Control structure prediction from process topologies using generative artificial intelligence

Lukas Schulze Balhorn

Kevin Degens

Artur M. Schweidtmann

AI4CE

358

30 Nov 2024

Multimodal Integration of Longitudinal Noninvasive Diagnostics for Survival Prediction in Immunotherapy Using Deep Learning

Marcel A J van Gerven

308

27 Nov 2024

FLEX-CLIP: Feature-Level GEneration Network Enhanced CLIP for X-shot Cross-modal Retrieval

300

26 Nov 2024

A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

ACM Computing Surveys (ACM CSUR), 2024

Luis Vilaca

Yi Yu

Paula Vinan

472

24 Nov 2024

Silver medal Solution for Image Matching Challenge 2024

Yian Wang

3DV 3DPC

178

04 Nov 2024