Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
13 June 2022
Peng Xu
Xiatian Zhu
David Clifton
    ViT

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 305 papers shown
JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment
Joao Sousa
Roya Darabi
A. A. Sousa
Frank Brueckner
Luís Paulo Reis
Ana Reis
219
2
0
31 Oct 2024
Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
Computer Vision and Image Understanding (CVIU), 2024
Manuel Benavent-Lledo
David Mulero-Pérez
David Ortiz-Perez
José García Rodríguez
Antonis Argyros
320
3
0
28 Oct 2024
Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading
International Middleware Conference (Middleware), 2024
Avinash Maurya
Jie Ye
M. Rafique
Franck Cappello
Bogdan Nicolae
175
7
0
26 Oct 2024
Graph Linearization Methods for Reasoning on Graphs with Large Language Models
Christos Xypolopoulos
Guokan Shang
Xiao Fei
Giannis Nikolentzos
Hadi Abdine
Iakovos Evdaimon
Michail Chatzianastasis
Giorgos Stamou
Michalis Vazirgiannis
311
5
0
25 Oct 2024
FedBaF: Federated Learning Aggregation Biased by a Foundation Model
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Jong-Ik Park
Srinivasa Pranav
J. M. F. Moura
Carlee Joe-Wong
AI4CE
402
4
0
24 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
Applied Soft Computing (Appl. Soft Comput.), 2024
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
236
3
0
24 Oct 2024
Multi-Modal Transformer and Reinforcement Learning-based Beam Management
IEEE Networking Letters (IEEE Netw. Lett.), 2024
Mohammad Ghassemi
Han Zhang
Ali Afana
A. B. Sediq
Melike Erol-Kantarci
OffRL
139
12
0
22 Oct 2024
Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation
ACM Multimedia (MM), 2024
Andong Lu
Jiacong Zhao
Chenglong Li
Yun Xiao
Bin Luo
253
14
0
15 Oct 2024
Investigating Human-Computer Interaction and Visual Comprehension in Text Generation Process of Natural Language Generation Models
Yunchao Wang
Zihang Fu
Chaoqing Xu
Guodao Sun
Ronghua Liang
148
0
0
11 Oct 2024
Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey
Zihan Yu
Tianxiao Li
Yuxin Zhu
Rongze Pan
258
5
0
10 Oct 2024
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey
Dianzhi Yu
Xinni Zhang
Yankai Chen
Aiwei Liu
Yifei Zhang
Philip S. Yu
Irwin King
VLM
CLL
355
30
0
07 Oct 2024
Fine-Grained Prediction of Reading Comprehension from Eye Movements
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Omer Shubi
Yoav Meiri
Cfir Avraham Hadar
Yevgeni Berzak
155
8
0
06 Oct 2024
MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection
Niki Nezakati
Md Kaykobad Reza
Mashhour Solh
M. Salman Asif
406
5
0
03 Oct 2024
Multi-modal Cross-domain Self-supervised Pre-training for fMRI and EEG Fusion
Neural Networks (NN), 2024
Xinxu Wei
K. Zhao
Yong Jiao
Hua Xie
Gregory A. Fonzo
Yu Zhang
183
8
0
27 Sep 2024
CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting
Haobo Li
Zhaowei Wang
Jiachen Wang
Yuanbo Wang
Alexis Kai Hon Lau
Huamin Qu
125
0
0
27 Sep 2024
A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios
ACM Conference on Recommender Systems (RecSys), 2024
Christian Ganhor
Marta Moscati
Anna Hausberger
Shah Nawaz
Markus Schedl
238
18
0
26 Sep 2024
Multimodal Banking Dataset: Understanding Client Needs through Event Sequences
Mollaev Dzhambulat
Alexander Kostin
Postnova Maria
Ivan Karpukhin
Ivan A Kireev
Gleb Gusev
Ivan A Kireev
AI4TS
292
6
0
26 Sep 2024
Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories
International Conference on Natural Language Generation (INLG), 2024
Hikaru Asano
Ryo Yonetani
Taiki Sekii
Hiroki Ouchi
275
1
0
19 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
244
20
0
15 Sep 2024
Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics
Wenqing Zhang
Junming Huang
Ruotong Wang
Changsong Wei
Wenqian Huang
Yuxin Qiao
Mamba
269
20
0
13 Sep 2024
What to align in multimodal contrastive learning?
International Conference on Learning Representations (ICLR), 2024
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
341
28
0
11 Sep 2024
ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Luoyu Mei
Shuai Wang
Yun Cheng
Ruofeng Liu
Zhimeng Yin
Wenchao Jiang
Shuai Wang
Wei Gong
225
10
0
02 Sep 2024
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description
ACM Multimedia (MM), 2024
Zeyu Jin
Jia Jia
Qixin Wang
Kehan Li
Shuoyi Zhou
Songtao Zhou
Xiaoyu Qin
Zhiyong Wu
238
29
0
24 Aug 2024
Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach
Muhammad Saad Saeed
Shah Nawaz
Muhammad Zaigham Zaheer
Muhammad Haris Khan
Karthik Nandakumar
Muhammad Haroon Yousaf
Hassan Sajjad
Tom De Schepper
Markus Schedl
296
3
0
14 Aug 2024
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion
Peiyuan Chen
Zecheng Zhang
Yiping Dong
Li Zhou
Han Wang
261
16
0
14 Aug 2024
Swarm-Net: Firmware Attestation in IoT Swarms using Graph Neural Networks and Volatile Memory
IEEE Internet of Things Journal (IEEE IoT J.), 2024
Varun Kohli
Bhavya Kohli
M. Aman
Biplab Sikdar
117
1
0
11 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
225
6
0
08 Aug 2024
MoExtend: Tuning New Experts for Modality and Task Extension
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Shanshan Zhong
Shanghua Gao
Zhongzhan Huang
Wushao Wen
Marinka Zitnik
Pan Zhou
VLM
MLLM
MoE
271
11
0
07 Aug 2024
A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications
Image and Vision Computing (IVC), 2024
V. Guarrasi
Fatih Aksu
Camillo Maria Caruso
Francesco Di Feola
Aurora Rofena
Filippo Ruffini
Paolo Soda
OffRL
MedIm
AI4CE
188
51
0
02 Aug 2024
HyperMM : Robust Multimodal Learning with Varying-sized Inputs
Hava Chaptoukaev
Vincenzo Marcianó
Francesco Galati
Maria A. Zuluaga
199
1
0
30 Jul 2024
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos
Aashish Rai
Srinath Sridhar
DiffM
191
5
0
30 Jul 2024
DDAP: Dual-Domain Anti-Personalization against Text-to-Image Diffusion Models
Jing Yang
Runping Xi
Yingxin Lai
Xun Lin
Zitong Yu
DiffM
207
3
0
29 Jul 2024
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models
Xinyu Pi
Mingyuan Wu
Jize Jiang
Haozhen Zheng
Beitong Tian
Chengxiang Zhai
Klara Nahrstedt
Zhiting Hu
VLM
205
1
0
25 Jul 2024
Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities
Muhammad Irzam Liaqat
Shah Nawaz
Muhammad Zaigham Zaheer
M. S. Saeed
Hassan Sajjad
Tom De Schepper
Karthik Nandakumar
Muhammad Haris Khan
319
1
0
23 Jul 2024
Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training
Ye Lin Tun
Chu Myaet Thwal
Minh N. H. Nguyen
Choong Seon Hong
283
4
0
22 Jul 2024
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
Daoyuan Chen
Haibin Wang
Yilun Huang
Ce Ge
Yaliang Li
Bolin Ding
Jingren Zhou
VLM
SyDa
276
1
0
16 Jul 2024
Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei
Siwei Li
Ruoxuan Feng
Di Hu
215
34
0
12 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
347
12
0
11 Jul 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
319
11
0
07 Jul 2024
Multimodal Classification via Modal-Aware Interactive Enhancement
Qing-Yuan Jiang
Zhouyang Chi
Yang Yang
227
3
0
05 Jul 2024
Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection
Zixing Li
Chao Yan
Zhen Lan
Xiaojia Xiang
Han Zhou
Jun Lai
Dengqing Tang
288
2
0
02 Jul 2024
Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review
Moseli Motsóehli
VLM
3DV
277
5
0
28 Jun 2024
Multimodal Prototyping for cancer survival prediction
Andrew H. Song
Richard J. Chen
Guillaume Jaume
Anurag J. Vaidya
Alexander S. Baras
Faisal Mahmood
344
39
0
28 Jun 2024
Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning
Arijit Sehanobish
Avinava Dubey
Krzysztof Choromanski
Somnath Basu Roy Chowdhury
Deepali Jain
Vikas Sindhwani
Snigdha Chaturvedi
ALM
274
7
0
25 Jun 2024
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky
William Rudman
Vedant Palit
Ritambhara Singh
Carsten Eickhoff
446
10
0
24 Jun 2024
In-Context In-Context Learning with Transformer Neural Processes
Symposium on Advances in Approximate Bayesian Inference (AABI), 2024
Matthew Ashman
Cristiana-Diana Diaconu
Adrian Weller
Richard E. Turner
230
4
0
19 Jun 2024
Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
Avinash Maurya
Jie Ye
M. Rafique
Franck Cappello
Bogdan Nicolae
187
7
0
15 Jun 2024
Improving Large Models with Small models: Lower Costs and Better Performance
Dong Chen
Shuo Zhang
Yueting Zhuang
Siliang Tang
Qidong Liu
Hua Wang
Mingliang Xu
206
12
0
15 Jun 2024
MoME: Mixture of Multimodal Experts for Cancer Survival Prediction
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Conghao Xiong
Hao Chen
Hao Zheng
Dong Wei
Yefeng Zheng
Joseph J. Y. Sung
Irwin King
MoE
215
26
0
14 Jun 2024
Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark
Gaochang Wu
Yapeng Zhang
Lan Deng
Jingxin Zhang
Tianyou Chai
209
1
0
13 Jun 2024