Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
13 June 2022
Peng Xu
Xiatian Zhu
David Clifton
    ViT

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 305 papers shown
Generate to Understand for Representation
Changshan Xue
Xiande Zhong
Xiaoqing Liu
VLM
296
0
0
14 Jun 2023
Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training
Alyssa Huang
Peihan Liu
Ryumei Nakada
Linjun Zhang
Wanrong Zhang
VLM
394
8
0
13 Jun 2023
Modality Influence in Multimodal Machine Learning
Abdelhamid Haouhat
Slimane Bellaouar
A. Nehar
H. Cherroun
227
3
0
10 Jun 2023
Towards Arabic Multimodal Dataset for Sentiment Analysis
International Conference on Intelligent Data Science Technologies and Applications (IDSTA), 2023
Abdelhamid Haouhat
Slimane Bellaouar
A. Nehar
H. Cherroun
66
9
0
10 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
IEEE International Conference on Computer Vision (ICCV), 2023
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
217
27
0
06 Jun 2023
Backchannel Detection and Agreement Estimation from Video with Transformer Networks
IEEE International Joint Conference on Neural Network (IJCNN), 2023
A. Amer
Chirag Bhuvaneshwara
G. Addluri
Mohammed Maqsood Shaik
Vedant Bonde
Philippe Muller
224
9
0
02 Jun 2023
Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification
IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2023
David Hoffmann
Kai Norman Clasen
Begüm Demir
109
13
0
02 Jun 2023
Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data
Nathan Vaska
Victoria Helus
LRM
103
1
0
01 Jun 2023
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
IEEE International Joint Conference on Neural Network (IJCNN), 2023
Shubin Huang
Qiong Wu
Weihao Ye
Weijie Chen
Rongsheng Zhang
Xiaoshuai Sun
Rongrong Ji
VLM, VPVLM, LRM
123
2
0
01 Jun 2023
Large language models improve Alzheimer's disease diagnosis using multi-modality data
Yingjie Feng
Jun Wang
Xianfeng Gu
Xiaoyin Xu
Hao Fei
LM&MA
164
23
0
26 May 2023
GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data
Zhitong Xiong
Sining Chen
Yi Wang
Lichao Mou
Xiao Xiang Zhu
147
9
0
24 May 2023
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
Computer Vision and Pattern Recognition (CVPR), 2023
Yuan Dong
C. Fang
Liefeng Bo
Zilong Dong
Ping Tan
MDE, ViT
248
23
0
21 May 2023
Efficient Multimodal Neural Networks for Trigger-less Voice Assistants
Interspeech (Interspeech), 2023
Sai Srujana Buddi
U. Sarawgi
Tashweena Heeramun
Karan Sawnhey
Ed Yanosik
Saravana Rathinam
Saurabh N. Adya
177
5
0
20 May 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS, ViT
159
7
0
12 May 2023
Multimodal Understanding Through Correlation Maximization and Minimization
Yi Shi
Marc Niethammer
190
1
0
04 May 2023
Early Classifying Multimodal Sequences
International Conference on Multimodal Interaction (ICMI), 2023
Alexander Cao
J. Utke
Diego Klabjan
134
0
0
02 May 2023
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer
IEEE International Joint Conference on Neural Network (IJCNN), 2023
Yifang Xu
Yunzhuo Sun
Yang Li
Yilei Shi
Xiaoxia Zhu
S. Du
ViT
253
48
0
29 Apr 2023
A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions
Mohammad Fraiwan
Natheer Khasawneh
219
57
0
29 Apr 2023
Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers
European Conference on Artificial Intelligence (ECAI), 2023
Johannes Czech
Kristian Kersting
ViT
153
0
0
28 Apr 2023
Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams
Applied Soft Computing (Appl. Soft Comput.), 2023
M. Tavakoli
Rohitash Chandra
Fengrui Tian
Cristián Bravo
204
26
0
21 Apr 2023
Transformer-Based Visual Segmentation: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Xiangtai Li
Henghui Ding
Haobo Yuan
Wenwei Zhang
Jiangmiao Pang
Guangliang Cheng
Kai-xiang Chen
Ziwei Liu
Chen Change Loy
ViT, MedIm
370
244
0
19 Apr 2023
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Computer Vision and Pattern Recognition (CVPR), 2023
Guillaume Jaume
Anurag J. Vaidya
Richard J. Chen
Drew F. K. Williamson
Paul Pu Liang
Faisal Mahmood
412
102
0
13 Apr 2023
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Dongyan An
Hongru Wang
Wenguan Wang
Zun Wang
Yan Huang
Keji He
Liang Wang
479
140
0
06 Apr 2023
Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department
Sabri Boughorbel
Fethi Jarray
Abdulaziz Yousuf Al-Homaid
Rashid Niaz
Khalid Alyafei
156
1
0
03 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
495
1,005
0
03 Apr 2023
Multimodal Hyperspectral Image Classification via Interconnected Fusion
Lu Huo
Jiahao Xia
Leijie Zhang
Haimin Zhang
Min Xu
229
2
0
02 Apr 2023
Self-Supervised Multimodal Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
319
89
0
31 Mar 2023
What Can Human Sketches Do for Object Detection?
Computer Vision and Pattern Recognition (CVPR), 2023
Pinaki Nath Chowdhury
A. Bhunia
Aneeshan Sain
Subhadeep Koley
Tao Xiang
Yi-Zhe Song
ObjD
311
41
0
27 Mar 2023
Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM, CLL
191
11
0
25 Mar 2023
Building artificial neural circuits for domain-general cognition: a primer on brain-inspired systems-level architecture
Jascha Achterberg
Danyal Akarca
Moataz Assem
Moritz P. Heimbach
D. Astle
John Duncan
AI4CE
128
5
0
21 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
303
199
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
448
68
0
21 Mar 2023
Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review
Asim Waqas
Aakash Tripathi
Ravichandran Ramachandran
Paul Stewart
Ghulam Rasool
AI4CE
477
80
0
11 Mar 2023
Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework
Neural Information Processing Systems (NeurIPS), 2023
Paul Pu Liang
Yun Cheng
Xiang Fan
Chun Kai Ling
Suzanne Nie
...
Nicholas B. Allen
Randy P. Auerbach
Faisal Mahmood
Ruslan Salakhutdinov
Louis-Philippe Morency
408
61
0
23 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE, VLM
460
272
0
20 Feb 2023
Transformadores: Fundamentos teoricos y Aplicaciones
J. D. L. Torre
292
0
0
18 Feb 2023
PrefixMol: Target- and Chemistry-aware Molecule Design via Prefix Embedding
Zhangyang Gao
Yuqi Hu
Cheng Tan
Stan Z. Li
272
17
0
14 Feb 2023
Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Ryumei Nakada
Halil Ibrahim Gulluk
Zhun Deng
Wenlong Ji
James Zou
Linjun Zhang
SSL, VLM
381
49
0
13 Feb 2023
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Ying Wen
Bo Liu
M. Zhou
Shufang Hou
Zhe Cao
Chenyang Le
Jingxiao Chen
Zheng Tian
Weinan Zhang
Jun Wang
AI4CE
211
12
0
24 Dec 2022
Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark
Jianwu Fang
Lei-lei Li
Kuan Yang
Zhedong Zheng
Jianru Xue
Tat-Seng Chua
382
19
0
19 Dec 2022
Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics
International Conference on Machine Learning (ICML), 2022
Manuela Brenner
Florian Hess
G. Koppe
Daniel Durstewitz
481
14
0
15 Dec 2022
A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
K. Liang
Lingyuan Meng
Meng Liu
Yue Liu
Wenxuan Tu
Siwei Wang
Sihang Zhou
Xinwang Liu
Fu Sun
LRM
456
225
0
12 Dec 2022
Multimodal Learning for Multi-Omics: A Survey
Sina Tabakhi
M. N. I. Suvon
Pegah Ahadian
Haiping Lu
236
15
0
29 Nov 2022
An Inclusive Notion of Text
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ilia Kuznetsov
Iryna Gurevych
161
0
0
10 Nov 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
354
38
0
05 Oct 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
310
163
0
07 Sep 2022
Multimodal learning with graphs
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Yasha Ektefaie
George Dasoulas
Ayush Noori
Maha Farhat
Marinka Zitnik
578
137
0
07 Sep 2022
CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation
IEEE Transactions on Medical Imaging (IEEE TMI), 2022
Jianwei Lin
Jiatai Lin
Chenghao Lu
Hao Chen
Huan Lin
...
Biao Huang
C. Liang
Guoqiang Han
Zaiyi Liu
Chu Han
MedIm
194
126
0
15 Jul 2022
Transformers in 3D Point Clouds: A Survey
Dening Lu
Qian Xie
Mingqiang Wei
Kyle Gao
Linlin Xu
Jonathan Li
3DPC, ViT
316
65
0
16 May 2022
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
Computer Vision and Pattern Recognition (CVPR), 2022
Pinaki Nath Chowdhury
A. Bhunia
Aneeshan Sain
Subhadeep Koley
Tao Xiang
Yi-Zhe Song
397
37
0
25 Apr 2022