Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
13 June 2022
Peng Xu
Xiatian Zhu
David Clifton
    ViT

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 305 papers shown
Fusing Echocardiography Images and Medical Records for Continuous Patient Stratification
IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control (IEEE TUFFC), 2024
Nathan Painchaud
Jérémie Stym-Popper
P. Courand
Nicolas Thome
Pierre-Marc Jodoin
Nicolas Duchateau
Olivier Bernard
234
4
0
15 Jan 2024
Transformer for Object Re-Identification: A Survey
International Journal of Computer Vision (IJCV), 2024
Mang Ye
Shuo Chen
Chenyue Li
Wei-Shi Zheng
David J. Crandall
Bo Du
ViT
423
47
0
13 Jan 2024
A Temporal-Spectral Fusion Transformer with Subject-Specific Adapter for Enhancing RSVP-BCI Decoding
Neural Networks (NN), 2024
Xujin Li
Wei Wei
Shuang Qiu
Huiguang He
207
5
0
12 Jan 2024
Complementary Information Mutual Learning for Multimodality Medical Image Segmentation
Chuyun Shen
Wenhao Li
Haoqing Chen
Xiaoling Wang
Fengping Zhu
Yuxin Li
Xiangfeng Wang
Bo Jin
261
4
0
05 Jan 2024
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
AAAI Conference on Artificial Intelligence (AAAI), 2024
Hao Sun
Mingyao Zhou
Wenjing Chen
Wei Xie
PINN 3DGS ViT
266
69
0
04 Jan 2024
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu
Xintao Lv
Manwen Liao
Xin Jin
Shuwen Wu
...
Fengyun Rao
Xingdong Sheng
Yunhui Liu
Wenjun Zeng
Yunbo Wang
312
76
0
26 Dec 2023
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
418
74
0
18 Dec 2023
Can Physician Judgment Enhance Model Trustworthiness? A Case Study on Predicting Pathological Lymph Nodes in Rectal Cancer
Kazuma Kobayashi
Yasuyuki Takamizawa
M. Miyake
Sono Ito
Lin Gu
Tatsuya Nakatsuka
Yu Akagi
Tatsuya Harada
Y. Kanemitsu
Ryuji Hamamoto
190
3
0
15 Dec 2023
Non-contact Multimodal Indoor Human Monitoring Systems: A Survey
L. Nguyen
Praneeth Susarla
Anirban Mukherjee
Manuel Lage Cañellas
Constantino Álvarez Casado
Xiaoting Wu
Olli Silvén
D. Jayagopi
Miguel Bordallo López
209
7
0
11 Dec 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Computer Vision and Pattern Recognition (CVPR), 2023
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Erik Cambria
Jiayuan Fan
Tao Chen
MLLM
316
173
0
30 Nov 2023
Large Model Based Referring Camouflaged Object Detection
Shupeng Cheng
Ge-Peng Ji
Pengda Qin
Deng-Ping Fan
Bowen Zhou
Peng Xu
ObjD
269
13
0
28 Nov 2023
Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking
Jiawei Ge
Xiangmei Chen
Jiuxin Cao
Xueling Zhu
Bo Liu
VLM
374
11
0
28 Nov 2023
Images Connect Us Together: Navigating a COVID-19 Local Outbreak in China Through Social Media Images
Changyang He
Lu He
Wenjie Yang
Yue Liu
153
6
0
18 Nov 2023
Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference
Marvin Schmitt
Stefan T. Radev
Paul-Christian Bürkner
378
6
0
17 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
320
17
0
14 Nov 2023
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Chancharik Mitra
Abrar Anwar
Rodolfo Corona
Dan Klein
Trevor Darrell
Jesse Thomason
204
2
0
12 Nov 2023
Conceptual Model Interpreter for Large Language Models
International Conference on Conceptual Modeling (ER), 2023
Felix Härer
157
11
0
11 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Siddharth Srivastava
Gaurav Sharma
SSL
288
83
0
07 Nov 2023
Dynamic Multimodal Information Bottleneck for Multimodality Classification
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
D. C. Marshall
Shuang Wu
Sheng Zhang
Chao Huang
Tieyong Zeng
Xiaodan Xing
Simon Walsh
Guang Yang
374
14
0
02 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
232
84
0
30 Oct 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
British Machine Vision Conference (BMVC), 2023
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
194
6
0
30 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
302
14
0
25 Oct 2023
Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer
Neural Information Processing Systems (NeurIPS), 2023
Namkyeong Lee
Heewoong Noh
Sungwon Kim
Dongmin Hyun
Gyoung S. Na
Chanyoung Park
297
9
0
24 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
461
228
0
16 Oct 2023
Can We Edit Multimodal Large Language Models?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Siyuan Cheng
Bo Tian
Qingbin Liu
Xi Chen
Yongheng Wang
Huajun Chen
Ningyu Zhang
MLLM
597
40
0
12 Oct 2023
Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Md Kaykobad Reza
Ashley Prater-Bennette
M. Salman Asif
308
24
0
06 Oct 2023
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Natural Language Processing Journal (JNLP), 2023
Katikapalli Subramanyam Kalyan
LM&MA AI4CE LRM AILaw ELM
294
343
0
04 Oct 2023
Modality-aware Transformer for Financial Time series Forecasting
International Conference on AI in Finance (ICAF), 2023
Hajar Emami
Xuan-Hong Dang
Yousaf Shah
Petros Zerfos
AI4TS
136
11
0
02 Oct 2023
Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets
Italian National Conference on Sensors (INS), 2023
Aakash Tripathi
Asim Waqas
Kavya Venkatesan
Yasin Yilmaz
Ghulam Rasool
AI4CE
268
27
0
30 Sep 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
211
22
0
28 Sep 2023
RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation
IEEE International Conference on Computer Vision (ICCV), 2023
Zhexiong Wan
Yuxin Mao
Jing Zhang
Yuchao Dai
3DPC
258
30
0
26 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
320
22
0
23 Sep 2023
RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing
IEEE Transactions on Intelligent Vehicles (TIV), 2023
Jiahang Li
Yikang Zhang
Peng Yun
Guangliang Zhou
Qijun Chen
Rui Fan
ViT OffRL
395
41
0
19 Sep 2023
VulnSense: Efficient Vulnerability Detection in Ethereum Smart Contracts by Multimodal Learning with Graph Neural Network and Language Model
Phan The Duy
Nghi Hoang Khoa
N. H. Quyen
Le Cong Trinh
V. Kiên
Trinh Minh Hoang
V. Pham
144
25
0
15 Sep 2023
Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation
Ling Huang
S. Ruan
P. Decazes
Thierry Denoeux
EDL MedIm
227
1
0
12 Sep 2023
A Survey on Interpretable Cross-modal Reasoning
Dizhan Xue
Shengsheng Qian
Zuyi Zhou
Changsheng Xu
LRM
400
5
0
05 Sep 2023
Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds
Marcel Hirt
Domenico Campolo
Victoria Leong
Juan-Pablo Ortega
DRL
358
0
0
01 Sep 2023
Multitask Deep Learning for Accurate Risk Stratification and Prediction of Next Steps for Coronary CT Angiography Patients
Juan Lu
Bennamoun
J. Stewart
J. Eshraghian
Yanbin Liu
B. Chow
Frank M. Sanfilippo
Girish Dwivedi
OOD
160
2
0
01 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
285
5
0
28 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
609
8
0
18 Aug 2023
CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
IEEE International Conference on Computer Vision (ICCV), 2023
Hongguang Zhu
Yunchao Wei
Xiaodan Liang
Chunjie Zhang
Yao-Min Zhao
VLM
133
36
0
14 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
430
152
0
25 Jul 2023
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
Jinxian Liu
Chen Ju
Chaofan Ma
Yanfeng Wang
Yu Wang
Ya Zhang
VOS
268
37
0
25 Jul 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
283
33
0
24 Jul 2023
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework
Jingxuan Wei
Cheng Tan
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Bihui Yu
R. Guo
Stan Z. Li
LRM
367
17
0
24 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
333
45
0
21 Jul 2023
Transformers in Reinforcement Learning: A Survey
Pranav Agarwal
A. Rahman
P. St-Charles
Simon J. D. Prince
Samira Ebrahimi Kahou
OffRL
249
26
0
12 Jul 2023
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedIm AI4CE
260
81
0
30 Jun 2023
MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Ying Tai
Wenhao Chai
Zhongyu Jiang
Tianbo Ye
Xiuming Zhang
Lei Li
Gaoang Wang
3DH
165
6
0
29 Jun 2023
Towards Open Vocabulary Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjD VLM
406
218
0
28 Jun 2023