2206.06488
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
13 June 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
Papers citing "Multimodal Learning with Transformers: A Survey"
50 / 305 papers shown
Fusing Echocardiography Images and Medical Records for Continuous Patient Stratification
IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control (IEEE TUFFC), 2024
Nathan Painchaud
Jérémie Stym-Popper
P. Courand
Nicolas Thome
Pierre-Marc Jodoin
Nicolas Duchateau
Olivier Bernard
234
4
0
15 Jan 2024
Transformer for Object Re-Identification: A Survey
International Journal of Computer Vision (IJCV), 2024
Mang Ye
Shuo Chen
Chenyue Li
Wei-Shi Zheng
David J. Crandall
Bo Du
ViT
423
47
0
13 Jan 2024
A Temporal-Spectral Fusion Transformer with Subject-Specific Adapter for Enhancing RSVP-BCI Decoding
Neural Networks (NN), 2024
Xujin Li
Wei Wei
Shuang Qiu
Huiguang He
207
5
0
12 Jan 2024
Complementary Information Mutual Learning for Multimodality Medical Image Segmentation
Chuyun Shen
Wenhao Li
Haoqing Chen
Xiaoling Wang
Fengping Zhu
Yuxin Li
Xiangfeng Wang
Bo Jin
261
4
0
05 Jan 2024
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
AAAI Conference on Artificial Intelligence (AAAI), 2024
Hao Sun
Mingyao Zhou
Wenjing Chen
Wei Xie
PINN
3DGS
ViT
266
69
0
04 Jan 2024
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu
Xintao Lv
Manwen Liao
Xin Jin
Shuwen Wu
...
Fengyun Rao
Xingdong Sheng
Yunhui Liu
Wenjun Zeng
Yunbo Wang
312
76
0
26 Dec 2023
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
418
74
0
18 Dec 2023
Can Physician Judgment Enhance Model Trustworthiness? A Case Study on Predicting Pathological Lymph Nodes in Rectal Cancer
Kazuma Kobayashi
Yasuyuki Takamizawa
M. Miyake
Sono Ito
Lin Gu
Tatsuya Nakatsuka
Yu Akagi
Tatsuya Harada
Y. Kanemitsu
Ryuji Hamamoto
190
3
0
15 Dec 2023
Non-contact Multimodal Indoor Human Monitoring Systems: A Survey
L. Nguyen
Praneeth Susarla
Anirban Mukherjee
Manuel Lage Cañellas
Constantino Álvarez Casado
Xiaoting Wu
Olli Silvén
D. Jayagopi
Miguel Bordallo López
209
7
0
11 Dec 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Computer Vision and Pattern Recognition (CVPR), 2023
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Erik Cambria
Jiayuan Fan
Tao Chen
MLLM
316
173
0
30 Nov 2023
Large Model Based Referring Camouflaged Object Detection
Shupeng Cheng
Ge-Peng Ji
Pengda Qin
Deng-Ping Fan
Bowen Zhou
Peng Xu
ObjD
269
13
0
28 Nov 2023
Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking
Jiawei Ge
Xiangmei Chen
Jiuxin Cao
Xueling Zhu
Bo Liu
VLM
374
11
0
28 Nov 2023
Images Connect Us Together: Navigating a COVID-19 Local Outbreak in China Through Social Media Images
Changyang He
Lu He
Wenjie Yang
Yue Liu
153
6
0
18 Nov 2023
Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference
Marvin Schmitt
Stefan T. Radev
Paul-Christian Bürkner
378
6
0
17 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
320
17
0
14 Nov 2023
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Chancharik Mitra
Abrar Anwar
Rodolfo Corona
Dan Klein
Trevor Darrell
Jesse Thomason
204
2
0
12 Nov 2023
Conceptual Model Interpreter for Large Language Models
International Conference on Conceptual Modeling (ER), 2023
Felix Härer
157
11
0
11 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Siddharth Srivastava
Gaurav Sharma
SSL
288
83
0
07 Nov 2023
Dynamic Multimodal Information Bottleneck for Multimodality Classification
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
D. C. Marshall
Shuang Wu
Sheng Zhang
Chao Huang
Tieyong Zeng
Xiaodan Xing
Simon Walsh
Guang Yang
374
14
0
02 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
232
84
0
30 Oct 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
British Machine Vision Conference (BMVC), 2023
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
194
6
0
30 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
302
14
0
25 Oct 2023
Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer
Neural Information Processing Systems (NeurIPS), 2023
Namkyeong Lee
Heewoong Noh
Sungwon Kim
Dongmin Hyun
Gyoung S. Na
Chanyoung Park
297
9
0
24 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
461
228
0
16 Oct 2023
Can We Edit Multimodal Large Language Models?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Siyuan Cheng
Bo Tian
Qingbin Liu
Xi Chen
Yongheng Wang
Huajun Chen
Ningyu Zhang
MLLM
597
40
0
12 Oct 2023
Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Md Kaykobad Reza
Ashley Prater-Bennette
M. Salman Asif
308
24
0
06 Oct 2023
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Natural Language Processing Journal (JNLP), 2023
Katikapalli Subramanyam Kalyan
LM&MA
AI4CE
LRM
AILaw
ELM
294
343
0
04 Oct 2023
Modality-aware Transformer for Financial Time series Forecasting
International Conference on AI in Finance (ICAF), 2023
Hajar Emami
Xuan-Hong Dang
Yousaf Shah
Petros Zerfos
AI4TS
136
11
0
02 Oct 2023
Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets
Italian National Conference on Sensors (INS), 2023
Aakash Tripathi
Asim Waqas
Kavya Venkatesan
Yasin Yilmaz
Ghulam Rasool
AI4CE
268
27
0
30 Sep 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
211
22
0
28 Sep 2023
RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation
IEEE International Conference on Computer Vision (ICCV), 2023
Zhexiong Wan
Yuxin Mao
Jing Zhang
Yuchao Dai
3DPC
258
30
0
26 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
320
22
0
23 Sep 2023
RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing
IEEE Transactions on Intelligent Vehicles (TIV), 2023
Jiahang Li
Yikang Zhang
Peng Yun
Guangliang Zhou
Qijun Chen
Rui Fan
ViT
OffRL
395
41
0
19 Sep 2023
VulnSense: Efficient Vulnerability Detection in Ethereum Smart Contracts by Multimodal Learning with Graph Neural Network and Language Model
Phan The Duy
Nghi Hoang Khoa
N. H. Quyen
Le Cong Trinh
V. Kiên
Trinh Minh Hoang
V. Pham
144
25
0
15 Sep 2023
Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation
Ling Huang
S. Ruan
P. Decazes
Thierry Denoeux
EDL
MedIm
227
1
0
12 Sep 2023
A Survey on Interpretable Cross-modal Reasoning
Dizhan Xue
Shengsheng Qian
Zuyi Zhou
Changsheng Xu
LRM
400
5
0
05 Sep 2023
Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds
Marcel Hirt
Domenico Campolo
Victoria Leong
Juan-Pablo Ortega
DRL
358
0
0
01 Sep 2023
Multitask Deep Learning for Accurate Risk Stratification and Prediction of Next Steps for Coronary CT Angiography Patients
Juan Lu
Bennamoun
J. Stewart
J. Eshraghian
Yanbin Liu
B. Chow
Frank M. Sanfilippo
Girish Dwivedi
OOD
160
2
0
01 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
285
5
0
28 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
609
8
0
18 Aug 2023
CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
IEEE International Conference on Computer Vision (ICCV), 2023
Hongguang Zhu
Yunchao Wei
Xiaodan Liang
Chunjie Zhang
Yao-Min Zhao
VLM
133
36
0
14 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
430
152
0
25 Jul 2023
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
Jinxian Liu
Chen Ju
Chaofan Ma
Yanfeng Wang
Yu Wang
Ya Zhang
VOS
268
37
0
25 Jul 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
283
33
0
24 Jul 2023
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework
Jingxuan Wei
Cheng Tan
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Bihui Yu
R. Guo
Stan Z. Li
LRM
367
17
0
24 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
333
45
0
21 Jul 2023
Transformers in Reinforcement Learning: A Survey
Pranav Agarwal
A. Rahman
P. St-Charles
Simon J. D. Prince
Samira Ebrahimi Kahou
OffRL
249
26
0
12 Jul 2023
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedIm
AI4CE
260
81
0
30 Jun 2023
MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Ying Tai
Wenhao Chai
Zhongyu Jiang
Tianbo Ye
Xiuming Zhang
Lei Li
Gaoang Wang
3DH
165
6
0
29 Jun 2023
Towards Open Vocabulary Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjD
VLM
406
218
0
28 Jun 2023