2206.06488
Cited By
Multimodal Learning with Transformers: A Survey
13 June 2022
P. Xu
Xiatian Zhu
David A. Clifton
ViT
Papers citing
"Multimodal Learning with Transformers: A Survey"
50 / 268 papers shown
Title
Early Classifying Multimodal Sequences
Alexander Cao
J. Utke
Diego Klabjan
18
0
0
02 May 2023
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer
Yifang Xu
Yunzhuo Sun
Yang Li
Yilei Shi
Xiaoxia Zhu
S. Du
ViT
35
33
0
29 Apr 2023
A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions
Mohammad Fraiwan
Natheer Khasawneh
38
35
0
29 Apr 2023
Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers
Johannes Czech
Jannis Blüml
Kristian Kersting
ViT
50
0
0
28 Apr 2023
Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams
M. Tavakoli
Rohitash Chandra
Fengrui Tian
Cristián Bravo
19
8
0
21 Apr 2023
Transformer-Based Visual Segmentation: A Survey
Xiangtai Li
Henghui Ding
Haobo Yuan
Wenwei Zhang
Jiangmiao Pang
Guangliang Cheng
Kai-xiang Chen
Ziwei Liu
Chen Change Loy
ViT
MedIm
26
112
0
19 Apr 2023
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Guillaume Jaume
Anurag J. Vaidya
Richard J. Chen
Drew F. K. Williamson
Paul Pu Liang
Faisal Mahmood
25
22
0
13 Apr 2023
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments
Dongyan An
H. Wang
Wenguan Wang
Zun Wang
Yan Huang
Keji He
Liang Wang
50
61
0
06 Apr 2023
Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department
Sabri Boughorbel
Fethi Jarray
Abdulaziz Yousuf Al-Homaid
Rashid Niaz
Khalid Alyafei
19
0
0
03 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
34
451
0
03 Apr 2023
Multimodal Hyperspectral Image Classification via Interconnected Fusion
Lu Huo
Jiahao Xia
Leijie Zhang
Haimin Zhang
Min Xu
12
2
0
02 Apr 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
13
23
0
31 Mar 2023
What Can Human Sketches Do for Object Detection?
Pinaki Nath Chowdhury
A. Bhunia
Aneeshan Sain
Subhadeep Koley
Tao Xiang
Yi-Zhe Song
ObjD
26
31
0
27 Mar 2023
Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM
CLL
19
11
0
25 Mar 2023
Building artificial neural circuits for domain-general cognition: a primer on brain-inspired systems-level architecture
Jascha Achterberg
Danyal Akarca
Moataz Assem
Moritz P. Heimbach
D. Astle
John Duncan
AI4CE
28
4
0
21 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
72
152
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
33
46
0
21 Mar 2023
Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review
Asim Waqas
Aakash Tripathi
Ravichandran Ramachandran
Paul Stewart
Ghulam Rasool
AI4CE
32
29
0
11 Mar 2023
Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework
Paul Pu Liang
Yun Cheng
Xiang Fan
Chun Kai Ling
Suzanne Nie
...
Nicholas B. Allen
Randy P. Auerbach
Faisal Mahmood
Ruslan Salakhutdinov
Louis-Philippe Morency
27
29
0
23 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Xiao Wang
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
24
195
0
20 Feb 2023
Transformadores: Fundamentos teoricos y Aplicaciones
J. D. L. Torre
60
0
0
18 Feb 2023
PrefixMol: Target- and Chemistry-aware Molecule Design via Prefix Embedding
Zhangyang Gao
Yuqi Hu
Cheng Tan
Stan Z. Li
15
13
0
14 Feb 2023
Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data
Ryumei Nakada
Halil Ibrahim Gulluk
Zhun Deng
Wenlong Ji
James Y. Zou
Linjun Zhang
SSL
VLM
28
25
0
13 Feb 2023
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Ying Wen
Ziyu Wan
M. Zhou
Shufang Hou
Zhe Cao
Chenyang Le
Jingxiao Chen
Zheng Tian
Weinan Zhang
J. Wang
AI4CE
8
10
0
24 Dec 2022
Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark
Jianwu Fang
Lei-lei Li
Kuan Yang
Zhedong Zheng
Jianru Xue
Tat-Seng Chua
15
12
0
19 Dec 2022
Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics
Manuela Brenner
Florian Hess
G. Koppe
Daniel Durstewitz
10
9
0
15 Dec 2022
A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal
K. Liang
Lingyuan Meng
Meng Liu
Yue Liu
Wenxuan Tu
Siwei Wang
Sihang Zhou
Xinwang Liu
Fu Sun
LRM
21
107
0
12 Dec 2022
Multimodal Learning for Multi-Omics: A Survey
Sina Tabakhi
M. N. I. Suvon
Pegah Ahadian
Haiping Lu
15
9
0
29 Nov 2022
An Inclusive Notion of Text
Ilia Kuznetsov
Iryna Gurevych
14
0
0
10 Nov 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
23
16
0
05 Oct 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
13
59
0
07 Sep 2022
Multimodal learning with graphs
Yasha Ektefaie
George Dasoulas
Ayush Noori
Maha Farhat
Marinka Zitnik
35
82
0
07 Sep 2022
CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation
Jianwei Lin
Jiatai Lin
Chenghao Lu
Hao Chen
Huan Lin
...
Biao Huang
C. Liang
Guoqiang Han
Zaiyi Liu
Chu Han
MedIm
14
62
0
15 Jul 2022
Transformers in 3D Point Clouds: A Survey
Dening Lu
Qian Xie
Mingqiang Wei
Kyle Gao
Linlin Xu
Jonathan Li
3DPC
ViT
30
47
0
16 May 2022
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
Pinaki Nath Chowdhury
A. Bhunia
Aneeshan Sain
Subhadeep Koley
Tao Xiang
Yi-Zhe Song
30
29
0
25 Apr 2022
Deep Multi-modal Fusion of Image and Non-image Data in Disease Diagnosis and Prognosis: A Review
C. Cui
Haichun Yang
Yaohong Wang
Shilin Zhao
Zuhayr Asad
Lori A. Coburn
K. Wilson
Bennett A. Landman
Yuankai Huo
12
93
0
25 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022
Transformers in Medical Imaging: A Survey
Fahad Shamshad
Salman Khan
Syed Waqas Zamir
Muhammad Haris Khan
Munawar Hayat
F. Khan
H. Fu
ViT
LM&MA
MedIm
103
653
0
24 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
209
222
0
20 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
101
0
16 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
58
325
0
11 Nov 2021
From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation
Dhruv Agarwal
Tanay Agrawal
Laura M. Ferrari
François Brémond
19
5
0
15 Oct 2021
StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data
Victor Pellegrain
Myriam Tami
M. Batteux
Céline Hudelot
AI4TS
20
2
0
15 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
245
554
0
28 Sep 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
194
218
0
24 Sep 2021
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation
Yongfei Liu
Chenfei Wu
Shao-Yen Tseng
Vasudev Lal
Xuming He
Nan Duan
CLIP
VLM
39
28
0
22 Sep 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
59
44
0
21 Sep 2021
Mobile-Former: Bridging MobileNet and Transformer
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
169
462
0
12 Aug 2021
The Right to Talk: An Audio-Visual Transformer Approach
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
44
36
0
06 Aug 2021