ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 2206.06488
Multimodal Learning with Transformers: A Survey


13 June 2022
P. Xu
Xiatian Zhu
David A. Clifton
    ViT

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 268 papers shown
Title
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
28
12
0
14 Nov 2023
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
Chancharik Mitra
Abrar Anwar
Rodolfo Corona
Dan Klein
Trevor Darrell
Jesse Thomason
8
1
0
12 Nov 2023
Conceptual Model Interpreter for Large Language Models
Felix Härer
13
7
0
11 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
16
64
0
07 Nov 2023
Dynamic Multimodal Information Bottleneck for Multimodality Classification
Yingying Fang
Shuang Wu
Sheng Zhang
Chao Huang
Tieyong Zeng
Xiaodan Xing
Simon Walsh
Guang Yang
19
7
0
02 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
21
62
0
30 Oct 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
19
1
0
30 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
6
9
0
25 Oct 2023
Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer
Namkyeong Lee
Heewoong Noh
Sungwon Kim
Dongmin Hyun
Gyoung S. Na
Chanyoung Park
11
2
0
24 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
138
139
0
16 Oct 2023
Can We Edit Multimodal Large Language Models?
Siyuan Cheng
Bo Tian
Qingbin Liu
Xi Chen
Yongheng Wang
Huajun Chen
Ningyu Zhang
MLLM
28
28
0
12 Oct 2023
Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation
Md Kaykobad Reza
Ashley Prater-Bennette
M. Salman Asif
16
4
0
06 Oct 2023
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Katikapalli Subramanyam Kalyan
LM&MA
AI4CE
LRM
AILaw
ELM
27
217
0
04 Oct 2023
Modality-aware Transformer for Financial Time series Forecasting
Hajar Emami
Xuan-Hong Dang
Yousaf Shah
Petros Zerfos
AI4TS
19
0
0
02 Oct 2023
Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets
Aakash Tripathi
Asim Waqas
Kavya Venkatesan
Yasin Yilmaz
Ghulam Rasool
AI4CE
12
14
0
30 Sep 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
8
16
0
28 Sep 2023
RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation
Zhexiong Wan
Yuxin Mao
Jing Zhang
Yuchao Dai
3DPC
17
22
0
26 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
21
5
0
23 Sep 2023
RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing
Jiahang Li
Yikang Zhang
Peng Yun
Guangliang Zhou
Qijun Chen
Rui Fan
ViT
OffRL
11
26
0
19 Sep 2023
VulnSense: Efficient Vulnerability Detection in Ethereum Smart Contracts by Multimodal Learning with Graph Neural Network and Language Model
Phan The Duy
Nghi Hoang Khoa
N. H. Quyen
Le Cong Trinh
V. Kiên
Trinh Minh Hoang
V. Pham
9
9
0
15 Sep 2023
Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation
Ling Huang
S. Ruan
P. Decazes
Thierry Denoeux
EDL
MedIm
17
1
0
12 Sep 2023
A Survey on Interpretable Cross-modal Reasoning
Dizhan Xue
Shengsheng Qian
Zuyi Zhou
Changsheng Xu
LRM
21
4
0
05 Sep 2023
Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds
Marcel Hirt
Domenico Campolo
Victoria Leong
Juan-Pablo Ortega
DRL
8
0
0
01 Sep 2023
Multitask Deep Learning for Accurate Risk Stratification and Prediction of Next Steps for Coronary CT Angiography Patients
Juan Lu
Bennamoun
J. Stewart
J. Eshraghian
Yanbin Liu
B. Chow
Frank M. Sanfilippo
Girish Dwivedi
OOD
6
1
0
01 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
50
3
0
28 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
29
3
0
18 Aug 2023
CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
Hongguang Zhu
Yunchao Wei
Xiaodan Liang
Chunjie Zhang
Yao-Min Zhao
VLM
19
26
0
14 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
13
116
0
25 Jul 2023
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
Jinxian Liu
Chen Ju
Chaofan Ma
Yanfeng Wang
Yu Wang
Ya-Qin Zhang
VOS
8
13
0
25 Jul 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
33
12
0
24 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
11
18
0
21 Jul 2023
Transformers in Reinforcement Learning: A Survey
Pranav Agarwal
A. Rahman
P. St-Charles
Simon J. D. Prince
Samira Ebrahimi Kahou
OffRL
6
18
0
12 Jul 2023
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedIm
AI4CE
19
24
0
30 Jun 2023
MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling
Zhenyu Zhang
Wenhao Chai
Zhongyu Jiang
Tianbo Ye
Mingli Song
Jenq-Neng Hwang
Gaoang Wang
3DH
11
4
0
29 Jun 2023
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Bernard Ghanem
Dacheng Tao
ObjD
VLM
25
104
0
28 Jun 2023
Generate to Understand for Representation
Changshan Xue
Xiande Zhong
Xiaoqing Liu
VLM
27
0
0
14 Jun 2023
Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training
Alyssa Huang
Peihan Liu
Ryumei Nakada
Linjun Zhang
Wanrong Zhang
VLM
17
5
0
13 Jun 2023
Modality Influence in Multimodal Machine Learning
Abdelhamid Haouhat
Slimane Bellaouar
A. Nehar
H. Cherroun
16
1
0
10 Jun 2023
Towards Arabic Multimodal Dataset for Sentiment Analysis
Abdelhamid Haouhat
Slimane Bellaouar
A. Nehar
H. Cherroun
6
1
0
10 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
25
21
0
06 Jun 2023
Backchannel Detection and Agreement Estimation from Video with Transformer Networks
A. Amer
Chirag Bhuvaneshwara
G. Addluri
Mohammed Maqsood Shaik
Vedant Bonde
Philippe Muller
17
5
0
02 Jun 2023
Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification
David Hoffmann
Kai Norman Clasen
Begum Demir
9
8
0
02 Jun 2023
Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data
Nathan Vaska
Victoria Helus
LRM
7
1
0
01 Jun 2023
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Shubin Huang
Qiong Wu
Yiyi Zhou
Weijie Chen
Rongsheng Zhang
Xiaoshuai Sun
Rongrong Ji
VLM
VPVLM
LRM
16
0
0
01 Jun 2023
Large language models improve Alzheimer's disease diagnosis using multi-modality data
Yingjie Feng
Jun Wang
Xianfeng Gu
Xiaoyin Xu
M. Zhang
LM&MA
8
10
0
26 May 2023
GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data
Zhitong Xiong
Sining Chen
Yi Wang
Lichao Mou
Xiao Xiang Zhu
6
4
0
24 May 2023
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
Yuan Dong
C. Fang
Liefeng Bo
Zilong Dong
Ping Tan
MDE
ViT
13
9
0
21 May 2023
Efficient Multimodal Neural Networks for Trigger-less Voice Assistants
Sai Srujana Buddi
U. Sarawgi
Tashweena Heeramun
Karan Sawnhey
Ed Yanosik
Saravana Rathinam
Saurabh N. Adya
11
5
0
20 May 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
18
1
0
12 May 2023
Multimodal Understanding Through Correlation Maximization and Minimization
Yi Shi
Marc Niethammer
30
0
0
04 May 2023