Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
13 June 2022
Peng Xu
Xiatian Zhu
David Clifton
    ViT

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 305 papers shown
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
Elaheh Baharlouei
Mahsa Shafaei
Yigeng Zhang
Hugo Jair Escalante
Thamar Solorio
217
1
0
12 Jun 2024
UEMM-Air: Make Unmanned Aerial Vehicles Perform More Multi-modal Tasks
Liang Yao
Shengxiang Xu
Chuanyi Zhang
Xinlei Zhang
Ting Wu
Zequan Wang
Shimin Di
Jun Zhou
185
0
0
10 Jun 2024
CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
International Conference on Learning Representations (ICLR), 2024
Matthew Fortier
Mats L. Richter
O. Sonnentag
Chris Pal
AI4CE
213
2
0
07 Jun 2024
ArMeme: Propagandistic Content in Arabic Memes
Firoj Alam
A. Hasnat
Fatema Ahmed
Md. Arid Hasan
Maram Hasanain
186
11
0
06 Jun 2024
MiniGPT-Reverse-Designing: Predicting Image Adjustments Utilizing MiniGPT-4
Vahid Azizi
Fatemeh Koochaki
VLM
322
0
0
03 Jun 2024
Robust Multi-Modal Speech In-Painting: A Sequence-to-Sequence Approach
Mahsa Kadkhodaei Elyaderani
Shahram Shirani
337
0
0
02 Jun 2024
From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems
Jianliang He
Siyu Chen
Fengzhuo Zhang
Zhuoran Yang
LM&Ro
LLMAG
302
8
0
30 May 2024
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Vasu Sharma
Eugenio Culurciello
321
27
0
28 May 2024
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao
Mengxi Chen
Tianjie Dai
Jiangchao Yao
Bo Han
Ya Zhang
Yanfeng Wang
NoLa
208
10
0
27 May 2024
ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection
Ziying Song
Hongyu Pan
Y. Zhang
Lin Liu
...
Shaoqing Xu
Yang Ji
Tong Zhao
Li-e Wang
Yadan Luo
437
15
0
27 May 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin
Sam Whitman McGrath
Danielle J. Williams
AI4CE
516
6
0
24 May 2024
Transformers for Image-Goal Navigation
Nikhilanj Pelluri
ViT
347
2
0
23 May 2024
Mutual Information Analysis in Multimodal Learning Systems
Hadi Hadizadeh
S. F. Yeganli
Bahador Rashidi
Ivan V. Bajić
73
3
0
21 May 2024
Generative AI Empowered LiDAR Point Cloud Generation with Multimodal Transformer
Mohammad Farzanullah
Han Zhang
A. B. Sediq
Ali Afana
Melike Erol-Kantarci
133
5
0
20 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Tong Lu
Lin Ma
Min Zhang
MoE
235
100
0
18 May 2024
Networking Systems for Video Anomaly Detection: A Tutorial and Survey
ACM Computing Surveys (ACM CSUR), 2024
Jing Liu
Yang Liu
Jieyu Lin
Jielin Li
Yang Liu
Bo Hu
Liang Song
Peng Sun
Victor C.M. Leung
569
33
0
16 May 2024
Representation Learning of Daily Movement Data Using Text Encoders
Alexander Capstick
Tianyu Cui
Yu Chen
Payam Barnaghi
AI4TS
239
2
0
07 May 2024
A Short Survey of Human Mobility Prediction in Epidemic Modeling from Transformers to LLMs
Christian N. Mayemba
D'Jeff K. Nkashama
Jean Marie Tshimula
Maximilien V. Dialufuma
Jean Tshibangu Muabila
...
Kalonji Kalala
Aristarque Ilunga
Lambert Mukendi Ntobo
Dominique Muteba
A. Abedi
195
2
0
25 Apr 2024
Unveiling and Mitigating Generalized Biases of DNNs through the Intrinsic Dimensions of Perceptual Manifolds
Yanbiao Ma
Licheng Jiao
Fang Liu
Lingling Li
Wenping Ma
Shuyuan Yang
Xu Liu
Puhua Chen
247
4
0
22 Apr 2024
Sequential Compositional Generalization in Multimodal Models
Semih Yagcioglu
Osman Batur İnce
Aykut Erdem
Erkut Erdem
Desmond Elliott
Deniz Yuret
195
1
0
18 Apr 2024
Terrain-Aware Stride-Level Trajectory Forecasting for a Powered Hip Exoskeleton via Vision and Kinematics Fusion
Ruoqi Zhao
Xing-bang Yang
Yubo Fan
49
0
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
402
39
0
18 Apr 2024
Explainable Generative AI (GenXAI): A Survey, Conceptualization, and Research Agenda
Johannes Schneider
258
78
0
15 Apr 2024
Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision
Yingbo Ma
Suraj Kolla
Zhenhong Hu
Dhruv Kaliraman
Victoria Nolan
...
Jeremy A. Balch
Tyler J. Loftus
Parisa Rashidi
A. Bihorac
B. Shickel
AI4TS
220
6
0
10 Apr 2024
Cross-Attention is Not Always Needed: Dynamic Cross-Attention for Audio-Visual Dimensional Emotion Recognition
R Gnana Praveen
Jahangir Alam
252
5
0
28 Mar 2024
Debiasing surgeon: fantastic weights and how to find them
Rémi Nahon
Ivan Luiz De Moura Matos
Van-Tam Nguyen
Enzo Tartaglione
226
1
0
21 Mar 2024
Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation
Leyuan Sun
Asako Kanezaki
Guillaume Caron
Yusuke Yoshiyasu
LM&Ro
235
8
0
21 Mar 2024
A Survey on Quality Metrics for Text-to-Image Generation
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024
Sebastian Hartwig
Dominik Engel
Leon Sick
H. Kniesel
Tristan Payer
Poonam Poonam
Michael Glockler
Alex Bauerle
Timo Ropinski
EGVM
297
0
0
18 Mar 2024
Affective Behaviour Analysis via Integrating Multi-Modal Knowledge
Wei Zhang
Feng Qiu
Chen Liu
Lincheng Li
Heming Du
Tiancheng Guo
Xin Yu
229
21
0
16 Mar 2024
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi
Ziquan Liu
M. Elbadawi
Adam Daneshmend
Mine Orlu
Abdul Basit
Andreas Demosthenous
Miguel R. D. Rodrigues
279
4
0
14 Mar 2024
Materials science in the era of large language models: a perspective
Digital Discovery (DD), 2024
Ge Lei
Ronan Docherty
Samuel J. Cooper
227
41
0
11 Mar 2024
Temporal Cross-Attention for Dynamic Embedding and Tokenization of Multimodal Electronic Health Records
Yingbo Ma
Suraj Kolla
Dhruv Kaliraman
Victoria Nolan
Zhenhong Hu
...
T. Ozrazgat-Baslanti
Tyler J. Loftus
Parisa Rashidi
A. Bihorac
B. Shickel
AI4TS
253
2
0
06 Mar 2024
Time Series Analysis in Compressor-Based Machines: A Survey
Francesca Forbicini
Nicolò Oreste Pinciroli Vago
Piero Fraternali
AI4CE
221
0
0
27 Feb 2024
Hallucinations or Attention Misdirection? The Path to Strategic Value Extraction in Business Using Large Language Models
Aline Ioste
195
2
0
21 Feb 2024
Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?
Tiantian Feng
Daniel Yang
Digbalay Bose
Shrikanth Narayanan
274
6
0
14 Feb 2024
Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models
Shaeke Salman
M. Shams
Xiuwen Liu
Lingjiong Zhu
VLM
170
3
0
13 Feb 2024
Quantifying and Enhancing Multi-modal Robustness with Modality Preference
Zequn Yang
Yake Wei
Ce Liang
Di Hu
AAML
330
22
0
09 Feb 2024
AI enhanced data assimilation and uncertainty quantification applied to Geological Carbon Storage
G. S. Seabra
N. T. Mücke
Vinicius Luiz Santos Silva
Denis Voskov
F. Vossepoel
AI4CE
188
21
0
09 Feb 2024
RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li
Xuewen Liu
Jing Zhang
Qingyi Gu
MQ
249
8
0
08 Feb 2024
Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection
Pramit Saha
Divyanshu Mishra
Felix Wagner
Konstantinos Kamnitsas
J. A. Noble
147
7
0
07 Feb 2024
RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation
Xiaohan Yu
Li Zhang
Xin Zhao
Yue Wang
Zhongrui Ma
172
14
0
07 Feb 2024
Integrative Variational Autoencoders for Generative Modeling of an Image Outcome with Multiple Input Images
Bowen Lei
Rajarshi Guhaniyogi
Aaron Scheffler
Bani Mallick
Alzheimer's Disease Neuroimaging Initiatives
220
0
0
05 Feb 2024
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Ziyu Ma
Shutao Li
Bin Sun
Jianfei Cai
Zuxiang Long
Fuyan Ma
259
8
0
04 Feb 2024
The Landscape and Challenges of HPC Research and LLMs
Le Chen
Nesreen K. Ahmed
Akashnil Dutta
Arijit Bhattacharjee
Sixing Yu
...
Vy A. Vo
J. P. Muñoz
Ted Willke
Tim Mattson
Ali Jannesari
AI4CE
265
34
0
03 Feb 2024
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
Lei Liu
Tianpeng Liu
Haizhou Li
262
12
0
31 Jan 2024
A Survey on Visual Anomaly Detection: Challenge, Approach, and Prospect
Yunkang Cao
Xiaohao Xu
Jiangning Zhang
Yuqi Cheng
Xiaonan Huang
Guansong Pang
Nong Sang
252
66
0
29 Jan 2024
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Jorge Sánchez
Rodrigo Laguna
VLM
241
0
0
29 Jan 2024
Intriguing Equivalence Structures of the Embedding Space of Vision Transformers
Shaeke Salman
M. Shams
Xiuwen Liu
272
7
0
28 Jan 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Computer Vision and Pattern Recognition (CVPR), 2024
Yiyuan Zhang
Xiaohan Ding
Kaixiong Gong
Yixiao Ge
Ying Shan
Xiangyu Yue
ViT
312
11
0
25 Jan 2024
Cascaded Cross-Modal Transformer for Audio-Textual Classification
Artificial Intelligence Review (Artif Intell Rev), 2024
Nicolae-Cătălin Ristea
Andrei Anghel
Radu Tudor Ionescu
248
3
0
15 Jan 2024