ResearchTrend.AI
Multimodal Learning with Transformers: A Survey

13 June 2022
P. Xu
Xiatian Zhu
David A. Clifton
    ViT

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 268 papers shown
Mutual Information Analysis in Multimodal Learning Systems
Hadi Hadizadeh
S. F. Yeganli
Bahador Rashidi
Ivan V. Bajić
17
2
0
21 May 2024
Generative AI Empowered LiDAR Point Cloud Generation with Multimodal Transformer
Mohammad Farzanullah
Han Zhang
A. B. Sediq
Ali Afana
Melike Erol-Kantarci
18
1
0
20 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
30
27
0
18 May 2024
Networking Systems for Video Anomaly Detection: A Tutorial and Survey
Jing Liu
Yang Liu
Jieyu Lin
Jielin Li
Peng Sun
Bo Hu
Liang Song
Azzedine Boukerche
Victor C.M. Leung
43
10
0
16 May 2024
Representation Learning of Daily Movement Data Using Text Encoders
Alexander Capstick
Tianyu Cui
Yu Chen
Payam Barnaghi
AI4TS
14
2
0
07 May 2024
A Short Survey of Human Mobility Prediction in Epidemic Modeling from Transformers to LLMs
Christian N. Mayemba
D'Jeff K. Nkashama
Jean Marie Tshimula
Maximilien V. Dialufuma
Jean Tshibangu Muabila
...
Kalonji Kalala
Aristarque Ilunga
Lambert Mukendi Ntobo
Dominique Muteba
A. Abedi
16
1
0
25 Apr 2024
Unveiling and Mitigating Generalized Biases of DNNs through the Intrinsic Dimensions of Perceptual Manifolds
Yanbiao Ma
Licheng Jiao
Fang Liu
Lingling Li
Wenping Ma
Shuyuan Yang
Xu Liu
Puhua Chen
29
0
0
22 Apr 2024
Sequential Compositional Generalization in Multimodal Models
Semih Yagcioglu
Osman Batur İnce
Aykut Erdem
Erkut Erdem
Desmond Elliott
Deniz Yuret
29
1
0
18 Apr 2024
Terrain-Aware Stride-Level Trajectory Forecasting for a Powered Hip Exoskeleton via Vision and Kinematics Fusion
Ruoqi Zhao
Xing-bang Yang
Yubo Fan
21
0
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
16
14
0
18 Apr 2024
Explainable Generative AI (GenXAI): A Survey, Conceptualization, and Research Agenda
Johannes Schneider
67
26
0
15 Apr 2024
Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision
Yingbo Ma
Suraj Kolla
Zhenhong Hu
Dhruv Kaliraman
Victoria Nolan
...
Jeremy A. Balch
Tyler J. Loftus
Parisa Rashidi
A. Bihorac
B. Shickel
AI4TS
17
1
0
10 Apr 2024
Cross-Attention is Not Always Needed: Dynamic Cross-Attention for Audio-Visual Dimensional Emotion Recognition
R Gnana Praveen
Jahangir Alam
36
2
0
28 Mar 2024
Debiasing surgeon: fantastic weights and how to find them
Rémi Nahon
Ivan Luiz De Moura Matos
Van-Tam Nguyen
Enzo Tartaglione
21
1
0
21 Mar 2024
Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation
Leyuan Sun
Asako Kanezaki
Guillaume Caron
Yusuke Yoshiyasu
LM&Ro
19
2
0
21 Mar 2024
Affective Behaviour Analysis via Integrating Multi-Modal Knowledge
Wei Zhang
Feng Qiu
Chen Liu
Lincheng Li
Heming Du
Tiancheng Guo
Xin Yu
33
21
0
16 Mar 2024
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi
Ziquan Liu
M. Elbadawi
Adam Daneshmend
Mine Orlu
Abdul Basit
Andreas Demosthenous
Miguel R. D. Rodrigues
19
2
0
14 Mar 2024
Materials science in the era of large language models: a perspective
Ge Lei
Ronan Docherty
Samuel J. Cooper
35
3
0
11 Mar 2024
Temporal Cross-Attention for Dynamic Embedding and Tokenization of Multimodal Electronic Health Records
Yingbo Ma
Suraj Kolla
Dhruv Kaliraman
Victoria Nolan
Zhenhong Hu
...
T. Ozrazgat-Baslanti
Tyler J. Loftus
Parisa Rashidi
A. Bihorac
B. Shickel
AI4TS
19
1
0
06 Mar 2024
Time Series Analysis in Compressor-Based Machines: A Survey
Francesca Forbicini
Nicolò Oreste Pinciroli Vago
Piero Fraternali
AI4CE
16
0
0
27 Feb 2024
Hallucinations or Attention Misdirection? The Path to Strategic Value Extraction in Business Using Large Language Models
Aline Ioste
24
0
0
21 Feb 2024
Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?
Tiantian Feng
Daniel Yang
Digbalay Bose
Shrikanth Narayanan
24
4
0
14 Feb 2024
Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models
Shaeke Salman
M. Shams
Xiuwen Liu
Lingjiong Zhu
VLM
11
2
0
13 Feb 2024
Quantifying and Enhancing Multi-modal Robustness with Modality Preference
Zequn Yang
Yake Wei
Ce Liang
Di Hu
AAML
19
9
0
09 Feb 2024
AI enhanced data assimilation and uncertainty quantification applied to Geological Carbon Storage
G. S. Seabra
N. T. Mücke
Vinicius Luiz Santos Silva
Denis Voskov
F. Vossepoel
AI4CE
8
10
0
09 Feb 2024
RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li
Xuewen Liu
Jing Zhang
Qingyi Gu
MQ
27
7
0
08 Feb 2024
Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection
Pramit Saha
Divyanshu Mishra
Felix Wagner
Konstantinos Kamnitsas
J. A. Noble
16
5
0
07 Feb 2024
RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation
Xiaohan Yu
Li Zhang
Xin Zhao
Yue Wang
Zhongrui Ma
33
6
0
07 Feb 2024
InVA: Integrative Variational Autoencoder for Harmonization of Multi-modal Neuroimaging Data
Bowen Lei
Rajarshi Guhaniyogi
Krishnendu Chandra
Aaron Scheffler
Bani Mallick
8
0
0
05 Feb 2024
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Ziyu Ma
Shutao Li
Bin Sun
Jianfei Cai
Zuxiang Long
Fuyan Ma
13
1
0
04 Feb 2024
The Landscape and Challenges of HPC Research and LLMs
Le Chen
Nesreen K. Ahmed
Akashnil Dutta
Arijit Bhattacharjee
Sixing Yu
...
Vy A. Vo
J. P. Muñoz
Ted Willke
Tim Mattson
Ali Jannesari
AI4CE
24
19
0
03 Feb 2024
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
Lei Liu
Li Liu
Haizhou Li
6
6
0
31 Jan 2024
A Survey on Visual Anomaly Detection: Challenge, Approach, and Prospect
Yunkang Cao
Xiaohao Xu
Jiangning Zhang
Yuqi Cheng
Xiaonan Huang
Guansong Pang
Weiming Shen
81
41
0
29 Jan 2024
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Jorge Sánchez
Rodrigo Laguna
VLM
10
0
0
29 Jan 2024
Intriguing Equivalence Structures of the Embedding Space of Vision Transformers
Shaeke Salman
M. Shams
Xiuwen Liu
24
6
0
28 Jan 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang
Xiaohan Ding
Kaixiong Gong
Yixiao Ge
Ying Shan
Xiangyu Yue
ViT
16
7
0
25 Jan 2024
Cascaded Cross-Modal Transformer for Audio-Textual Classification
Nicolae-Cătălin Ristea
Andrei Anghel
Radu Tudor Ionescu
15
2
0
15 Jan 2024
Transformer for Object Re-Identification: A Survey
Mang Ye
Shuo Chen
Chenyue Li
Wei-Shi Zheng
David J. Crandall
Bo Du
ViT
90
12
0
13 Jan 2024
A Temporal-Spectral Fusion Transformer with Subject-Specific Adapter for Enhancing RSVP-BCI Decoding
Xujin Li
Wei Wei
Shuang Qiu
Huiguang He
13
0
0
12 Jan 2024
Complementary Information Mutual Learning for Multimodality Medical Image Segmentation
Chuyun Shen
Wenhao Li
Haoqing Chen
Xiaoling Wang
Fengping Zhu
Yuxin Li
Xiangfeng Wang
Bo Jin
30
3
0
05 Jan 2024
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
Hao Sun
Mingyao Zhou
Wenjing Chen
Wei Xie
PINN
3DGS
ViT
14
31
0
04 Jan 2024
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu
Xintao Lv
Yichao Yan
Xin Jin
Shuwen Wu
...
Fengyun Rao
Xingdong Sheng
Yunhui Liu
Wenjun Zeng
Xiaokang Yang
24
25
0
26 Dec 2023
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
79
46
0
18 Dec 2023
Can Physician Judgment Enhance Model Trustworthiness? A Case Study on Predicting Pathological Lymph Nodes in Rectal Cancer
Kazuma Kobayashi
Yasuyuki Takamizawa
M. Miyake
Sono Ito
Lin Gu
Tatsuya Nakatsuka
Yu Akagi
Tatsuya Harada
Y. Kanemitsu
Ryuji Hamamoto
15
2
0
15 Dec 2023
Non-contact Multimodal Indoor Human Monitoring Systems: A Survey
L. Nguyen
Praneeth Susarla
Anirban Mukherjee
Manuel Lage Cañellas
Constantino Álvarez Casado
Xiaoting Wu
Olli Silvén
D. Jayagopi
Miguel Bordallo López
13
1
0
11 Dec 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Hongyuan Zhu
Jiayuan Fan
Tao Chen
MLLM
24
76
0
30 Nov 2023
Large Model Based Referring Camouflaged Object Detection
Shupeng Cheng
Ge-Peng Ji
Pengda Qin
Deng-Ping Fan
Bowen Zhou
Peng-Tao Xu
ObjD
13
7
0
28 Nov 2023
Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking
Jiawei Ge
Xiangmei Chen
Jiuxin Cao
Xueling Zhu
Bo Liu
VLM
27
2
0
28 Nov 2023
Images Connect Us Together: Navigating a COVID-19 Local Outbreak in China Through Social Media Images
Changyang He
Lu He
Wenjie Yang
Bo-wen Li
11
1
0
18 Nov 2023
Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference
Marvin Schmitt
Stefan T. Radev
Paul-Christian Bürkner
40
5
0
17 Nov 2023