Multimodal Token Fusion for Vision Transformers

Computer Vision and Pattern Recognition (CVPR), 2022
19 April 2022
Yikai Wang
Xinghao Chen
Lele Cao
Wen-bing Huang
Gang Hua
Yunhe Wang
    ViT
ArXiv (abs) | PDF | HTML | Github (180★)

Papers citing "Multimodal Token Fusion for Vision Transformers"

50 / 105 papers shown
GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection
Md Sohag Mia
Md Nahid Hasan
Tawhid Ahmed
Muhammad Abdullah Adnan
3DPC, ViT
276
0
0
02 Dec 2025
Transformer-Driven Triple Fusion Framework for Enhanced Multimodal Author Intent Classification in Low-Resource Bangla
Ariful Islam
Tanvir Mahmud
Md Rifat Hossen
ViT
223
0
0
28 Nov 2025
DiffPixelFormer: Differential Pixel-Aware Transformer for RGB-D Indoor Scene Segmentation
Yan Gong
J. Lu
Yongsheng Gao
Jie Zhao
X. Zhang
Susanto Rahardja
157
0
0
17 Nov 2025
From Classical to Hybrid: A Practical Framework for Quantum-Enhanced Learning
Silvie Illésová
Tomáš Bezděk
Vojtěch Novák
Ivan Zelinka
Stefano Cacciatore
Martin Beseda
256
0
0
11 Nov 2025
MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains
Leyan Xue
Zongbo Han
Kecheng Xue
Xiaohong Liu
Guangyu Wang
C. Zhang
199
0
0
09 Nov 2025
Robust Multimodal Semantic Segmentation with Balanced Modality Contributions
Jiaqi Tan
Xu Zheng
F. Li
Yang Liu
155
0
0
29 Sep 2025
UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation
Xiaoqi Zhao
Youwei Pang
Chenyang Yu
Lihe Zhang
Huchuan Lu
Shijian Lu
Georges El Fakhri
Xiaofeng Liu
198
2
0
19 Sep 2025
OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation
Bo Yin
Jiao-Long Cao
Xuying Zhang
Yuming Chen
Ming-Ming Cheng
Qibin Hou
MLLM, VLM
151
3
0
18 Sep 2025
MMMS: Multi-Modal Multi-Surface Interactive Segmentation
Robin Schon
Julian Lorenz
K. Ludwig
Daniel Kienzle
Rainer Lienhart
156
0
0
16 Sep 2025
Multimodal SAM-adapter for Semantic Segmentation
IEEE Access, 2025
Iacopo Curti
Pierluigi Zama Ramirez
Alioscia Petrelli
Luigi Di Stefano
179
1
0
12 Sep 2025
Adaptive Point-Prompt Tuning: Fine-Tuning Heterogeneous Foundation Models for 3D Point Cloud Analysis
Mengke Li
Lihao Chen
Peng Zhang
Yiu-ming Cheung
Hui Huang
211
0
0
30 Aug 2025
HiddenObject: Modality-Agnostic Fusion for Multimodal Hidden Object Detection
Harris Song
Tuan-Anh Vu
Sanjith Menon
Sriram Narasimhan
M. Khalid Jawed
279
1
0
28 Aug 2025
Multimodal Medical Endoscopic Image Analysis via Progressive Disentangle-aware Contrastive Learning
Junhao Wu
Yun Li
Junhao Li
Jingliang Bian
Xiaomao Fan
Wenbin Lei
Ruxin Wang
127
0
0
23 Aug 2025
MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning
Thanh-Dat Truong
Christophe Bobda
Nitin Agarwal
Khoa Luu
324
2
0
13 Aug 2025
Pre-trained Transformer-models using chronic invasive electrophysiology for symptom decoding without patient-individual training
Timon Merk
Saeed Salehi
Richard M. Koehler
Qiming Cui
Maria Olaru
...
Nicole R. Provenza
Simon Little
Reza Abbasi-Asl
Phil A. Starr
Wolf-Julian Neumann
AI4CE
163
0
0
13 Aug 2025
DMTrack: Spatio-Temporal Multimodal Tracking via Dual-Adapter
Weihong Li
Shaohua Dong
Haonan Lu
Yanhao Zhang
Heng Fan
L. Zhang
164
0
0
03 Aug 2025
Can3Tok: Canonical 3D Tokenization and Latent Modeling of Scene-Level 3D Gaussians
Quankai Gao
Iliyan Georgiev
Tuanfeng Y. Wang
Krishna Kumar Singh
Ulrich Neumann
Jae Shin Yoon
3DGS
253
4
0
02 Aug 2025
AlignFreeNet: Is Cross-Modal Pre-Alignment Necessary? An End-to-End Alignment-Free Lightweight Network for Visible-Infrared Object Detection
Haote Zhang
Lipeng Gu
Wuzhou Quan
Fu Lee Wang
Honghui Fan
Jiali Tang
Dingkun Zhu
H. Xie
Xiaoping Zhang
Mingqiang Wei
ObjD
321
1
0
27 Jul 2025
RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer
Haotian Ni
Yake Wei
Hang Liu
Gong Chen
Chong Peng
Hao Lin
Di Hu
OffRL
375
1
0
13 Jun 2025
BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation
Jialei Chen
Xu Zheng
Danda Pani Paudel
Luc Van Gool
Hiroshi Murase
Daisuke Deguchi
264
0
0
04 Jun 2025
EGFormer: Towards Efficient and Generalizable Multimodal Semantic Segmentation
Zelin Zhang
Tao Zhang
Kedi Li
Xu Zheng
246
0
0
20 May 2025
A Multi-modal Fusion Network for Terrain Perception Based on Illumination Aware
Rui Wang
Shichun Yang
Yuyi Chen
Z. Li
Zexiang Tong
Jinfeng Xu
Jiayi Lu
Xinjie Feng
Yaoguang Cao
222
1
0
16 May 2025
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
Xu Zheng
Yuanhuiyi Lyu
Lutao Jiang
Danda Pani Paudel
Luc Van Gool
Xuming Hu
258
8
0
10 May 2025
Position: Foundation Models Need Digital Twin Representations
Yiqing Shen
Hao Ding
Lalithkumar Seenivasan
Tianmin Shu
Mathias Unberath
AI4CE
458
10
0
01 May 2025
HDBFormer: Efficient RGB-D Semantic Segmentation with A Heterogeneous Dual-Branch Framework
IEEE Signal Processing Letters (IEEE SPL), 2025
Shuobin Wei
Zhuang Zhou
Zhengan Lu
Zizhao Yuan
Binghua Su
MDE
594
6
0
18 Apr 2025
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2025
Bo Yin
Jiao-Long Cao
Ming-Ming Cheng
Qibin Hou
3DPC, MDE
410
26
0
07 Apr 2025
Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion
Xingyu Hu
Junjun Jiang
Chenyang Wang
Kui Jiang
Xianming Liu
Jiayi Ma
485
2
0
07 Apr 2025
Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
Paul Koch
Jörg Krüger
Ankit Chowdhury
O. Heimann
MDE
320
0
0
25 Mar 2025
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
Chenfei Liao
Kaiyu Lei
Xu Zheng
Junha Moon
Zhixiong Wang
Longji Xu
Danda Pani Paudel
Luc Van Gool
Xuming Hu
VLM
608
21
0
24 Mar 2025
PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes
AAAI Conference on Artificial Intelligence (AAAI), 2025
Xinhua Xu
Hong Liu
Jianbing Wu
Jinfu Liu
DiffM
357
1
0
24 Mar 2025
Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance
Jiayi Zhao
Fei Teng
Kai Luo
Guoqiang Zhao
Hui Yuan
Xu Zheng
Kailun Yang
VLM
403
11
0
04 Mar 2025
Deep-JGAC: End-to-End Deep Joint Geometry and Attribute Compression for Dense Colored Point Clouds
Yun Zhang
Zixi Guo
Linwei Zhu
C.-C. Jay Kuo
3DPC
322
2
0
25 Feb 2025
MVIP -- A Dataset and Methods for Application Oriented Multi-View and Multi-Modal Industrial Part Recognition
Paul Koch
Marian Schluter
Jörg Krüger
334
0
0
24 Feb 2025
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Yunzhi Zhuge
Hongyu Gu
Lu Zhang
Jinqing Qi
Huchuan Lu
VOS
524
11
0
14 Jan 2025
MAGIC++: Efficient and Resilient Modality-Agnostic Semantic Segmentation via Hierarchical Modality Selection
Xu Zheng
Yuanhuiyi Lyu
Lutao Jiang
Jiazhou Zhou
Lin Wang
Xuming Hu
379
13
0
22 Dec 2024
AlzheimerRAG: Multimodal Retrieval Augmented Generation for Clinical Use Cases using PubMed articles
Machine Learning and Knowledge Extraction (MLKE), 2024
A. Lahiri
Qinmin Vivian Hu
417
10
0
21 Dec 2024
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding
International Journal of Computer Vision (IJCV), 2024
Yi Liu
Chengxin Li
Shoukun Xu
Jiawei Han
ViT
243
35
0
19 Oct 2024
Order-aware Interactive Segmentation
International Conference on Learning Representations (ICLR), 2024
Sijin Yu
Anwesa Choudhuri
Meng Zheng
Zhongpai Gao
Benjamin Planche
Andong Deng
Qin Liu
Terrence Chen
Ulas Bagci
Ziyan Wu
VLM
1.1K
2
0
16 Oct 2024
MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection
Niki Nezakati
Md Kaykobad Reza
Mashhour Solh
M. Salman Asif
457
6
0
03 Oct 2024
AUCSeg: AUC-oriented Pixel-level Long-tail Semantic Segmentation
Neural Information Processing Systems (NeurIPS), 2024
Boyu Han
Qianqian Xu
Zhiyong Yang
Shilong Bao
Peisong Wen
Yangbangyan Jiang
Qingming Huang
443
20
0
30 Sep 2024
Unsupervised Hyperspectral and Multispectral Image Blind Fusion Based on Deep Tucker Decomposition Network with Spatial-Spectral Manifold Learning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
He Wang
Yang Xu
Zebin Wu
Zhihui Wei
217
12
0
15 Sep 2024
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
European Conference on Computer Vision (ECCV), 2024
Linyan Yang
Lukas Hoyer
Mark Weber
Tobias Fischer
Dengxin Dai
Laura Leal-Taixé
Marc Pollefeys
Daniel Cremers
Luc Van Gool
MDE
339
16
0
29 Aug 2024
FusionSAM: Visual Multi-Modal Learning with Segment Anything
Knowledge Discovery and Data Mining (KDD), 2024
Daixun Li
Weiying Xie
Mingxiang Cao
Yunke Wang
Jiaqing Zhang
Leyuan Fang
Yunsong Li
Chang Xu
373
6
0
26 Aug 2024
Depth-guided Texture Diffusion for Image Semantic Segmentation
Wei Sun
Yuan Li
Qixiang Ye
Jianbin Jiao
Yanzhao Zhou
DiffM, MDE
222
4
0
17 Aug 2024
StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation
Yue Duan
Zhangxuan Gu
ZhenZhe Ying
Changhua Meng
Xuelong Li
429
30
0
02 Aug 2024
Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets
Muhammad Abdullah Jamal
Omid Mohareri
222
5
0
29 Jul 2024
Adapt PointFormer: 3D Point Cloud Analysis via Adapting 2D Visual Transformers
Mengke Li
Da Li
Guoqing Yang
Yiu-ming Cheung
Hui Huang
3DPC
441
6
0
18 Jul 2024
Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
VLM
341
34
0
16 Jul 2024
Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
Xueye Zheng
Yuanhuiyi Lyu
Jiazhou Zhou
Lin Wang
377
23
0
16 Jul 2024
Movie Recommendation with Poster Attention via Multi-modal Transformer Feature Fusion
Linhan Xia
Yicheng Yang
Ziou Chen
Zheng Yang
Shengxin Zhu
162
8
0
12 Jul 2024