2406.01210
Cited By
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
3 June 2024
Ding Jia
Jianyuan Guo
Kai Han
Han Wu
Chao Zhang
Chang Xu
Xinghao Chen
ViT
Papers citing "GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer"
4 / 4 papers shown
Title
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Bowen Yin
Xuying Zhang
Zhongyu Li
Li Liu
Ming-Ming Cheng
Qibin Hou
19
13
0
18 Sep 2023
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
201
179
0
20 Jan 2022
CMT: Convolutional Neural Networks Meet Vision Transformers
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Chunjing Xu
Yunhe Wang
Chang Xu
ViT
315
500
0
13 Jul 2021
Visual Saliency Transformer
Nian Liu
Ni Zhang
Kaiyuan Wan
Ling Shao
Junwei Han
ViT
222
281
0
25 Apr 2021