2406.01210
v1
v2 (latest)
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
3 June 2024
Ding Jia
Jianyuan Guo
Kai Han
Han Wu
Chao Zhang
Chang Xu
Xinghao Chen
ViT
Github (81★)
Papers citing "GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer"
6 / 6 papers shown
Title
DCIRNet: Depth Completion with Iterative Refinement for Dexterous Grasping of Transparent and Reflective Objects
Guanghu Xie
Zhiduo Jiang
Yonglong Zhang
Yang Liu
Zongwu Xie
Baoshi Cao
Hong Liu
70
0
0
11 Jun 2025
Semantics-aware Predictive Inspection Path Planning
M. Dharmadhikari
Kostas Alexis
23
0
0
06 Jun 2025
DGIQA: Depth-guided Feature Attention and Refinement for Generalizable Image Quality Assessment
Vaishnav Ramesh
Junliang Liu
Haining Wang
Md Jahidul Islam
21
0
0
29 May 2025
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
Chenfei Liao
Kaiyu Lei
Xu Zheng
Junha Moon
Zhixiong Wang
Yansen Wang
Danda Pani Paudel
Luc Van Gool
Xuming Hu
VLM
142
8
0
24 Mar 2025
Multimodal-Aware Fusion Network for Referring Remote Sensing Image Segmentation
Leideng Shi
Juan Zhang
116
1
0
14 Mar 2025
Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes
Jianqi Chen
Panwen Hu
Xiaojun Chang
Z. Shi
Michael C. Kampffmeyer
Xiaodan Liang
154
11
0
14 Oct 2024