Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2409.03757
Cited By
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
5 September 2024
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding"
15 / 15 papers shown
Title
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
37
0
0
15 Apr 2025
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Pedro Hermosilla
Christian Stippel
Leon Sick
SSL
3DPC
48
0
0
09 Apr 2025
The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models?
Weichen Zhang
Ruiying Peng
Chen Gao
Jianjie Fang
Xin Zeng
...
Z. Wang
Jinqiang Cui
Xin Wang
Xinlei Chen
Y. Li
LRM
54
0
0
06 Apr 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
J. Huang
Baoxiong Jia
Y. Wang
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
59
2
0
28 Mar 2025
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
Zhenyu Pan
Han Liu
OffRL
LRM
44
1
0
24 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li
Cristiano Saltori
Fabio Poiesi
N. Sebe
34
0
0
20 Mar 2025
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
Boshen Xu
Yuting Mei
Xinbi Liu
Sipeng Zheng
Qin Jin
VLM
MDE
41
0
0
19 Mar 2025
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space
Weichen Zhan
Zile Zhou
Zhiheng Zheng
Chen Gao
Jinqiang Cui
Y. Li
Xinlei Chen
Xiao-Ping Zhang
LRM
43
0
0
14 Mar 2025
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu
Wentong Li
Song Wang
J. Chen
Jianke Zhu
3DV
LRM
52
1
0
01 Mar 2025
SCA3D: Enhancing Cross-modal 3D Retrieval via 3D Shape and Caption Paired Data Augmentation
Junlong Ren
Hao Wu
Hui Xiong
H. Wang
43
0
0
26 Feb 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
58
3
0
02 Jan 2025
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Yuxiang Lu
Shengcao Cao
Yu-xiong Wang
18
0
0
18 Oct 2024
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
Xiaoyu Tian
Junru Gu
Bailin Li
Yicheng Liu
Yang Wang
Chenxu Hu
Kun Zhan
Peng Jia
Xianpeng Lang
Hang Zhao
VLM
23
35
0
19 Feb 2024
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
271
4,299
0
29 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
221
499
0
22 Apr 2021
1