
Title |
|---|
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks. Neural Information Processing Systems (NeurIPS), 2024 |
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding. Neural Information Processing Systems (NeurIPS), 2024 |
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination. Computer Vision and Pattern Recognition (CVPR), 2024 |
BRAVE: Broadening the visual encoding of vision-language models. European Conference on Computer Vision (ECCV), 2024 |