LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token. International Conference on Learning Representations (ICLR), 2025
MLVU: Benchmarking Multi-task Long Video Understanding. Computer Vision and Pattern Recognition (CVPR), 2024
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning. Computer Vision and Pattern Recognition (CVPR), 2023