
Title |
|---|
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model. Computer Vision and Pattern Recognition (CVPR), 2025 |
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding. Computer Vision and Pattern Recognition (CVPR), 2025 |
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant. Computer Vision and Pattern Recognition (CVPR), 2025 |
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks. Computer Vision and Pattern Recognition (CVPR), 2025 |
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs. Computer Vision and Pattern Recognition (CVPR), 2024 |
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA. Hanrong Ye, Haotian Zhang, Erik Daxberger, Lin Chen, Zongyu Lin, ..., Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang |
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark. International Conference on Learning Representations (ICLR), 2024 |