Vision Generalist Model: A Survey. International Journal of Computer Vision (IJCV), 2025
Deep Temporal Reasoning in Video Language Models: A Cross-Linguistic Evaluation of Action Duration and Completion through Perfect Times. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Vid2Coach: Transforming How-To Videos into Task Assistants. ACM Symposium on User Interface Software and Technology (UIST), 2025
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs. Annual Meeting of the Association for Computational Linguistics (ACL), 2025