Inference Compute-Optimal Video Vision Language Models | Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency | Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Grounding Task Assistance with Multimodal Cues from a Single Demonstration | Annual Meeting of the Association for Computational Linguistics (ACL), 2025
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding | Computer Vision and Pattern Recognition (CVPR), 2025
VEU-Bench: Towards Comprehensive Understanding of Video Editing | Computer Vision and Pattern Recognition (CVPR), 2025
VideoVista-CulturalLingo: 360 Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension | Annual Meeting of the Association for Computational Linguistics (ACL), 2025