Title
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | De-An Huang, Subhashree Radhakrishnan, Zhiding Yu, Jan Kautz | VGen, VLM | 71 0 0 | 24 Apr 2025
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Shuming Liu, Chen Zhao, Tianqi Xu, Bernard Ghanem | VLM | 69 0 0 | 27 Mar 2025
Improving LLM Video Understanding with 16 Frames Per Second | Y. Li, Changli Tang, Jimin Zhuang, Yudong Yang, Guangzhi Sun, W. Li, Z. Ma, Chao Zhang | VLM | 66 1 0 | 18 Mar 2025
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | Weiyu Guo, Ziyang Chen, Shaoguang Wang, JianXiang He, Yijie Xu, Jinhui Ye, Ying Sun, Hui Xiong | | 42 1 0 | 17 Mar 2025