OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

24 June 2024

Papers citing "OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer"

7 / 7 papers shown

Title
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding Henghao Zhao Ge-Peng Ji Rui Yan Huan Xiong Zechao Li 16 0 0 10 Apr 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook Xu Zheng Ziqiao Weng Yuanhuiyi Lyu Lutao Jiang Haiwei Xue Bin Ren Danda Pani Paudel N. Sebe Luc Van Gool Xuming Hu 3DV 37 0 0 23 Mar 2025
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding Weiyu Guo Ziyang Chen Shaoguang Wang JianXiang He Yijie Xu Jinhui Ye Ying Sun Hui Xiong 42 1 0 17 Mar 2025
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma? Tianyuan Qu Longxiang Tang Bohao Peng Senqiao Yang Bei Yu Jiaya Jia VLM 57 0 0 16 Mar 2025
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension Yongdong Luo Xiawu Zheng Xiao Yang Guilin Li Haojia Lin Jinfa Huang Jiayi Ji Fei Chao Jiebo Luo Rongrong Ji VLM 79 12 0 20 Nov 2024
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection Bin Lin Yang Ye Bin Zhu Jiaxi Cui Munan Ning Peng Jin Li-ming Yuan VLM MLLM 182 576 0 16 Nov 2023
DialPort: Connecting the Spoken Dialog Research Community to Real User Data Tiancheng Zhao Kyusong Lee M. Eskénazi 21 22 0 08 Jun 2016