Goldfish: Vision-Language Understanding of Arbitrarily Long Videos

17 July 2024
Kirolos Ataallah
Xiaoqian Shen
Eslam Abdelrahman
Essam Sleiman
Mingchen Zhuge
Jian Ding
Deyao Zhu
Jürgen Schmidhuber
Mohamed Elhoseiny
    VLM
ArXiv (abs) | PDF | HTML | HuggingFace (8 upvotes)

Papers citing "Goldfish: Vision-Language Understanding of Arbitrarily Long Videos"

16 / 16 papers shown
Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
J. Li
Bin Li
Jiahao Li
Yan Lu
116
0
0
03 Dec 2025
Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs
Vaggelis Dorovatas
Soroush Seifi
Gunshi Gupta
Rahaf Aljundi
111
0
0
20 Oct 2025
From Frames to Clips: Training-free Adaptive Key Clip Selection for Long-Form Video Understanding
Guangyu Sun
Archit Singhal
Burak Uzkent
Mubarak Shah
Chen Chen
Garin Kessler
CLIP
VLM
153
0
0
02 Oct 2025
POVQA: Preference-Optimized Video Question Answering with Rationales for Data Efficiency
Ashim Dahal
Ankit Ghimire
Saydul Akbar Murad
Nick Rahimi
147
0
0
01 Oct 2025
VC-Agent: An Interactive Agent for Customized Video Dataset Collection
Yidan Zhang
Mutian Xu
Yiming Hao
Kun Zhou
Jiahao Chang
Xiaoqiang Liu
Pengfei Wan
Hongbo Fu
Xiaoguang Han
VGen
184
0
0
25 Sep 2025
Think With Videos For Agentic Long-Video Understanding
Huaying Yuan
Zheng Liu
Junjie Zhou
Ji-Rong Wen
Yan Shu
Andrii Zadaianchuk
Zhicheng Dou
VLM
544
1
0
12 Jun 2025
KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
Xingrui Wang
Jiang-Long Liu
Liang Luo
Xiaodong Yu
Jialian Wu
Xingwu Sun
Yusheng Su
Yaoyao Liu
Zicheng Liu
Emad Barsoum
DiffM
VGen
281
4
0
13 Apr 2025
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
VLM
679
17
0
03 Apr 2025
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Shuming Liu
Chen Zhao
Tianqi Xu
Bernard Ghanem
VLM
292
25
0
27 Mar 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
Abdelrahman M. Shaker
Muhammad Maaz
Chenhui Gou
Hamid Rezatofighi
Salman Khan
Fahad Shahbaz Khan
929
3
0
27 Mar 2025
Memory-enhanced Retrieval Augmentation for Long Video Understanding
Huaying Yuan
Zhengyang Liang
Minhao Qin
Hongjin Qian
Yan Shu
Zhicheng Dou
Ji-Rong Wen
Andrii Zadaianchuk
VOS
RALM
VLM
348
9
0
12 Mar 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
VLM
970
15
0
05 Jan 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Yuan Liu
Hengshuang Zhao
763
52
0
02 Jan 2025
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Ruotong Wang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
452
16
0
12 Dec 2024
InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows
Kirolos Ataallah
Eslam Abdelrahman
Mahmoud Ahmed
Chenhui Gou
Khushbu Pahwa
Jian Ding
Mohamed Elhoseiny
VLM
275
14
0
28 Jun 2024
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Rohit K Bharadwaj
Hanan Gani
Muzammal Naseer
Fahad Shahbaz Khan
Salman Khan
380
17
0
14 Jun 2024