arXiv: 2407.12679

Goldfish: Vision-Language Understanding of Arbitrarily Long Videos

17 July 2024

Kirolos Ataallah

Eslam Abdelrahman

Jürgen Schmidhuber

Mohamed Elhoseiny

Papers citing "Goldfish: Vision-Language Understanding of Arbitrarily Long Videos"

16 / 16 papers shown

Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding

116

0

0

03 Dec 2025

Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs

Vaggelis Dorovatas

111

0

0

20 Oct 2025

From Frames to Clips: Training-free Adaptive Key Clip Selection for Long-Form Video Understanding

153

0

0

02 Oct 2025

POVQA: Preference-Optimized Video Question Answering with Rationales for Data Efficiency

Saydul Akbar Murad

147

0

0

01 Oct 2025

VC-Agent: An Interactive Agent for Customized Video Dataset Collection

184

0

0

25 Sep 2025

Think With Videos For Agentic Long-Video Understanding

Andrii Zadaianchuk

544

1

0

12 Jun 2025

KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation

281

4

0

13 Apr 2025

Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation

679

17

0

03 Apr 2025

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

292

25

0

27 Mar 2025

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Abdelrahman M. Shaker

Hamid Rezatofighi

Fahad Shahbaz Khan

929

3

0

27 Mar 2025

Memory-enhanced Retrieval Augmentation for Long Video Understanding

Zhengyang Liang

Andrii Zadaianchuk

348

9

0

12 Mar 2025

FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance

Gabriele Spadaro

Enzo Tartaglione

970

15

0

05 Jan 2025

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Hengshuang Zhao

763

52

0

02 Jan 2025

Neptune: The Long Orbit to Benchmarking Long Video Understanding

N. B. Gundavarapu

...

Cordelia Schmid

Mikhail Sirotenko

452

16

0

12 Dec 2024

InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows

Kirolos Ataallah

Eslam Abdelrahman

Mohamed Elhoseiny

275

14

0

28 Jun 2024

VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Rohit K Bharadwaj

Muzammal Naseer

Fahad Shahbaz Khan

Salman Khan

380

17

0

14 Jun 2024
