Video Understanding with Large Language Models: A Survey

29 December 2023
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
Teng Wang
Daoan Zhang
Jie An
Jingyang Lin
Rongyi Zhu
Ali Vosoughi
Chao Huang
Zeliang Zhang
Pinxin Liu
Mingqian Feng
Feng Zheng
Jianguo Zhang
Jiebo Luo
Chenliang Xu
    VLM
ArXiv (abs) | PDF | HTML | HuggingFace (3 upvotes) | GitHub (2325★)

Papers citing "Video Understanding with Large Language Models: A Survey"

50 / 104 papers shown
TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs
Md. Adnan Arefeen
Biplob K. Debnath
S. Chakradhar
271
0
0
26 Nov 2025
While recognizing actions, LMMs struggle to detect core interaction events
Daniel Harari
Michael Sidorov
Liel David
Chen Shterental
Abrham Kahsay Gebreselasie
Muhammad Haris Khan
105
0
0
25 Nov 2025
VideoCompressa: Data-Efficient Video Understanding via Joint Temporal Compression and Spatial Reconstruction
Shaobo Wang
Tianle Niu
Runkang Yang
Deshan Liu
Xu He
Zichen Wen
Conghui He
Xuming Hu
Linfeng Zhang
VGen
154
0
0
24 Nov 2025
Test-Time Temporal Sampling for Efficient MLLM Video Understanding
Kaibin Wang
Mingbao Lin
24
0
0
22 Nov 2025
Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination
Y. Tang
Daiki Shimada
Hang Hua
Chao Huang
Jing Bi
Rogerio Feris
Chenliang Xu
117
0
0
21 Nov 2025
Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
Aakriti Agrawal
Gouthaman KV
R. Aralikatti
Gauri Jagatap
Jiaxin Yuan
Vijay Kamarshi
Andrea Fanelli
Furong Huang
VLM
88
0
0
07 Nov 2025
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
Yuqian Yuan
W. Zhang
Xin Li
Shihao Wang
Kehan Li
Wentong Li
Jun Xiao
Lei Zhang
Beng Chin Ooi
ObjD
258
0
0
27 Oct 2025
Generative AI in Depth: A Survey of Recent Advances, Model Variants, and Real-World Applications
Journal of Big Data (JBD), 2025
Shamim Yazdani
Akansha Singh
N. Saxena
Sribala Vidyadhari Chinta
Avash Palikhe
Deng Pan
Umapada Pal
Jie Yang
Wenbin Zhang
128
2
0
23 Oct 2025
FeatureFool: Zero-Query Fooling of Video Models via Feature Map
Duoxun Tang
Xi Xiao
Guangwu Hu
Kangkang Sun
Xiao Yang
Dongyang Chen
Qing Li
Yongjie Yin
Jiyao Wang
AAML
132
1
0
21 Oct 2025
Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning
Xuchen Li
Xuzhao Li
Shiyu Hu
Kaiqi Huang
56
0
0
17 Oct 2025
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Yunlong Tang
Jing Bi
Pinxin Liu
Zhenyu Pan
Mingqian Feng
...
Zeliang Zhang
Daiki Shimada
Han Liu
Jiebo Luo
Chenliang Xu
MLLM, OffRL, VLM, LRM
486
7
0
06 Oct 2025
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Zichen Wen
Shaobo Wang
Yufa Zhou
J. Zhang
Qintong Zhang
...
Zhaorun Chen
Bin Wang
W. Li
Conghui He
Linfeng Zhang
VLM
84
6
0
01 Oct 2025
FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
Haonan Ge
Yiwei Wang
Kai-Wei Chang
Hang Wu
Yujun Cai
LRM
140
0
0
28 Sep 2025
HyCoVAD: A Hybrid SSL-LLM Model for Complex Video Anomaly Detection
Mohammad Mahdi Hemmatyar
Mahdi Jafari
Mohammad Amin Yousefi
Mohammad Reza Nemati
Mobin Azadani
Hamid Reza Rastad
Amirmohammad Akbari
76
0
0
26 Sep 2025
Poisoning Prompt-Guided Sampling in Video Large Language Models
Yuxin Cao
Wei Song
Jingling Xue
Jin Song Dong
AAML
65
1
0
25 Sep 2025
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li
Jing Cheng
Shaoyong Jia
Hangyi Kuang
Shaohui Jiao
Qibin Hou
Ming-Ming Cheng
AI4TS, VLM
160
3
0
22 Sep 2025
LD-ViCE: Latent Diffusion Model for Video Counterfactual Explanations
Payal Varshney
Adriano Lucieri
Christoph Balada
Sheraz Ahmed
Andreas Dengel
VGen
119
0
0
10 Sep 2025
SurgLLM: A Versatile Large Multimodal Model with Spatial Focus and Temporal Awareness for Surgical Video Understanding
Zhen Chen
Xingjian Luo
Kun Yuan
J. Wu
Danny Tat Ming Chan
Nassir Navab
Hongbin Liu
Zhen Lei
Jiebo Luo
156
2
0
30 Aug 2025
Murakkab: Resource-Efficient Agentic Workflow Orchestration in Cloud Platforms
G. Chaudhry
Esha Choukse
Haoran Qiu
Íñigo Goiri
Rodrigo Fonseca
Adam Belay
Ricardo Bianchini
84
2
0
22 Aug 2025
Failures to Surface Harmful Contents in Video Large Language Models
Yuxin Cao
Wei Song
Derui Wang
Jingling Xue
Jin Song Dong
AAML
111
3
0
14 Aug 2025
AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning
Dejie Yang
Zijing Zhao
Yang Liu
132
0
0
11 Aug 2025
CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos
Xuchen Li
Xuzhao Li
Shiyu Hu
Kaiqi Huang
Wentao Zhang
CML, ELM, LRM
177
3
0
22 Jul 2025
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang
Jiahui Yang
Jianqin Yin
Zhenbo Luo
Jian Luan
252
17
0
27 Jun 2025
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
Qirui Zheng
Xingbo Wang
Keyuan Cheng
Muhammad Asif Ali
Yunlong Lu
Wenxin Li
126
0
0
17 Jun 2025
Can Vision Language Models Understand Mimed Actions?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hyundong Justin Cho
Spencer Lin
Tejas Srinivasan
Michael Saxon
Deuksin Kwon
Natali T. Chavez
Jonathan May
140
3
0
17 Jun 2025
Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding
Youze Wang
Zijun Chen
Ruoyu Chen
Shishen Gu
Yinpeng Dong
...
Jun Zhu
Meng Wang
Richang Hong
Wenbo Hu
279
0
0
14 Jun 2025
CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation
Zhao Zhang
Yutao Cheng
Dexiang Hong
Maoke Yang
Gonglei Shi
Lei Ma
H. Zhang
Jie Shao
Xinglong Wu
DiffM
247
2
0
12 Jun 2025
Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs
Wenrui Zhou
Shu Yang
Qingsong Yang
Zikun Guo
Di Wang
Lijie Hu
Haiyan Zhao
134
6
0
08 Jun 2025
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Ujjwal Upadhyay
Mukul Ranjan
Zhiqiang Shen
Mohamed Elhoseiny
VLM
169
3
0
30 May 2025
Fostering Video Reasoning via Next-Event Prediction
Haonan Wang
Hongfu Liu
Xiangyan Liu
C. Du
Kenji Kawaguchi
Ye Wang
Tianyu Pang
AI4TS, LRM
176
2
0
28 May 2025
Leveraging Large Language Models in Visual Speech Recognition: Model Scaling, Context-Aware Decoding, and Iterative Polishing
Zehua Liu
Xiaolou Li
Li Guo
Lantian Li
D. Wang
102
0
0
27 May 2025
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Fanheng Kong
Jingyuan Zhang
Hongzhi Zhang
Shi Feng
Daling Wang
Linhao Yu
Xingguang Ji
Yu Tian
Qi Wang
Fuzheng Zhang
215
2
0
26 May 2025
Aggregated Structural Representation with Large Language Models for Human-Centric Layout Generation
Jiongchao Jin
Shengchu Zhao
Dajun Chen
Wei Jiang
Yong Li
169
0
0
26 May 2025
MSD-LLM: Predicting Ship Detention in Port State Control Inspections with Large Language Model
Jiongchao Jin
Xiuju Fu
Natchapon Jongwiriyanurak
Tao Cheng
Ran Yan
AI4CE
266
0
0
26 May 2025
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Yunlong Tang
Pinxin Liu
Mingqian Feng
Rui Mao
...
Hang Hua
Ali Vosoughi
Luchuan Song
Zeliang Zhang
Chenliang Xu
LRM
324
3
0
26 May 2025
DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation
Junhao Chen
Mingjin Chen
Jianjin Xu
Xiang Li
Junting Dong
...
Hongxiang Li
Yuhang Yang
Hao Zhao
Xiaoxiao Long
Ruqi Huang
DiffM, VGen
223
5
0
23 May 2025
SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding
Sushant Gautam
Cise Midoglu
Vajira Thambawita
Michael A. Riegler
Pål Halvorsen
Mubarak Shah
166
1
0
22 May 2025
"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
Zheng Zhang
Zhen Sun
Zhenru Zhang
Zifan Peng
Yuemeng Zhao
Liang Luo
Zeren Luo
Ruiting Zuo
Xinlei He
175
2
0
07 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Wanrong Zhu
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
499
0
0
02 May 2025
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar
Ali Etemad
246
2
0
16 Apr 2025
LangPert: Detecting and Handling Task-level Perturbations for Robust Object Rearrangement
Xu Yin
Min-Sung Yoon
Yuchi Huo
Kang Zhang
Sung-eui Yoon
133
0
0
14 Apr 2025
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang
Monica Dou
Linkai Peng
Hongyi Pan
Ulas Bagci
Boqing Gong
VLM
221
1
0
12 Apr 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Yunlong Tang
Jing Bi
Chao Huang
Susan Liang
Daiki Shimada
...
Jinxi He
Liu He
Zeliang Zhang
Jiebo Luo
Chenliang Xu
194
8
0
07 Apr 2025
Video-Bench: Human-Aligned Video Generation Benchmark
Computer Vision and Pattern Recognition (CVPR), 2025
Hui Han
Siyuan Li
Jiaqi Chen
Yiwen Yuan
Yuling Wu
...
You Li
Jing Zhang
Chi Zhang
Li Li
Yongxin Ni
EGVM, VGen
500
7
0
07 Apr 2025
Urban Computing in the Era of Large Language Models
ACM Transactions on Intelligent Systems and Technology (TIST), 2025
Zhonghang Li
Lianghao Xia
Xubin Ren
J. Tang
Tianyi Chen
Yong-mei Xu
Chenyu Huang
428
3
0
02 Apr 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng
Kaixiong Gong
Yangqiu Song
Zonghao Guo
Yibing Wang
Tianshuo Peng
Jian Wu
Xiaoying Zhang
Benyou Wang
Xiangyu Yue
AI4TS, SyDa, LRM
477
209
0
27 Mar 2025
Impossible Videos
Zechen Bai
Hai Ci
Mike Zheng Shou
EGVM, VGen
274
6
0
18 Mar 2025
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Yiqi Zhu
Zihan Wang
Chen Zhang
Ziwei Sun
Yang Liu
CoGe, VLM
209
2
0
18 Mar 2025
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Weiyu Guo
Ziyang Chen
Shaoguang Wang
Jianxiang He
Yijie Xu
Jinhui Ye
Ying Sun
Hui Xiong
279
15
0
17 Mar 2025
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity
Jing Bi
Junjia Guo
Susan Liang
Guangyu Sun
Luchuan Song
...
Jinxi He
Jiarui Wu
Ali Vosoughi
Chong Chen
Chenliang Xu
LRM
186
15
0
14 Mar 2025