2410.06682
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization
9 October 2024
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Tianhao Shen
Chao Zhang
Papers citing "Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization"
7 / 7 papers shown
FinCap: Topic-Aligned Captions for Short-Form Financial YouTube Videos
Siddhant Sukhani
Yash Bhardwaj
Riya Bhadani
Veer Kejriwal
Michael Galarnyk
Sudheer Chava
60
0
0
30 Sep 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
376
5
0
29 Mar 2025
Improving LLM Video Understanding with 16 Frames Per Second
Yongqian Li
Changli Tang
Jimin Zhuang
Yudong Yang
Guangzhi Sun
W. Li
Tianhao Shen
Chao Zhang
VLM
337
10
0
18 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yujiao Shi
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Zheng Zhang
Yan Huang
Liang Wang
Tieniu Tan
753
12
0
18 Mar 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun
Yudong Yang
Jimin Zhuang
Changli Tang
Yongqian Li
W. Li
Tianhao Shen
Chao Zhang
LRM
MLLM
VLM
258
13
0
17 Feb 2025
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
333
38
0
02 Feb 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
VLM
623
155
0
29 Dec 2023