arXiv:2410.06682
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization
9 October 2024
Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Z. Ma, Chao Zhang
Papers citing "Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization"
4 / 4 papers shown
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury, Hanan Gani, Nishit Anand, Sayan Nag, Ruohan Gao, Mohamed Elhoseiny, Salman Khan, Dinesh Manocha
LRM
31
0
0
29 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu, Y. Zhang, Chaoyou Fu, Junkang Wu, Jinda Lu, ..., Qingsong Wen, Z. Zhang, Yan Huang, Liang Wang, T. Tan
69
2
0
18 Mar 2025
Improving LLM Video Understanding with 16 Frames Per Second
Y. Li, Changli Tang, Jimin Zhuang, Yudong Yang, Guangzhi Sun, W. Li, Z. Ma, Chao Zhang
VLM
72
1
0
18 Mar 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun, Yudong Yang, Jimin Zhuang, Changli Tang, Y. Li, W. Li, Z. Ma, Chao Zhang
LRM
MLLM
VLM
64
2
0
17 Feb 2025