ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.06682
  4. Cited By
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning
  using Multi-Round Preference Optimization

Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization

9 October 2024
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Z. Ma
Chao Zhang
ArXivPDFHTML

Papers citing "Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization"

4 / 4 papers shown
Title
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
31
0
0
29 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Y. Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
T. Tan
69
2
0
18 Mar 2025
Improving LLM Video Understanding with 16 Frames Per Second
Improving LLM Video Understanding with 16 Frames Per Second
Y. Li
Changli Tang
Jimin Zhuang
Yudong Yang
Guangzhi Sun
W. Li
Z. Ma
Chao Zhang
VLM
72
1
0
18 Mar 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun
Yudong Yang
Jimin Zhuang
Changli Tang
Y. Li
W. Li
Z. Ma
Chao Zhang
LRM
MLLM
VLM
64
2
0
17 Feb 2025
1