
video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models

18 June 2025
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang
arXiv: 2506.15220

Papers citing "video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models"

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
Keda Tao
Kele Shao
Bohan Yu
Weiqiang Wang
Jian Liu
Huan Wang
VLM
18 Nov 2025

An Empirical Study for Representations of Videos in Video Question Answering via MLLMs
Zhi Li
Yanan Wang
Hao Niu
Julio Vizcarra
Masato Taya
14 Oct 2025

video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory
Guangzhi Sun
Yixuan Li
Xiaodong Wu
Yudong Yang
Wei Li
Zejun Ma
Chao Zhang
13 Oct 2025

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
Xinlong Chen
Yue Ding
Weihong Lin
Jingyun Hua
Linli Yao
...
Yuanxing Zhang
Qiang Liu
Pengfei Wan
Liang Wang
Tieniu Tan
12 Oct 2025

V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs
Zhengpeng Shi
Hengli Li
Yanpeng Zhao
Jianqun Zhou
Yuxuan Wang
Qinrong Cui
Wei Bi
Songchun Zhu
Bo Zhao
Zilong Zheng
VLM
30 Sep 2025

WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
Changli Tang
Qinfan Xiao
Ke Mei
Tianyi Wang
Fengyun Rao
Chao Zhang
26 Sep 2025

Qwen3-Omni Technical Report
Jin Xu
Zhifang Guo
Hangrui Hu
Yunfei Chu
Xiong Wang
...
Bowen Yu
Jianxin Yang
Le Yu
Jingren Zhou
Junyang Lin
AuLLM, VGen, VLM
22 Sep 2025

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
Yuying Ge
Yixiao Ge
Chen Li
Teng Wang
Junfu Pu
...
Xiaojing Zhang
Yangyu Tao
Han Hu
Di Wang
Mingyu Ding
28 Jul 2025