video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models
18 June 2025
arXiv 2506.15220 (latest version: v3)
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang

Papers citing "video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models" (8 papers)
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
Keda Tao
Kele Shao
Bohan Yu
Weiqiang Wang
Jian Liu
Huan Wang
VLM
18 Nov 2025
An Empirical Study for Representations of Videos in Video Question Answering via MLLMs
Zhi Li
Yanan Wang
Hao Niu
Julio Vizcarra
Masato Taya
14 Oct 2025
video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory
Guangzhi Sun
Yixuan Li
Xiaodong Wu
Yudong Yang
Wei Li
Zejun Ma
Chao Zhang
13 Oct 2025
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
Xinlong Chen
Yue Ding
Weihong Lin
Jingyun Hua
Linli Yao
...
Yuanxing Zhang
Qiang Liu
Pengfei Wan
Liang Wang
Tieniu Tan
12 Oct 2025
V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs
Zhengpeng Shi
Hengli Li
Yanpeng Zhao
Jianqun Zhou
Yuxuan Wang
Qinrong Cui
Wei Bi
Songchun Zhu
Bo Zhao
Zilong Zheng
VLM
30 Sep 2025
WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
Changli Tang
Qinfan Xiao
Ke Mei
Tianyi Wang
Fengyun Rao
Chao Zhang
26 Sep 2025
Qwen3-Omni Technical Report
Jin Xu
Zhifang Guo
Hangrui Hu
Yunfei Chu
Xiong Wang
...
Bowen Yu
Jianxin Yang
Le Yu
Jingren Zhou
Junyang Lin
AuLLM
VGen
VLM
22 Sep 2025
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
Yuying Ge
Yixiao Ge
Chen Li
Teng Wang
Junfu Pu
...
Xiaojing Zhang
Yangyu Tao
Han Hu
Di Wang
Mingyu Ding
28 Jul 2025