arXiv: 2506.15220

video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models

Versions: v1, v2, v3 (latest)

18 June 2025

arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub

Papers citing "video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models"

8 / 8 papers shown

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

254

2

0

18 Nov 2025

An Empirical Study for Representations of Videos in Video Question Answering via MLLMs

88

0

0

14 Oct 2025

video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory

87

1

0

13 Oct 2025

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

252

2

0

12 Oct 2025

V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs

118

0

0

30 Sep 2025

WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM

114

0

0

26 Sep 2025

Qwen3-Omni Technical Report

208

59

0

22 Sep 2025

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

152

13

0

28 Jul 2025