arXiv:2506.05328

AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs

Versions: v1, v2 (latest)

5 June 2025

Papers citing "AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs"

6 / 6 papers shown

EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT

215

4

0

27 Oct 2025

XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

101

0

0

16 Oct 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

MLLM OffRL VLM LRM

744

8

0

06 Oct 2025

ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis

OffRL AI4TS LRM

230

2

0

28 Sep 2025

AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video

Yogesh Kulkarni

284

4

0

05 Aug 2025

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

725

358

0

16 Jul 2024