AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
arXiv: 2506.05328 (versions: v1, v2 (latest))

5 June 2025
Lidong Lu
Guo Chen
Ruoyao Xiao
Yicheng Liu
Tong Lu
VLM LRM
ArXiv (abs) | PDF | HTML | HuggingFace (20 upvotes)

Papers citing "AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs"

6 / 6 papers shown
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
Baoqi Pei
Yifei Huang
Jilan Xu
Yuping He
Guo Chen
Fei Wu
Yu Qiao
Jiangmiao Pang
EgoV LRM
215
4
0
27 Oct 2025
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Xingrui Wang
Jiang Liu
Chao Huang
X. Yu
Ze Wang
Ximeng Sun
Jialian Wu
Alan Yuille
Emad Barsoum
Zicheng Liu
VLM
101
0
0
16 Oct 2025
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Yunlong Tang
Jing Bi
Pinxin Liu
Zhenyu Pan
Mingqian Feng
...
Zeliang Zhang
Daiki Shimada
Han Liu
Jiebo Luo
Chenliang Xu
MLLM OffRL VLM LRM
744
8
0
06 Oct 2025
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Congzhi Zhang
Zhibin Wang
Yinchao Ma
Jiawei Peng
Y. Wang
Qiang Zhou
Jun Song
Bo Zheng
OffRL AI4TS LRM
230
2
0
28 Sep 2025
AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
Yogesh Kulkarni
Pooyan Fazli
OffRL LRM
284
4
0
05 Aug 2025
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Xinyu Fang
Junming Yang
Xiangyu Zhao
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA VLM
725
358
0
16 Jul 2024