ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

VGGSounder: Audio-Visual Evaluations for Foundation Models

11 August 2025
Daniil Zverev
Thaddäus Wiedemer
Christian Schroeder de Witt
Matthias Bethge
Wieland Brendel
A. Sophia Koepke
Topics: AuLLM
Links: arXiv (abs) · PDF · HTML

Papers citing "VGGSounder: Audio-Visual Evaluations for Foundation Models"

3 / 3 papers shown
1. See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
   Le Thien Phuc Nguyen, Zhuoran Yu, Samuel Low Yu Hang, Subin An, J. Lee, ..., SeungEun Chung, Thanh-Huy Nguyen, JuWan Maeng, Soochahn Lee, Yong Jae Lee
   Topics: AuLLM, VLM
   01 Dec 2025
2. Solving Spatial Supersensing Without Spatial Supersensing
   Vishaal Udandarao, Shyamgopal Karthik, Surabhi S. Nath, Andreas Hochlehnert, Matthias Bethge, Ameya Prabhu
   20 Nov 2025
3. Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
   Haomiao Chen, K. Jamison, M. Sabuncu, Amy Kuceyeski
   07 Oct 2025