AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

5 January 2019

Arkadiusz Stopczynski

Papers citing "AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection"

50 / 88 papers shown

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

...

194

01 Dec 2025

Point of Order: Action-Aware LLM Persona Modeling for Realistic Civic Simulation

Scott Merrill

Shashank Srivastava

21 Nov 2025

Ensembling Synchronisation-based and Face-Voice Association Paradigms for Robust Active Speaker Detection in Egocentric Recordings

Jason Clarke

Yoshihiko Gotoh

Stefan Goetze

14 Aug 2025

Multimodal Conversation Structure Understanding

318

23 May 2025

Enhancing Visual Forced Alignment with Local Context-Aware Feature Extraction and Multi-Task Learning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Yi He

Lei Yang

Shilin Wang

331

05 Mar 2025

LASER: Lip Landmark Assisted Speaker Detection for Robustness

Le Thien Phuc Nguyen

Xiaohua Xie

Yong Jae Lee

362

21 Jan 2025

Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies

221

28 Oct 2024

Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization

205

15 Oct 2024

An Efficient and Streaming Audio Visual Active Speaker Detection System

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Arnav Kundu

Yanzi Jin

Mohammad Hossein Sekhavat

Max Horton

Danny Tormoen

Devang Naik

194

13 Sep 2024

Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges

Victoria Mingote

Alfonso Ortega

A. Miguel

Eduardo Lleida

248

09 Sep 2024

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

European Conference on Computer Vision (ECCV), 2024

Heeseung Yun

Ruohan Gao

Ishwarya Ananthabhotla

Gunhee Kim

211

09 Aug 2024

Imitation of human motion achieves natural head movements for humanoid robots in an active-speaker detection task

Bosong Ding

M. Kirtay

Giacomo Spigler

316

16 Jul 2024

Audio-Visual Talker Localization in Video for Spatial Sound Reproduction

Davide Berghi

Philip J. B. Jackson

209

01 Jun 2024

Robust Active Speaker Detection in Noisy Environments

Siva Sai Nagender Vasireddy

Chenxu Zhang

Xiaohu Guo

Yapeng Tian

378

27 Mar 2024

REWIND Dataset: Privacy-preserving Speaking Status Segmentation from Multimodal Body Movement Signals in the Wild

208

02 Mar 2024

AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies

José-M. Acosta-Triana

David Gimeno-Gómez

Carlos David Martínez Hinarejos

VLM VGen

286

20 Feb 2024

Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization

Davide Berghi

Philip J. B. Jackson

217

21 Dec 2023

Audio-visual child-adult speaker classification in dyadic interactions

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Shrikanth Narayanan

282

03 Oct 2023

TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Joon Son Chung

173

21 Sep 2023

A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

I. Gurvich

Ido Leichter

Dharmendar Reddy Palle

197

15 Sep 2023

AdVerb: Visually Guided Audio Dereverberation

IEEE International Conference on Computer Vision (ICCV), 2023

202

23 Aug 2023

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

Neural Information Processing Systems (NeurIPS), 2023

...

258

22 May 2023

Target Active Speaker Detection with Audio-visual Cues

Interspeech (Interspeech), 2023

Yiding Jiang

Ruijie Tao

Zexu Pan

Haizhou Li

359

22 May 2023

Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

European Conference on Computer Vision (ECCV), 2023

Bolin Lai

Fiona Ryan

Wenqi Jia

Miao Liu

James M. Rehg

EgoV

371

06 May 2023

A multimodal dynamical variational autoencoder for audiovisual speech representation learning

Neural Networks (NN), 2022

Samir Sadok

Simon Leglaive

Laurent Girin

Xavier Alameda-Pineda

Renaud Séguier

356

05 May 2023

Word-level Persian Lipreading Dataset

International Conference on Computer and Knowledge Engineering (ICCKE), 2022

154

08 Apr 2023

Egocentric Auditory Attention Localization in Conversations

Computer Vision and Pattern Recognition (CVPR), 2023

224

28 Mar 2023

WASD: A Wilder Active Speaker Detection Dataset

IEEE Transactions on Biometrics, Behavior, and Identity Science (TBBIS), 2023

177

09 Mar 2023

A Light Weight Model for Active Speaker Detection

Computer Vision and Pattern Recognition (CVPR), 2023

208

08 Mar 2023

A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset

Expert Systems with Applications (ESWA), 2023

Mohammad Reza Mohammadi

N. Mozayani

278

21 Jan 2023

Novel-View Acoustic Synthesis

Computer Vision and Pattern Recognition (CVPR), 2023

Natalia Neverova

Andrea Vedaldi

213

20 Jan 2023

LoCoNet: Long-Short Context Network for Active Speaker Detection

Computer Vision and Pattern Recognition (CVPR), 2023

Xizi Wang

Feng Cheng

Gedas Bertasius

David J. Crandall

236

19 Jan 2023

Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection

IEEE Open Journal of Signal Processing (JOSP), 2022

Rahul Sharma

Shrikanth Narayanan

208

01 Dec 2022

Whose Emotion Matters? Speaking Activity Localisation without Prior Knowledge

Hugo C. C. Carneiro

C. Weber

S. Wermter

532

23 Nov 2022

Late Audio-Visual Fusion for In-The-Wild Speaker Diarization

Zexu Pan

Gordon Wichern

François Germain

Aswin Shanmugam Subramanian

Jonathan Le Roux

VGen

312

02 Nov 2022

No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration

Jose Vargas-Quiros

Laura Cabrera-Quiros

Hayley Hung

233

01 Nov 2022

Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization

Kyle Min

VLM

261

14 Oct 2022

Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

Spoken Language Technology Workshop (SLT), 2022

Haibin Wu

249

03 Oct 2022

Unsupervised active speaker detection in media content using cross-modal information

Rahul Sharma

Shrikanth Narayanan

265

24 Sep 2022

MIntRec: A New Dataset for Multimodal Intent Recognition

ACM Multimedia (ACM MM), 2022

265

09 Sep 2022

Learning in Audio-visual Context: A Review, Analysis, and New Perspective

292

20 Aug 2022

End-To-End Audiovisual Feature Fusion for Active Speaker Detection

International Conference on Digital Image Processing (ICDIP), 2022

154

27 Jul 2022

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection

European Conference on Computer Vision (ECCV), 2022

252

15 Jul 2022

Finding Fallen Objects Via Asynchronous Audio-Visual Integration

Computer Vision and Pattern Recognition (CVPR), 2022

Chuang Gan

Antonio Torralba

268

07 Jul 2022

UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022

226

22 Jun 2022

Rethinking Audio-visual Synchronization for Active Speaker Detection

International Workshop on Machine Learning for Signal Processing (MLSP), 2022

Abudukelimu Wuerkaixi

You Zhang

Z. Duan

Changshui Zhang

176

21 Jun 2022

Self-Supervised Learning for Videos: A Survey

ACM Computing Surveys (ACM CSUR), 2022

Madeline Chantry Schiappa

Yogesh S Rawat

M. Shah

SSL

474

166

18 Jun 2022

End-to-end multi-talker audio-visual ASR using an active speaker attention module

Interspeech (Interspeech), 2022

R. Rose

Olivier Siohan

170

01 Apr 2022

Using Active Speaker Faces for Diarization in TV shows

Rahul Sharma

Shrikanth Narayanan

CVBM

166

30 Mar 2022

End-to-End Active Speaker Detection

European Conference on Computer Vision (ECCV), 2022

Juan Carlos León Alcázar

M. Cordes

Chen Zhao

Guohao Li

279

27 Mar 2022