AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

5 January 2019
Joseph Roth
Sourish Chaudhuri
Ondřej Klejch
Radhika Marvin
Andrew C. Gallagher
Liat Kaver
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru
arXiv:1901.01342

Papers citing "AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection"

50 / 88 papers shown
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Le Thien Phuc Nguyen
Zhuoran Yu
Samuel Low Yu Hang
Subin An
J. Lee
...
SeungEun Chung
Thanh-Huy Nguyen
JuWan Maeng
Soochahn Lee
Yong Jae Lee
AuLLM
VLM
194
0
0
01 Dec 2025
Point of Order: Action-Aware LLM Persona Modeling for Realistic Civic Simulation
Scott Merrill
Shashank Srivastava
88
0
0
21 Nov 2025
Ensembling Synchronisation-based and Face-Voice Association Paradigms for Robust Active Speaker Detection in Egocentric Recordings
Jason Clarke
Yoshihiko Gotoh
Stefan Goetze
92
0
0
14 Aug 2025
Multimodal Conversation Structure Understanding
Kent K. Chang
Mackenzie Cramer
Anna Ho
Ti Ti Nguyen
Yilin Yuan
David Bamman
318
1
0
23 May 2025
Enhancing Visual Forced Alignment with Local Context-Aware Feature Extraction and Multi-Task Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Yi He
Lei Yang
Shilin Wang
331
0
0
05 Mar 2025
LASER: Lip Landmark Assisted Speaker Detection for Robustness
Le Thien Phuc Nguyen
Xiaohua Xie
Yong Jae Lee
362
3
0
21 Jan 2025
Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies
Xiwen Li
Rehman Mohammed
Tristalee Mangin
Surojit Saha
Ross T. Whitaker
Kerry E Kelly
Tolga Tasdizen
221
6
0
28 Oct 2024
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
Mao-Kui He
Jun Du
Shu-Tong Niu
Qing-Feng Liu
Chin-Hui Lee
205
2
0
15 Oct 2024
An Efficient and Streaming Audio Visual Active Speaker Detection System
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Arnav Kundu
Yanzi Jin
Mohammad Hossein Sekhavat
Max Horton
Danny Tormoen
Devang Naik
194
0
0
13 Sep 2024
Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges
Victoria Mingote
Alfonso Ortega
A. Miguel
Eduardo Lleida
248
3
0
09 Sep 2024
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
European Conference on Computer Vision (ECCV), 2024
Heeseung Yun
Ruohan Gao
Ishwarya Ananthabhotla
Anurag Kumar
Jacob Donley
Chao Li
Gunhee Kim
V. Ithapu
Calvin Murdock
211
6
0
09 Aug 2024
Imitation of human motion achieves natural head movements for humanoid robots in an active-speaker detection task
Bosong Ding
M. Kirtay
Giacomo Spigler
316
0
0
16 Jul 2024
Audio-Visual Talker Localization in Video for Spatial Sound Reproduction
Davide Berghi
Philip J. B. Jackson
209
1
0
01 Jun 2024
Robust Active Speaker Detection in Noisy Environments
Siva Sai Nagender Vasireddy
Chenxu Zhang
Xiaohu Guo
Yapeng Tian
378
1
0
27 Mar 2024
REWIND Dataset: Privacy-preserving Speaking Status Segmentation from Multimodal Body Movement Signals in the Wild
Jose Vargas-Quiros
Chirag Raman
Stephanie Tan
Ekin Gedik
Laura Cabrera-Quiros
Hayley Hung
208
3
0
02 Mar 2024
AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies
José-M. Acosta-Triana
David Gimeno-Gómez
Carlos David Martínez Hinarejos
VLM
VGen
286
4
0
20 Feb 2024
Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization
Davide Berghi
Philip J. B. Jackson
217
5
0
21 Dec 2023
Audio-visual child-adult speaker classification in dyadic interactions
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Anfeng Xu
Kevin Huang
Tiantian Feng
Helen Tager-Flusberg
Shrikanth Narayanan
282
4
0
03 Oct 2023
TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chaeyoung Jung
Suyeon Lee
KiHyun Nam
Kyeongha Rho
You Jin Kim
Youngjoon Jang
Joon Son Chung
173
15
0
21 Sep 2023
A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
I. Gurvich
Ido Leichter
Dharmendar Reddy Palle
Yossi Asher
Alon Vinnikov
Igor Abramovski
Vishak Gopal
Ross Cutler
Eyal Krupka
197
4
0
15 Sep 2023
AdVerb: Visually Guided Audio Dereverberation
IEEE International Conference on Computer Vision (ICCV), 2023
Sanjoy Chowdhury
Sreyan Ghosh
Subhrajyoti Dasgupta
Anton Ratnarajah
Utkarsh Tyagi
Tianyi Zhou
202
18
0
23 Aug 2023
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
Neural Information Processing Systems (NeurIPS), 2023
Dongwei Pan
Long Zhuo
Jingtan Piao
Huiwen Luo
Wei Cheng
...
Chen Change Loy
Chao Qian
Wayne Wu
Dahua Lin
Kwan-Yee Lin
258
39
0
22 May 2023
Target Active Speaker Detection with Audio-visual Cues
Interspeech (Interspeech), 2023
Yiding Jiang
Ruijie Tao
Zexu Pan
Haizhou Li
359
23
0
22 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
European Conference on Computer Vision (ECCV), 2023
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
371
17
0
06 May 2023
A multimodal dynamical variational autoencoder for audiovisual speech representation learning
Neural Networks (NN), 2022
Samir Sadok
Simon Leglaive
Laurent Girin
Xavier Alameda-Pineda
Renaud Séguier
356
20
0
05 May 2023
Word-level Persian Lipreading Dataset
International Conference on Computer and Knowledge Engineering (ICCKE), 2022
J. Peymanfard
Ali Lashini
Samin Heydarian
Hossein Zeinali
N. Mozayani
154
7
0
08 Apr 2023
Egocentric Auditory Attention Localization in Conversations
Computer Vision and Pattern Recognition (CVPR), 2023
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
224
23
0
28 Mar 2023
WASD: A Wilder Active Speaker Detection Dataset
IEEE Transactions on Biometrics Behavior and Identity Science (TBBIS), 2023
Tiago Roxo
Joana Cabral Costa
Pedro R. M. Inácio
Hugo Manuel Proença
177
5
0
09 Mar 2023
A Light Weight Model for Active Speaker Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Junhua Liao
Haihan Duan
Kanghui Feng
Wanbing Zhao
Yanbing Yang
Liangyin Chen
208
62
0
08 Mar 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
Expert Systems with Applications (ESWA), 2023
J. Peymanfard
Samin Heydarian
Ali Lashini
Hossein Zeinali
Mohammad Reza Mohammadi
N. Mozayani
278
14
0
21 Jan 2023
Novel-View Acoustic Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Changan Chen
Alexander Richard
Roman Shapovalov
V. Ithapu
Natalia Neverova
Kristen Grauman
Andrea Vedaldi
213
45
0
20 Jan 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
236
28
0
19 Jan 2023
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
IEEE Open Journal of Signal Processing (JOSP), 2022
Rahul Sharma
Shrikanth Narayanan
208
11
0
01 Dec 2022
Whose Emotion Matters? Speaking Activity Localisation without Prior Knowledge
Hugo C. C. Carneiro
C. Weber
S. Wermter
532
6
0
23 Nov 2022
Late Audio-Visual Fusion for In-The-Wild Speaker Diarization
Zexu Pan
Gordon Wichern
François Germain
Aswin Shanmugam Subramanian
Jonathan Le Roux
VGen
312
2
0
02 Nov 2022
No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration
Jose Vargas-Quiros
Laura Cabrera-Quiros
Hayley Hung
233
3
0
01 Nov 2022
Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization
Kyle Min
VLM
261
15
0
14 Oct 2022
Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection
Spoken Language Technology Workshop (SLT), 2022
Xuan-Bo Chen
Haibin Wu
Helen Meng
Hung-yi Lee
J. Jang
AAML
249
5
0
03 Oct 2022
Unsupervised active speaker detection in media content using cross-modal information
Rahul Sharma
Shrikanth Narayanan
265
3
0
24 Sep 2022
MIntRec: A New Dataset for Multimodal Intent Recognition
ACM Multimedia (ACM MM), 2022
Hanlei Zhang
Huanlin Xu
Xin Eric Wang
Qianrui Zhou
Shaojie Zhao
Jiayan Teng
265
65
0
09 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
292
69
0
20 Aug 2022
End-To-End Audiovisual Feature Fusion for Active Speaker Detection
International Conference on Digital Image Processing (ICDIP), 2022
Fiseha B. Tesema
Zheyuan Lin
Shiqiang Zhu
Wei Song
J. Gu
Hong-Chuan Wu
154
4
0
27 Jul 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
European Conference on Computer Vision (ECCV), 2022
Kyle Min
Sourya Roy
Subarna Tripathi
T. Guha
Somdeb Majumdar
252
53
0
15 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Computer Vision and Pattern Recognition (CVPR), 2022
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
268
20
0
07 Jul 2022
UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
Yuanhang Zhang
Susan Liang
Shuang Yang
Shiguang Shan
226
4
0
22 Jun 2022
Rethinking Audio-visual Synchronization for Active Speaker Detection
International Workshop on Machine Learning for Signal Processing (MLSP), 2022
Abudukelimu Wuerkaixi
You Zhang
Z. Duan
Changshui Zhang
176
20
0
21 Jun 2022
Self-Supervised Learning for Videos: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
474
166
0
18 Jun 2022
End-to-end multi-talker audio-visual ASR using an active speaker attention module
Interspeech (Interspeech), 2022
R. Rose
Olivier Siohan
170
4
0
01 Apr 2022
Using Active Speaker Faces for Diarization in TV shows
Rahul Sharma
Shrikanth Narayanan
CVBM
166
10
0
30 Mar 2022
End-to-End Active Speaker Detection
European Conference on Computer Vision (ECCV), 2022
Juan Carlos León Alcázar
M. Cordes
Chen Zhao
Guohao Li
279
36
0
27 Mar 2022