AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

5 January 2019
Joseph Roth
Sourish Chaudhuri
Ondřej Klejch
Radhika Marvin
Andrew C. Gallagher
Liat Kaver
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru

Papers citing "AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection"

39 / 89 papers shown
End-to-End Active Speaker Detection
European Conference on Computer Vision (ECCV), 2022
Juan Carlos León Alcázar
M. Cordes
Chen Zhao
Guohao Li
288
37
0
27 Mar 2022
Audio visual character profiles for detecting background characters in entertainment media
Rahul Sharma
Shrikanth Narayanan
140
5
0
21 Mar 2022
Visually Supervised Speaker Detection and Localization via Microphone Array
IEEE International Workshop on Multimedia Signal Processing (MMSP), 2021
Davide Berghi
A. Hilton
Philip J. B. Jackson
201
11
0
07 Mar 2022
Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement
IEEE Transactions on Multimedia (IEEE TMM), 2022
Jun Xiong
Can Ma
Peng Zhang
Lei Xie
Wei Huang
Yufei Zha
199
37
0
04 Mar 2022
Data standardization for robust lip sync
IEEE International Conference on Multimedia and Expo (ICME), 2022
C. Wang
259
0
0
13 Feb 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Computer Vision and Pattern Recognition (CVPR), 2022
Hao Jiang
Calvin Murdock
V. Ithapu
EgoV
239
47
0
06 Jan 2022
Learning Spatial-Temporal Graphs for Active Speaker Detection
Sourya Roy
Kyle Min
Subarna Tripathi
T. Guha
Somdeb Majumdar
202
3
0
02 Dec 2021
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
ACM Multimedia (MM), 2021
Eric Z. Xu
Zeyang Song
Satoshi Tsutsui
C. Feng
Mang Ye
Mike Zheng Shou
VGen
432
55
0
29 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Conference on Robot Learning (CoRL), 2021
Ziyang Chen
Xixi Hu
Andrew Owens
192
31
0
10 Nov 2021
Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos
Minglang Qiao
Yufan Liu
Mai Xu
Xin Deng
Bing Li
Weiming Hu
Ali Borji
CVBM
146
5
0
05 Nov 2021
A trained humanoid robot can perform human-like crossmodal social attention and conflict resolution
International Journal of Social Robotics (JSR), 2021
Di Fu
Fares Abawi
Hugo C. C. Carneiro
Matthias Kerzel
Ziwei Chen
Erik Strahl
Xun Liu
S. Wermter
442
10
0
02 Nov 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
240
112
0
14 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
1.0K
1,486
0
13 Oct 2021
FaVoA: Face-Voice Association Favours Ambiguous Speaker Detection
International Conference on Artificial Neural Networks (ICANN), 2021
Hugo C. C. Carneiro
C. Weber
S. Wermter
CVBM
336
9
0
01 Sep 2021
Look Who's Talking: Active Speaker Detection in the Wild
You Jin Kim
Hee-Soo Heo
Soyeon Choe
Soo-Whan Chung
Yoohwan Kwon
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
222
27
0
17 Aug 2021
The Right to Talk: An Audio-Visual Transformer Approach
IEEE International Conference on Computer Vision (ICCV), 2021
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
223
38
0
06 Aug 2021
UniCon: Unified Context Network for Robust Active Speaker Detection
ACM Multimedia (ACM MM), 2021
Yuanhang Zhang
Susan Liang
Shuang Yang
Xiao-Chang Liu
Zhongqin Wu
Shiguang Shan
Xilin Chen
CVBM
164
43
0
05 Aug 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
ACM Multimedia (ACM MM), 2021
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
208
218
0
14 Jul 2021
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
IEEE International Conference on Computer Vision (ICCV), 2021
Okan Kopuklu
Maja Taseska
Gerhard Rigoll
3DV
251
57
0
07 Jun 2021
Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion
Interspeech, 2021
Baptiste Pouthier
L. Pilati
Leela K. Gudupudi
C. Bouveyron
F. Precioso
183
12
0
07 Jun 2021
APES: Audiovisual Person Search in Untrimmed Video
Juan Carlos León Alcázar
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbeláez
Guohao Li
Fabian Caba Heilbron
132
6
0
03 Jun 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
349
67
0
26 Jan 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
IEEE International Conference on Computer Vision (ICCV), 2021
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Guohao Li
354
63
0
11 Jan 2021
Large-scale multilingual audio visual dubbing
Yi Yang
Brendan Shillingford
Yannis Assael
Miaosen Wang
Wendi Liu
...
Eren Sezener
Luis C. Cobo
Misha Denil
Y. Aytar
Nando de Freitas
154
25
0
06 Nov 2020
Muse: Multi-modal target speaker extraction with visual cues
Zexu Pan
Ruijie Tao
Chenglin Xu
Haizhou Li
313
64
0
15 Oct 2020
HAA500: Human-Centric Atomic Action Dataset with Curated Videos
IEEE International Conference on Computer Vision (ICCV), 2020
Jihoon Chung
Cheng-hsin Wuu
Hsuan-ru Yang
Yu-Wing Tai
Chi-Keung Tang
219
59
0
11 Sep 2020
Self-Supervised Learning of Audio-Visual Objects from Video
European Conference on Computer Vision (ECCV), 2020
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
243
278
0
10 Aug 2020
A Unified Framework for Shot Type Classification Based on Subject Centric Lens
European Conference on Computer Vision (ECCV), 2020
Anyi Rao
Jiaze Wang
Linning Xu
Xuekun Jiang
Qingqiu Huang
Bolei Zhou
Dahua Lin
227
78
0
08 Aug 2020
Online Multi-modal Person Search in Videos
European Conference on Computer Vision (ECCV), 2020
J. Xia
Anyi Rao
Qingqiu Huang
Linning Xu
Jiangtao Wen
Dahua Lin
204
29
0
08 Aug 2020
MovieNet: A Holistic Dataset for Movie Understanding
Qingqiu Huang
Yu Xiong
Anyi Rao
Jiaze Wang
Dahua Lin
VGen
338
285
0
21 Jul 2020
Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
P. Sermanet
Andrew Zisserman
AI4TS
200
127
0
27 Jun 2020
Rescaling Egocentric Vision
International Journal of Computer Vision (IJCV), 2020
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
518
586
0
23 Jun 2020
Active Speakers in Context
Juan Carlos León Alcázar
Fabian Caba Heilbron
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbeláez
Guohao Li
134
73
0
20 May 2020
A Local-to-Global Approach to Multi-modal Movie Scene Segmentation
Computer Vision and Pattern Recognition (CVPR), 2020
Anyi Rao
Linning Xu
Yu Xiong
Guodong Xu
Qingqiu Huang
Bolei Zhou
Dahua Lin
210
127
0
06 Apr 2020
Cross modal video representations for weakly supervised active speaker localization
IEEE Transactions on Multimedia (TMM), 2020
Rahul Sharma
Krishna Somandepalli
Shrikanth Narayanan
183
9
0
09 Mar 2020
Bio-Inspired Modality Fusion for Active Speaker Detection
Applied Sciences (Appl. Sci.), 2020
Gustavo Assunção
Nuno Gonçalves
Paulo Menezes
142
3
0
28 Feb 2020
Self-supervised learning for audio-visual speaker diarization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Yifan Ding
Yong-mei Xu
Shi-Xiong Zhang
Yahuan Cong
Liqiang Wang
VLM
143
35
0
13 Feb 2020
Multimodal active speaker detection and virtual cinematography for video conferencing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Ross Cutler
Ramin Mehran
Sam Johnson
Cha Zhang
Adam G. Kirk
Oliver Whyte
Adarsh Kowdle
189
9
0
10 Feb 2020
Deep Audio-Visual Learning: A Survey
International Journal of Automation and Computing (IJAC), 2020
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
223
178
0
14 Jan 2020