1603.09725
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
31 March 2016
I. D. Gebru
Silèye O. Ba
Xiaofei Li
Radu Horaud
Papers citing
"Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion"
25 / 25 papers shown
Title
Alignment Helps Make the Most of Multimodal Data
Christian Arnold
Andreas Küpfer
129
2
0
14 May 2024
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
Jinzheng Zhao
Yong-mei Xu
Xinyuan Qian
Davide Berghi
Peipei Wu
Meng Cui
Jianyuan Sun
Philip J. B. Jackson
Wenwu Wang
BDL
132
7
0
23 Oct 2023
WASD: A Wilder Active Speaker Detection Dataset
Tiago Roxo
Joana Cabral Costa
Pedro R. M. Inácio
Hugo Manuel Proença
51
3
0
09 Mar 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
86
17
0
19 Jan 2023
Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research
Davide Berghi
M. Volino
Philip J. B. Jackson
VGen
50
6
0
04 Dec 2022
WebUAV-3M: A Benchmark for Unveiling the Power of Million-Scale Deep UAV Tracking
Chunhui Zhang
Guanjie Huang
Li Liu
Shan Huang
Yinan Yang
Xiang Wan
Shiming Ge
Dacheng Tao
157
24
0
19 Jan 2022
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
Eric Z. Xu
Zeyang Song
Satoshi Tsutsui
C. Feng
Mang Ye
Mike Zheng Shou
VGen
83
43
0
29 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
428
1,115
0
13 Oct 2021
Look Who's Talking: Active Speaker Detection in the Wild
You Jin Kim
Hee-Soo Heo
Soyeon Choe
Soo-Whan Chung
Yoohwan Kwon
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
113
21
0
17 Aug 2021
UniCon: Unified Context Network for Robust Active Speaker Detection
Yuanhang Zhang
Susan Liang
Shuang Yang
Xiao-Chang Liu
Zhongqin Wu
Shiguang Shan
Xilin Chen
CVBM
89
38
0
05 Aug 2021
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
Okan Kopuklu
Maja Taseska
Gerhard Rigoll
3DV
98
46
0
07 Jun 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
123
75
0
01 Jan 2021
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
126
256
0
10 Aug 2020
Self-supervised learning for audio-visual speaker diarization
Yifan Ding
Yong-mei Xu
Shi-Xiong Zhang
Yahuan Cong
Liqiang Wang
VLM
78
29
0
13 Feb 2020
Advances in Online Audio-Visual Meeting Transcription
Takuya Yoshioka
Igor Abramovski
Cem Aksoylar
Zhuo Chen
Moshe David
...
Huaming Wang
Zhenghao Wang
Jun Zhang
Yong Zhao
Tianyan Zhou
95
75
0
10 Dec 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI
AI4TS
122
338
0
10 Nov 2019
The LOCATA Challenge: Acoustic Source Localization and Tracking
C. Evers
Heinrich W. Löllmann
H. Mellmann
Alexander Schmidt
Hendrik Barfuss
Patrick A. Naylor
Walter Kellermann
62
133
0
03 Sep 2019
Audiovisual Speaker Tracking using Nonlinear Dynamical Systems with Dynamic Stream Weights
C. Schymura
D. Kolossa
50
7
0
14 Mar 2019
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth
Sourish Chaudhuri
Ondřej Klejch
Radhika Marvin
Andrew C. Gallagher
...
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru
92
145
0
05 Jan 2019
A cascaded multiple-speaker localization and tracking system
Xiaofei Li
Yutong Ban
Laurent Girin
Xavier Alameda-Pineda
Radu Horaud
BDL
44
2
0
11 Dec 2018
Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers
Yutong Ban
Xavier Alameda-Pineda
Laurent Girin
Radu Horaud
77
50
0
28 Sep 2018
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
Xiaofei Li
Yutong Ban
Laurent Girin
Xavier Alameda-Pineda
Radu Horaud
64
45
0
28 Sep 2018
Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction
Stéphane Lathuilière
Benoit Massé
Pablo Mesejo
Radu Horaud
47
32
0
18 Nov 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
184
2,967
0
26 May 2017
Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction
Benoit Massé
Silèye O. Ba
Radu Horaud
66
83
0
14 Mar 2017