ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.03932
  4. Cited By
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker
  Detection in the Wild

How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild

7 June 2021
Okan Kopuklu
Maja Taseska
Gerhard Rigoll
    3DV
ArXivPDFHTML

Papers citing "How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild"

10 / 10 papers shown
Title
Egocentric Auditory Attention Localization in Conversations
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
24
16
0
28 Mar 2023
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using
  Permutation-Free Loss Function
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function
Qing Wang
Hang Chen
Yannan Jiang
Zhe Wang
Yuyang Wang
Jun Du
Chin-Hui Lee
14
4
0
26 Oct 2022
UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at
  ActivityNet Challenge 2022
UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
Yuanhang Zhang
Susan Liang
Shuang Yang
Shiguang Shan
8
4
0
22 Jun 2022
Rethinking Audio-visual Synchronization for Active Speaker Detection
Rethinking Audio-visual Synchronization for Active Speaker Detection
Abudukelimu Wuerkaixi
You Zhang
Z. Duan
Changshui Zhang
16
10
0
21 Jun 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Hao Jiang
Calvin Murdock
V. Ithapu
EgoV
25
40
0
06 Jan 2022
Learning Spatial-Temporal Graphs for Active Speaker Detection
Learning Spatial-Temporal Graphs for Active Speaker Detection
Sourya Roy
Kyle Min
Subarna Tripathi
T. Guha
Somdeb Majumdar
25
3
0
02 Dec 2021
A trained humanoid robot can perform human-like crossmodal social
  attention and conflict resolution
A trained humanoid robot can perform human-like crossmodal social attention and conflict resolution
Di Fu
Fares Abawi
Hugo C. C. Carneiro
Matthias Kerzel
Ziwei Chen
Erik Strahl
Xun Liu
S. Wermter
14
6
0
02 Nov 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Bernard Ghanem
57
51
0
11 Jan 2021
VoxCeleb2: Deep Speaker Recognition
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
214
2,224
0
14 Jun 2018
Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source
  Separation
Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation
Daniel Stoller
Sebastian Ewert
S. Dixon
AI4TS
101
588
0
08 Jun 2018
1