LoCoNet: Long-Short Context Network for Active Speaker Detection

LoCoNet: Long-Short Context Network for Active Speaker Detection

19 January 2023

Gedas Bertasius

David J. Crandall

Papers citing "LoCoNet: Long-Short Context Network for Active Speaker Detection"

10 / 10 papers shown

Title
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization Detao Bai Zhiheng Ma Xihan Wei Liefeng Bo 55 0 0 06 May 2025
TimeRefine: Temporal Grounding with Time Refining Video LLM Xizi Wang Feng Cheng Ziyang Wang Huiyu Wang Md. Mohaiminul Islam Lorenzo Torresani Mohit Bansal Gedas Bertasius David J. Crandall 109 1 0 12 Dec 2024
A Light Weight Model for Active Speaker Detection Junhua Liao Haihan Duan Kanghui Feng Wanbing Zhao Yanbing Yang Liangyin Chen 24 35 0 08 Mar 2023
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 224 1,017 0 13 Oct 2021
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius Heng Wang Lorenzo Torresani ViT 278 1,939 0 09 Feb 2021
MAAS: Multi-modal Assignation for Active Speaker Detection Juan Carlos León Alcázar Fabian Caba Heilbron Ali K. Thabet Bernard Ghanem 57 51 0 11 Jan 2021
Self-supervised learning for audio-visual speaker diarization Yifan Ding Yong-mei Xu Shi-Xiong Zhang Yahuan Cong Liqiang Wang VLM 34 29 0 13 Feb 2020
Aggregated Residual Transformations for Deep Neural Networks Saining Xie Ross B. Girshick Piotr Dollár Z. Tu Kaiming He 261 10,106 0 16 Nov 2016
Lip Reading Sentences in the Wild Joon Son Chung A. Senior Oriol Vinyals Andrew Zisserman 162 782 0 16 Nov 2016
Xception: Deep Learning with Depthwise Separable Convolutions François Chollet MDE BDL PINN 201 14,190 0 07 Oct 2016