ResearchTrend.AI

arXiv:2203.02216 · Cited By
Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

Jun Xiong, Yu Zhou, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha · 4 March 2022

Papers citing "Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement"

13 / 13 papers shown
CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection
Andrea Appiani, Cigdem Beyan · 18 Oct 2024
Robust Active Speaker Detection in Noisy Environments
Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian · 27 Mar 2024
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech
Jun Yu Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang, Haizhou Li · 15 Sep 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai, Fiona Ryan, Wenqi Jia, Miao Liu, James M. Rehg · 06 May 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, V. Ithapu · 28 Mar 2023
CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
Jun Xiong, Gang Wang, Peng Zhang, Wei Huang, Yufei Zha, Guangtao Zhai · 11 Mar 2023
A Light Weight Model for Active Speaker Detection
Junhua Liao, Haihan Duan, Kanghui Feng, Wanbing Zhao, Yanbing Yang, Liangyin Chen · 08 Mar 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang, Feng Cheng, Gedas Bertasius, David J. Crandall · 19 Jan 2023
One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning
Suzhe Wang, Lincheng Li, Yueqing Ding, Xin Yu · 06 Dec 2021
The Right to Talk: An Audio-Visual Transformer Approach
Thanh-Dat Truong, C. Duong, T. D. Vu, H. Pham, Bhiksha Raj, Ngan Le, Khoa Luu · 06 Aug 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar, Fabian Caba Heilbron, Ali K. Thabet, Bernard Ghanem · 11 Jan 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao, Kristen Grauman · 08 Jan 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung, Arsha Nagrani, Andrew Zisserman · 14 Jun 2018