The Conversation: Deep Audio-Visual Speech Enhancement

11 April 2018

Joon Son Chung

Papers citing "The Conversation: Deep Audio-Visual Speech Enhancement"


Title
Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features Yicheng Hsu Yonghan Lee M. Bai 22 10 0 10 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video Rishabh Garg Ruohan Gao Kristen Grauman 15 28 0 21 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 272 1,026 0 13 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over Junchen Lu Berrak Sisman Rui Liu Mingyang Zhang Haizhou Li DiffM 36 19 0 07 Oct 2021
USEV: Universal Speaker Extraction with Visual Cue Zexu Pan Meng Ge Haizhou Li 34 41 0 30 Sep 2021
Look Who's Talking: Active Speaker Detection in the Wild You Jin Kim Hee-Soo Heo Soyeon Choe Soo-Whan Chung Yoohwan Kwon Bong-Jin Lee Youngki Kwon Joon Son Chung 44 20 0 17 Aug 2021
The Right to Talk: An Audio-Visual Transformer Approach Thanh-Dat Truong C. Duong T. D. Vu H. Pham Bhiksha Raj Ngan Le Khoa Luu 63 36 0 06 Aug 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection Ruijie Tao Zexu Pan Rohan Kumar Das Xinyuan Qian Mike Zheng Shou Haizhou Li 22 175 0 14 Jul 2021
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement Yuma Koizumi Shigeki Karita Scott Wisdom Hakan Erdogan J. Hershey Llion Jones M. Bacchiani 19 41 0 30 Jun 2021
A cappella: Audio-visual Singing Voice Separation Juan F. Montesinos V. S. Kadandale G. Haro 38 16 0 20 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios Xudong Xu Hang Zhou Ziwei Liu Bo Dai Xiaogang Wang Dahua Lin DiffM 13 55 0 13 Apr 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning Mandela Patrick Yuki M. Asano Bernie Huang Ishan Misra Florian Metze Joao Henriques Andrea Vedaldi AI4TS 29 33 0 18 Mar 2021
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss Naoki Makishima Mana Ihori Akihiko Takashima Tomohiro Tanaka Shota Orihashi Ryo Masumura 30 8 0 02 Mar 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency Ruohan Gao Kristen Grauman CVBM 196 199 0 08 Jan 2021
Semantic Audio-Visual Navigation Changan Chen Ziad Al-Halah Kristen Grauman 50 104 0 21 Dec 2020
Visual Speech Enhancement Without A Real Visual Stream Sindhu B. Hegde Prajwal K R Rudrabha Mukhopadhyay Vinay P. Namboodiri C. V. Jawahar DiffM 20 17 0 20 Dec 2020
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge Arsha Nagrani Joon Son Chung Jaesung Huh Andrew Brown Ernesto Coto Weidi Xie Mitchell McLaren D. Reynolds Andrew Zisserman 21 74 0 12 Dec 2020
Listening to Sounds of Silence for Speech Denoising Ruilin Xu Rundi Wu Y. Ishiwaka Carl Vondrick Changxi Zheng 28 32 0 22 Oct 2020
Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement Hang Chen Jun Du Yu Hu Lirong Dai Baocai Yin Chin-Hui Lee 33 19 0 21 Sep 2020
Seeing wake words: Audio-visual Keyword Spotting Liliane Momeni Triantafyllos Afouras Themos Stafylakis Samuel Albanie Andrew Zisserman 46 43 0 02 Sep 2020
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild Prajwal K R Rudrabha Mukhopadhyay Vinay P. Namboodiri C. V. Jawahar EGVM 52 757 0 23 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video Triantafyllos Afouras Andrew Owens Joon Son Chung Andrew Zisserman SSL 19 253 0 10 Aug 2020
Modality Dropout for Improved Performance-driven Talking Faces Ahmed Hussen Abdelaziz B. Theobald Paul Dixon Reinhard Knothe N. Apostoloff Sachin Kajarekar 24 37 0 27 May 2020
Audio-visual Multi-channel Recognition of Overlapped Speech Jianwei Yu Bo Wu R. Yu Shi-Xiong Zhang Lianwu Chen Yong Xu Meng Yu Dan Su Dong Yu Xunying Liu Helen Meng 18 19 0 18 May 2020
Multimodal Target Speech Separation with Voice and Face References Leyuan Qu C. Weber S. Wermter CVBM 19 19 0 17 May 2020
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis Prajwal K R Rudrabha Mukhopadhyay Vinay P. Namboodiri C. V. Jawahar 18 110 0 17 May 2020
FaceFilter: Audio-visual speech separation using still images Soo-Whan Chung Soyeon Choe Joon Son Chung Hong-Goo Kang CVBM 21 66 0 14 May 2020
Discriminative Multi-modality Speech Recognition Bo Xu Cheng Lu Yandong Guo Jacob Wang 23 98 0 12 May 2020
Neural Spatio-Temporal Beamformer for Target Speech Separation Yong-mei Xu Meng Yu Shi-Xiong Zhang Lianwu Chen Chao Weng Jianming Liu Dong Yu 26 41 0 08 May 2020
On the Role of Visual Cues in Audiovisual Speech Enhancement Zakaria Aldeneh Anushree Prasanna Kumar B. Theobald Erik Marchi S. Kajarekar Devang Naik Ahmed Hussen Abdelaziz 28 6 0 25 Apr 2020
Phase reconstruction based on recurrent phase unwrapping with deep neural networks Yoshiki Masuyama Kohei Yatabe Yuma Koizumi Yasuhiro Oikawa N. Harada 22 21 0 14 Feb 2020
Deep Audio-Visual Learning: A Survey Hao Zhu Mandi Luo Rui Wang A. Zheng Ran He 31 156 0 14 Jan 2020
Audio-visual Recognition of Overlapped speech for the LRS2 dataset Jianwei Yu Shi-Xiong Zhang Jian Wu Shahram Ghorbani Bo Wu Shiyin Kang Shansong Liu Xunying Liu Helen Meng Dong Yu 32 72 0 06 Jan 2020
Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement M. Sadeghi Xavier Alameda-Pineda 13 21 0 23 Dec 2019
Listen to Look: Action Recognition by Previewing Audio Ruohan Gao Tae-Hyun Oh Kristen Grauman Lorenzo Torresani VLM 29 251 0 10 Dec 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications Arda Senocak Tae-Hyun Oh Junsik Kim Ming-Hsuan Yang In So Kweon SSL 33 52 0 20 Nov 2019
MMTM: Multimodal Transfer Module for CNN Fusion Hamid Reza Vaezi Joze Amirreza Shaban Michael L. Iuzzolino K. Koishida 18 277 0 20 Nov 2019
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition Takaki Makino H. Liao Yannis Assael Brendan Shillingford Basi García Otavio Braga Olivier Siohan 18 129 0 08 Nov 2019
Recursive Visual Sound Separation Using Minus-Plus Net Xudong Xu Bo Dai Dahua Lin 35 91 0 30 Aug 2019
Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders M. Sadeghi Simon Leglaive Xavier Alameda-Pineda Laurent Girin Radu Horaud DiffM 27 65 0 07 Aug 2019
Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect Daniel Michelsanti Zheng-Hua Tan S. Sigurðsson Jesper Jensen 6 36 0 29 May 2019
Human-like machine thinking: Language guided imagination Feng Qi Wenchuan Wu AI4CE MLLM 16 5 0 18 May 2019
Audio-Visual Model Distillation Using Acoustic Images Andrés F. Pérez Valentina Sanguineti Pietro Morerio Vittorio Murino VLM 15 27 0 16 Apr 2019
Co-Separating Sounds of Visual Objects Ruohan Gao Kristen Grauman 33 206 0 16 Apr 2019
Time Domain Audio Visual Speech Separation Jian Wu Yong-mei Xu Shi-Xiong Zhang Lianwu Chen Meng Yu Lei Xie Dong Yu 25 114 0 07 Apr 2019
2.5D Visual Sound Ruohan Gao Kristen Grauman VGen 24 130 0 11 Dec 2018
The Visual Centrifuge: Model-Free Layered Video Representations Jean-Baptiste Alayrac João Carreira Andrew Zisserman 21 48 0 04 Dec 2018
Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective Zhong-Qiu Wang Ke Tan DeLiang Wang 50 95 0 22 Nov 2018
Who Do I Sound Like? Showcasing Speaker Recognition Technology by YouTube Voice Search R. Krishnan Bilal Soomro Mahesh Subedar Ville Hautamaki Tomi Kinnunen 27 5 0 08 Nov 2018
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments Yufei Wang Luca Pasa Lantao Yu Rohit Singh Luciano Fadiga L. Joppa CVBM 15 59 0 06 Nov 2018