The Conversation: Deep Audio-Visual Speech Enhancement

11 April 2018
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman

Papers citing "The Conversation: Deep Audio-Visual Speech Enhancement"

50 / 107 papers shown
Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features
Yicheng Hsu
Yonghan Lee
M. Bai
22
10
0
10 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
15
28
0
21 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
272
1,026
0
13 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over
Junchen Lu
Berrak Sisman
Rui Liu
Mingyang Zhang
Haizhou Li
DiffM
36
19
0
07 Oct 2021
USEV: Universal Speaker Extraction with Visual Cue
Zexu Pan
Meng Ge
Haizhou Li
34
41
0
30 Sep 2021
Look Who's Talking: Active Speaker Detection in the Wild
You Jin Kim
Hee-Soo Heo
Soyeon Choe
Soo-Whan Chung
Yoohwan Kwon
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
44
20
0
17 Aug 2021
The Right to Talk: An Audio-Visual Transformer Approach
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
63
36
0
06 Aug 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
22
175
0
14 Jul 2021
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement
Yuma Koizumi
Shigeki Karita
Scott Wisdom
Hakan Erdogan
J. Hershey
Llion Jones
M. Bacchiani
19
41
0
30 Jun 2021
A cappella: Audio-visual Singing Voice Separation
Juan F. Montesinos
V. S. Kadandale
G. Haro
38
16
0
20 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
13
55
0
13 Apr 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
29
33
0
18 Mar 2021
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
Ryo Masumura
30
8
0
02 Mar 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
196
199
0
08 Jan 2021
Semantic Audio-Visual Navigation
Changan Chen
Ziad Al-Halah
Kristen Grauman
50
104
0
21 Dec 2020
Visual Speech Enhancement Without A Real Visual Stream
Sindhu B. Hegde
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
DiffM
20
17
0
20 Dec 2020
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
Arsha Nagrani
Joon Son Chung
Jaesung Huh
Andrew Brown
Ernesto Coto
Weidi Xie
Mitchell McLaren
D. Reynolds
Andrew Zisserman
21
74
0
12 Dec 2020
Listening to Sounds of Silence for Speech Denoising
Ruilin Xu
Rundi Wu
Y. Ishiwaka
Carl Vondrick
Changxi Zheng
28
32
0
22 Oct 2020
Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement
Hang Chen
Jun Du
Yu Hu
Lirong Dai
Baocai Yin
Chin-Hui Lee
33
19
0
21 Sep 2020
Seeing wake words: Audio-visual Keyword Spotting
Liliane Momeni
Triantafyllos Afouras
Themos Stafylakis
Samuel Albanie
Andrew Zisserman
46
43
0
02 Sep 2020
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
EGVM
52
757
0
23 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
19
253
0
10 Aug 2020
Modality Dropout for Improved Performance-driven Talking Faces
Ahmed Hussen Abdelaziz
B. Theobald
Paul Dixon
Reinhard Knothe
N. Apostoloff
Sachin Kajareker
24
37
0
27 May 2020
Audio-visual Multi-channel Recognition of Overlapped Speech
Jianwei Yu
Bo Wu
R. Yu
Shi-Xiong Zhang
Lianwu Chen
Yong Xu
Meng Yu
Dan Su
Dong Yu
Xunying Liu
Helen Meng
18
19
0
18 May 2020
Multimodal Target Speech Separation with Voice and Face References
Leyuan Qu
C. Weber
S. Wermter
CVBM
19
19
0
17 May 2020
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
18
110
0
17 May 2020
FaceFilter: Audio-visual speech separation using still images
Soo-Whan Chung
Soyeon Choe
Joon Son Chung
Hong-Goo Kang
CVBM
21
66
0
14 May 2020
Discriminative Multi-modality Speech Recognition
Bo Xu
Cheng Lu
Yandong Guo
Jacob Wang
23
98
0
12 May 2020
Neural Spatio-Temporal Beamformer for Target Speech Separation
Yong-mei Xu
Meng Yu
Shi-Xiong Zhang
Lianwu Chen
Chao Weng
Jianming Liu
Dong Yu
26
41
0
08 May 2020
On the Role of Visual Cues in Audiovisual Speech Enhancement
Zakaria Aldeneh
Anushree Prasanna Kumar
B. Theobald
Erik Marchi
S. Kajarekar
Devang Naik
Ahmed Hussen Abdelaziz
28
6
0
25 Apr 2020
Phase reconstruction based on recurrent phase unwrapping with deep neural networks
Yoshiki Masuyama
Kohei Yatabe
Yuma Koizumi
Yasuhiro Oikawa
N. Harada
22
21
0
14 Feb 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
Audio-visual Recognition of Overlapped speech for the LRS2 dataset
Jianwei Yu
Shi-Xiong Zhang
Jian Wu
Shahram Ghorbani
Bo Wu
Shiyin Kang
Shansong Liu
Xunying Liu
Helen Meng
Dong Yu
32
72
0
06 Jan 2020
Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement
M. Sadeghi
Xavier Alameda-Pineda
13
21
0
23 Dec 2019
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
29
251
0
10 Dec 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
33
52
0
20 Nov 2019
MMTM: Multimodal Transfer Module for CNN Fusion
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
18
277
0
20 Nov 2019
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
Takaki Makino
H. Liao
Yannis Assael
Brendan Shillingford
Basi García
Otavio Braga
Olivier Siohan
18
129
0
08 Nov 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
35
91
0
30 Aug 2019
Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders
M. Sadeghi
Simon Leglaive
Xavier Alameda-Pineda
Laurent Girin
Radu Horaud
DiffM
27
65
0
07 Aug 2019
Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect
Daniel Michelsanti
Zheng-Hua Tan
S. Sigurðsson
Jesper Jensen
6
36
0
29 May 2019
Human-like machine thinking: Language guided imagination
Feng Qi
Wenchuan Wu
AI4CE
MLLM
16
5
0
18 May 2019
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
15
27
0
16 Apr 2019
Co-Separating Sounds of Visual Objects
Ruohan Gao
Kristen Grauman
33
206
0
16 Apr 2019
Time Domain Audio Visual Speech Separation
Jian Wu
Yong-mei Xu
Shi-Xiong Zhang
Lianwu Chen
Meng Yu
Lei Xie
Dong Yu
25
114
0
07 Apr 2019
2.5D Visual Sound
Ruohan Gao
Kristen Grauman
VGen
24
130
0
11 Dec 2018
The Visual Centrifuge: Model-Free Layered Video Representations
Jean-Baptiste Alayrac
João Carreira
Andrew Zisserman
21
48
0
04 Dec 2018
Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective
Zhong-Qiu Wang
Ke Tan
DeLiang Wang
50
95
0
22 Nov 2018
Who Do I Sound Like? Showcasing Speaker Recognition Technology by YouTube Voice Search
R. Krishnan
Bilal Soomro
Mahesh Subedar
Ville Hautamaki
Tomi Kinnunen
27
5
0
08 Nov 2018
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
Yufei Wang
Luca Pasa
Lantao Yu
Rohit Singh
Luciano Fadiga
L. Joppa
CVBM
15
59
0
06 Nov 2018