ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1804.04121
  4. Cited By
The Conversation: Deep Audio-Visual Speech Enhancement

The Conversation: Deep Audio-Visual Speech Enhancement

11 April 2018
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
ArXivPDFHTML

Papers citing "The Conversation: Deep Audio-Visual Speech Enhancement"

50 / 104 papers shown
Title
LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
Danilo de Oliveira
Julius Richter
Tal Peer
Timo Germann
DiffM
22
0
0
16 May 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
40
28
0
02 Jan 2025
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
Yuhang He
Sangyun Shin
Anoop Cherian
Niki Trigoni
Andrew Markham
78
0
0
31 Dec 2024
Diffusion-based Unsupervised Audio-visual Speech Enhancement
Diffusion-based Unsupervised Audio-visual Speech Enhancement
Jean-Eudes Ayilo
Mostafa Sadeghi
Romain Serizel
Xavier Alameda-Pineda
DiffM
25
0
0
04 Oct 2024
Aligning Sight and Sound: Advanced Sound Source Localization Through
  Audio-Visual Alignment
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
38
3
0
18 Jul 2024
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional
  Flow Matching
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
Chaeyoung Jung
Suyeon Lee
Ji-Hoon Kim
Joon Son Chung
DiffM
47
4
0
13 Jun 2024
Audio-Visual Talker Localization in Video for Spatial Sound Reproduction
Audio-Visual Talker Localization in Video for Spatial Sound Reproduction
Davide Berghi
Philip J. B. Jackson
47
0
0
01 Jun 2024
Robust Active Speaker Detection in Noisy Environments
Robust Active Speaker Detection in Noisy Environments
Siva Sai Nagender Vasireddy
Chenxu Zhang
Xiaohu Guo
Yapeng Tian
40
0
0
27 Mar 2024
Audio-Visual Speech Enhancement in Noisy Environments via Emotion-Based
  Contextual Cues
Audio-Visual Speech Enhancement in Noisy Environments via Emotion-Based Contextual Cues
Tassadaq Hussain
K. Dashtipour
Yu Tsao
Amir Hussain
29
2
0
26 Feb 2024
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down
  Fusion
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
Samuel Pegg
Kai Li
Xiaolin Hu
32
1
0
25 Jan 2024
Seeing Through the Conversation: Audio-Visual Speech Separation based on
  Diffusion Model
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
Suyeon Lee
Chaeyoung Jung
Youngjoon Jang
Jaehun Kim
Joon Son Chung
33
7
0
30 Oct 2023
Sound Source Localization is All about Cross-Modal Alignment
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
36
18
0
19 Sep 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
DiffM
38
1
0
31 Jul 2023
Incorporating Ultrasound Tongue Images for Audio-Visual Speech
  Enhancement through Knowledge Distillation
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation
Ruixin Zheng
Yang Ai
Zhenhua Ling
32
8
0
24 May 2023
Speaker Recognition in Realistic Scenario Using Multimodal Data
Speaker Recognition in Realistic Scenario Using Multimodal Data
Saqlain Hussain Shah
M. S. Saeed
Shah Nawaz
Muhammad Haroon Yousaf
CVBM
26
8
0
25 Feb 2023
Neural Target Speech Extraction: An Overview
Neural Target Speech Extraction: An Overview
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
JanHonza'' vCernocký
Dong Yu
23
86
0
31 Jan 2023
Leveraging Modality-specific Representations for Audio-visual Speech
  Recognition via Reinforcement Learning
Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning
Chen Chen
Yuchen Hu
Qiang Zhang
Heqing Zou
Beier Zhu
Eng Siong Chng
33
26
0
10 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
34
27
0
07 Dec 2022
Injecting Spatial Information for Monaural Speech Enhancement via
  Knowledge Distillation
Injecting Spatial Information for Monaural Speech Enhancement via Knowledge Distillation
Xinmeng Xu
Weiping Tu
Yuhong Yang
19
0
0
02 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
28
51
0
28 Nov 2022
Egocentric Audio-Visual Noise Suppression
Egocentric Audio-Visual Noise Suppression
Roshan S. Sharma
Weipeng He
Ju Lin
Egor Lakomkin
Yang Liu
Kaustubh Kalgaonkar
EgoV
24
1
0
07 Nov 2022
Spatially Selective Deep Non-linear Filters for Speaker Extraction
Spatially Selective Deep Non-linear Filters for Speaker Extraction
Kristina Tesch
Timo Gerkmann
26
17
0
04 Nov 2022
Learning in Audio-visual Context: A Review, Analysis, and New
  Perspective
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
46
55
0
20 Aug 2022
Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors
Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors
Sindhu B. Hegde
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
CVBM
16
1
0
17 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated
  Open-Domain On-Screen Sound Separation
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
39
30
0
20 Jul 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Kyle Min
Sourya Roy
Subarna Tripathi
T. Guha
Somdeb Majumdar
26
36
0
15 Jul 2022
Online Video Instance Segmentation via Robust Context Fusion
Online Video Instance Segmentation via Robust Context Fusion
Xiang Li
Jinglu Wang
Xiaohao Xu
Bhiksha Raj
Yan Lu
40
5
0
12 Jul 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
39
80
0
16 Jun 2022
Learning Speaker-specific Lip-to-Speech Generation
Learning Speaker-specific Lip-to-Speech Generation
Munender Varshney
Ravindra Yadav
Vinay P. Namboodiri
R. Hegde
21
7
0
04 Jun 2022
Sound Localization by Self-Supervised Time Delay Estimation
Sound Localization by Self-Supervised Time Delay Estimation
Ziyang Chen
David Fouhey
Andrew Owens
SSL
27
19
0
26 Apr 2022
RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation
  System
RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System
M. Z. Ozturk
Chenshu Wu
Beibei Wang
Min Wu
K. Liu
27
20
0
14 Apr 2022
Listen only to me! How well can target speech extraction handle false
  alarms?
Listen only to me! How well can target speech extraction handle false alarms?
Marc Delcroix
K. Kinoshita
Tsubasa Ochiai
Kateřina Žmolíková
Hiroshi Sato
Tomohiro Nakatani
34
15
0
11 Apr 2022
Audio-visual multi-channel speech separation, dereverberation and
  recognition
Audio-visual multi-channel speech separation, dereverberation and recognition
Guinan Li
Jianwei Yu
Jiajun Deng
Xunying Liu
Helen Meng
21
7
0
05 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement
  by Re-Synthesis
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
16
32
0
31 Mar 2022
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain
  Target Speaker Extraction
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction
Zexu Pan
Meng Ge
Haizhou Li
21
17
0
31 Mar 2022
Phase-Aware Deep Speech Enhancement: It's All About The Frame Length
Phase-Aware Deep Speech Enhancement: It's All About The Frame Length
Tal Peer
Timo Gerkmann
22
21
0
30 Mar 2022
The Sound of Bounding-Boxes
The Sound of Bounding-Boxes
Takashi Oya
Shohei Iwase
Shigeo Morishima
19
2
0
30 Mar 2022
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
Juan F. Montesinos
V. S. Kadandale
G. Haro
ViT
23
19
0
08 Mar 2022
Visually Supervised Speaker Detection and Localization via Microphone
  Array
Visually Supervised Speaker Detection and Localization via Microphone Array
Davide Berghi
A. Hilton
Philip J. B. Jackson
21
11
0
07 Mar 2022
Audio-visual speech separation based on joint feature representation
  with cross-modal attention
Audio-visual speech separation based on joint feature representation with cross-modal attention
Jun Xiong
Peng Zhang
Lei Xie
Wei Huang
Yufei Zha
Yanni Zhang
20
3
0
05 Mar 2022
Visual Speech Recognition for Multiple Languages in the Wild
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
128
144
0
26 Feb 2022
Visual Acoustic Matching
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
21
56
0
14 Feb 2022
Active Audio-Visual Separation of Dynamic Sound Sources
Active Audio-Visual Separation of Dynamic Sound Sources
Sagnik Majumder
Kristen Grauman
27
21
0
02 Feb 2022
A Novel Temporal Attentive-Pooling based Convolutional Recurrent
  Architecture for Acoustic Signal Enhancement
A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement
Tassadaq Hussain
Wei-Chien Wang
M. Gogate
K. Dashtipour
Yu Tsao
Xugang Lu
A. Ahsan
Amir Hussain
21
3
0
24 Jan 2022
VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge
VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge
A. Brown
Jaesung Huh
Joon Son Chung
Arsha Nagrani
Daniel Garcia-Romero
Andrew Zisserman
31
40
0
12 Jan 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Hao Jiang
Calvin Murdock
V. Ithapu
EgoV
29
41
0
06 Jan 2022
Robust Self-Supervised Audio-Visual Speech Recognition
Robust Self-Supervised Audio-Visual Speech Recognition
Bowen Shi
Wei-Ning Hsu
Abdel-rahman Mohamed
36
90
0
05 Jan 2022
Self-Supervised Learning based Monaural Speech Enhancement with
  Multi-Task Pre-Training
Self-Supervised Learning based Monaural Speech Enhancement with Multi-Task Pre-Training
Yi Li
Yang Sun
S. M. Naqvi
SSL
24
0
0
21 Dec 2021
Towards Robust Real-time Audio-Visual Speech Enhancement
Towards Robust Real-time Audio-Visual Speech Enhancement
M. Gogate
K. Dashtipour
Amir Hussain
31
3
0
16 Dec 2021
U-shaped Transformer with Frequency-Band Aware Attention for Speech
  Enhancement
U-shaped Transformer with Frequency-Band Aware Attention for Speech Enhancement
Yi Li
Yang Sun
S. M. Naqvi
23
25
0
11 Dec 2021
123
Next