ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1810.04826
  4. Cited By
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned
  Spectrogram Masking
v1v2v3v4v5v6 (latest)

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

11 October 2018
Quan Wang
Hannah Muckenhirn
K. Wilson
Prashant Sridhar
Zelin Wu
J. Hershey
Rif A. Saurous
Ron J. Weiss
Ye Jia
Ignacio López Moreno
ArXiv (abs)PDFHTML

Papers citing "VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking"

50 / 193 papers shown
Title
Target Speech Extraction with Conditional Diffusion Model
Target Speech Extraction with Conditional Diffusion Model
Naoyuki Kamo
Marc Delcroix
Tomohiro Nakatan
DiffM
65
22
0
08 Aug 2023
Complete and separate: Conditional separation with missing target source
  attribute completion
Complete and separate: Conditional separation with missing target source attribute completion
Dimitrios Bralios
Efthymios Tzinis
Paris Smaragdis
87
0
0
27 Jul 2023
Audio-Visual Speech Enhancement With Selective Off-Screen Speech
  Extraction
Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction
Tomoya Yoshinaga
Keitaro Tanaka
Shigeo Morishima
63
0
0
10 Jun 2023
End-to-End Joint Target and Non-Target Speakers ASR
End-to-End Joint Target and Non-Target Speakers ASR
Ryo Masumura
Naoki Makishima
Taiga Yamane
Yoshihiko Yamazaki
Saki Mizuno
...
Akihiko Takashima
Satoshi Suzuki
Takafumi Moriya
Nobukatsu Hojo
Atsushi Ando
60
5
0
04 Jun 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech
  Recognition
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Wangyou Zhang
Y. Qian
89
11
0
25 May 2023
BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with
  Convolutional Cross Attention in Multi-talker Conditions
BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions
Jie Zhang
Qingquan Xu
Qiu-shi Zhu
Zhenhua Ling
70
12
0
17 May 2023
Universal Source Separation with Weakly Labelled Data
Universal Source Separation with Weakly Labelled Data
Qiuqiang Kong
Kai Chen
Haohe Liu
Xingjian Du
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Mark D. Plumbley
79
22
0
11 May 2023
Transformers in Speech Processing: A Survey
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
165
48
0
21 Mar 2023
Target Sound Extraction with Variable Cross-modality Clues
Target Sound Extraction with Variable Cross-modality Clues
Chenda Li
Yao Qian
Zhuo Chen
Dongmei Wang
Takuya Yoshioka
Shujie Liu
Y. Qian
Michael Zeng
VLM
68
14
0
15 Mar 2023
Online Binaural Speech Separation of Moving Speakers With a Wavesplit
  Network
Online Binaural Speech Separation of Moving Speakers With a Wavesplit Network
Cong Han
N. Mesgarani
61
4
0
13 Mar 2023
X-SepFormer: End-to-end Speaker Extraction Network with Explicit
  Optimization on Speaker Confusion
X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion
Kai Liu
Z.C. Du
Xucheng Wan
Huan Zhou
98
24
0
09 Mar 2023
TS-SEP: Joint Diarization and Separation Conditioned on Estimated
  Speaker Embeddings
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
Christoph Boeddeker
Aswin Shanmugam Subramanian
Gordon Wichern
Reinhold Haeb-Umbach
Jonathan Le Roux
93
24
0
07 Mar 2023
A Framework for Unified Real-time Personalized and Non-Personalized
  Speech Enhancement
A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement
Zhepei Wang
Ritwik Giri
Devansh P. Shah
J. Valin
Mike Goodwin
Paris Smaragdis
67
9
0
23 Feb 2023
Neural Target Speech Extraction: An Overview
Neural Target Speech Extraction: An Overview
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
JanHonza'' vCernocký
Dong Yu
70
95
0
31 Jan 2023
Improving Target Speaker Extraction with Sparse LDA-transformed Speaker
  Embeddings
Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings
Kai Liu
Xucheng Wan
Z.C. Du
Huan Zhou
VLM
51
1
0
16 Jan 2023
Deep neural network techniques for monaural speech enhancement: state of
  the art analysis
Deep neural network techniques for monaural speech enhancement: state of the art analysis
P. Ochieng
117
22
0
01 Dec 2022
Array Configuration-Agnostic Personalized Speech Enhancement using
  Long-Short-Term Spatial Coherence
Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence
Yicheng Hsu
Yonghan Lee
M. Bai
45
3
0
16 Nov 2022
The Potential of Neural Speech Synthesis-based Data Augmentation for
  Personalized Speech Enhancement
The Potential of Neural Speech Synthesis-based Data Augmentation for Personalized Speech Enhancement
Anastasia Kuznetsova
Aswin Sivaraman
Minje Kim
54
3
0
14 Nov 2022
Optimal Condition Training for Target Source Separation
Optimal Condition Training for Target Source Separation
Efthymios Tzinis
Gordon Wichern
Paris Smaragdis
Jonathan Le Roux
77
5
0
11 Nov 2022
Cross-Attention is all you need: Real-Time Streaming Transformers for
  Personalised Speech Enhancement
Cross-Attention is all you need: Real-Time Streaming Transformers for Personalised Speech Enhancement
Shucong Zhang
Malcolm Chadwick
Alberto Gil C. P. Ramos
S. Bhattacharya
54
5
0
08 Nov 2022
Real-Time Joint Personalized Speech Enhancement and Acoustic Echo
  Cancellation
Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation
Sefik Emre Eskimez
Takuya Yoshioka
Alex Ju
M. Tang
Tanel Pärnamaa
Huaming Wang
63
7
0
04 Nov 2022
Spatially Selective Deep Non-linear Filters for Speaker Extraction
Spatially Selective Deep Non-linear Filters for Speaker Extraction
Kristina Tesch
Timo Gerkmann
66
17
0
04 Nov 2022
Real-Time Target Sound Extraction
Real-Time Target Sound Extraction
Bandhav Veluri
Justin Chan
Malek Itani
Tuochao Chen
Takuya Yoshioka
Shyamnath Gollakota
112
33
0
04 Nov 2022
Adapting self-supervised models to multi-talker speech recognition using
  speaker embeddings
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
Zili Huang
Desh Raj
Leibny Paola García-Perera
Sanjeev Khudanpur
155
29
0
01 Nov 2022
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue
  through Embedding Inpainting
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting
Zexu Pan
Wupeng Wang
Marvin Borsdorf
Haizhou Li
85
12
0
31 Oct 2022
Hierarchical speaker representation for target speaker extraction
Hierarchical speaker representation for target speaker extraction
Shulin He
Huaiwen Zhang
Wei Rao
Kanghao Zhang
Yukai Ju
Yang-Rui Yang
Xueliang Zhang
60
7
0
28 Oct 2022
Quantitative Evidence on Overlooked Aspects of Enrollment Speaker
  Embeddings for Target Speaker Separation
Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation
Xiaoyu Liu
Xu Li
Joan Serrà
74
9
0
23 Oct 2022
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network
Junjie Li
Meng Ge
Zexu Pan
Longbiao Wang
Jianwu Dang
55
10
0
09 Oct 2022
C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning
  for Speaker Verification
C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification
Chunlei Zhang
Dong Yu
117
20
0
15 Aug 2022
Attention and DCT based Global Context Modeling for Text-independent
  Speaker Recognition
Attention and DCT based Global Context Modeling for Text-independent Speaker Recognition
Wei Xia
John H. L. Hansen
58
4
0
04 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated
  Open-Domain On-Screen Sound Separation
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
119
30
0
20 Jul 2022
Multi-channel target speech enhancement based on ERB-scaled spatial
  coherence features
Multi-channel target speech enhancement based on ERB-scaled spatial coherence features
Yicheng Hsu
Yonghan Lee
M. Bai
55
1
0
17 Jul 2022
NEC: Speaker Selective Cancellation via Neural Enhanced Ultrasound
  Shadowing
NEC: Speaker Selective Cancellation via Neural Enhanced Ultrasound Shadowing
Hanqing Guo
Chenning Li
Lingkun Li
Zhichao Cao
Qiben Yan
Li Xiao
13
3
0
12 Jul 2022
Speaker Verification in Multi-Speaker Environments Using Temporal
  Feature Fusion
Speaker Verification in Multi-Speaker Environments Using Temporal Feature Fusion
Ahmad Aloradi
Wolfgang Mack
Mohamed Elminshawi
Emanuel Habets
63
5
0
28 Jun 2022
Semi-supervised Time Domain Target Speaker Extraction with Attention
Semi-supervised Time Domain Target Speaker Extraction with Attention
Zhepei Wang
Ritwik Giri
Shrikant Venkataramani
Umut Isik
J. Valin
Paris Smaragdis
Mike Goodwin
A. Krishnaswamy
59
7
0
18 Jun 2022
Simultaneous Speech Extraction for Multiple Target Speakers under the
  Meeting Scenarios
Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios
Bang Zeng
Weiqing Wang
Yuanyuan Bao
Ming Li
57
0
0
17 Jun 2022
Personalized Acoustic Echo Cancellation for Full-duplex Communications
Personalized Acoustic Echo Cancellation for Full-duplex Communications
Shimin Zhang
Ziteng Wang
Yukai Ju
Yihui Fu
Yueyue Na
Q. Fu
Linfu Xie
41
5
0
30 May 2022
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified
  Acoustic Echo Suppression And Speech Enhancement
NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement
Meng Yu
Yong-mei Xu
Chunlei Zhang
Shizhong Zhang
Dong Yu
50
11
0
20 May 2022
Streaming Noise Context Aware Enhancement For Automatic Speech
  Recognition in Multi-Talker Environments
Streaming Noise Context Aware Enhancement For Automatic Speech Recognition in Multi-Talker Environments
Joseph Peter Caroselli
A. Narayanan
Yiteng Huang
27
1
0
17 May 2022
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT
Bowen Shi
Abdel-rahman Mohamed
Wei-Ning Hsu
SSL
69
18
0
15 May 2022
Cleanformer: A multichannel array configuration-invariant neural
  enhancement frontend for ASR in smart speakers
Cleanformer: A multichannel array configuration-invariant neural enhancement frontend for ASR in smart speakers
Joseph Peter Caroselli
A. Narayanan
N. Howard
Tom O'Malley
51
5
0
25 Apr 2022
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker
  Extraction
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction
Zifeng Zhao
Rongzhi Gu
Dongchao Yang
Jinchuan Tian
Yuexian Zou
59
2
0
15 Apr 2022
Text-Driven Separation of Arbitrary Sounds
Text-Driven Separation of Arbitrary Sounds
Kevin Kilgour
Beat Gfeller
Qingqing Huang
A. Jansen
Scott Wisdom
Marco Tagliasacchi
100
34
0
12 Apr 2022
Listen only to me! How well can target speech extraction handle false
  alarms?
Listen only to me! How well can target speech extraction handle false alarms?
Marc Delcroix
K. Kinoshita
Tsubasa Ochiai
Kateřina Žmolíková
Hiroshi Sato
Tomohiro Nakatani
71
15
0
11 Apr 2022
SoundBeam: Target sound extraction conditioned on sound-class labels and
  enrollment clues for increased performance and continuous learning
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
Marc Delcroix
Jorge Bennasar Vázquez
Tsubasa Ochiai
K. Kinoshita
Yasunori Ohishi
S. Araki
VLM
83
34
0
08 Apr 2022
Heterogeneous Target Speech Separation
Heterogeneous Target Speech Separation
Hyunjae Cho
Wonbin Jung
Junhyeok Lee
Paris Smaragdis
Sanghyun Woo
92
26
0
07 Apr 2022
RaDur: A Reference-aware and Duration-robust Network for Target Sound
  Detection
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection
Dongchao Yang
Helin Wang
Zhongjie Ye
Yuexian Zou
Wenwu Wang
57
0
0
05 Apr 2022
Improving Target Sound Extraction with Timestamp Information
Improving Target Sound Extraction with Timestamp Information
Helin Wang
Dongchao Yang
Chao Weng
Jianwei Yu
Yuexian Zou
64
10
0
02 Apr 2022
Speaker Extraction with Co-Speech Gestures Cue
Speaker Extraction with Co-Speech Gestures Cue
Zexu Pan
Xinyuan Qian
Haizhou Li
SLR
66
29
0
31 Mar 2022
A Comparative Study on Speaker-attributed Automatic Speech Recognition
  in Multi-party Meetings
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings
Fan Yu
Zhihao Du
Shiliang Zhang
Yuxiao Lin
Linfu Xie
42
15
0
31 Mar 2022
Previous
1234
Next