VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

11 October 2018
Quan Wang
Hannah Muckenhirn
K. Wilson
Prashant Sridhar
Zelin Wu
J. Hershey
Rif A. Saurous
Ron J. Weiss
Ye Jia
Ignacio López Moreno

Papers citing "VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking"

50 / 193 papers shown
Listen to Extract: Onset-Prompted Target Speaker Extraction
Pengjie Shen
Kangrui Chen
Shulin He
Pengru Chen
Shuqi Yuan
He Kong
Xueliang Zhang
Zehao Wang
91
0
0
08 May 2025
UniSep: Universal Target Audio Separation with Language Models at Scale
Yun Wang
Hangting Chen
Dongchao Yang
Weiqin Li
Dan Luo
Guangzhi Li
Shan Yang
Zhiyong Wu
Helen Meng
Xixin Wu
VLM
84
1
0
31 Mar 2025
Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation
Wupeng Wang
Zexu Pan
Jingru Lin
Shuai Wang
Haizhou Li
110
0
0
16 Mar 2025
Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments
Shitong Xu
Yiyuan Yang
Niki Trigoni
Andrew Markham
74
0
0
23 Feb 2025
End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario
Mohsen Ghane
Mohammad Sadegh Safari
137
0
0
28 Jan 2025
Beyond Speaker Identity: Text Guided Target Speech Extraction
Mingyue Huo
Abhinav Jain
Cong Phuoc Huynh
Fanjie Kong
Pichao Wang
Zhu Liu
Vimal Bhat
76
1
0
17 Jan 2025
USED: Universal Speaker Extraction and Diarization
Junyi Ao
Mehmet Sinan Yildirim
Ruijie Tao
Mengyao Ge
Shuai Wang
Yan-min Qian
Haizhou Li
101
6
0
17 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
131
29
0
02 Jan 2025
Task-Aware Unified Source Separation
Kohei Saijo
Janek Ebbers
François Germain
Gordon Wichern
Jonathan Le Roux
75
2
0
31 Oct 2024
Improving curriculum learning for target speaker extraction with synthetic speakers
Yun Liu
Xuechen Liu
Junichi Yamagishi
69
0
0
01 Oct 2024
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
Kohei Saijo
Janek Ebbers
François Germain
Sameer Khurana
Gordon Wichern
Jonathan Le Roux
99
1
0
20 Sep 2024
On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
Junjie Li
Ke Zhang
Shuai Wang
Haizhou Li
Man-Wai Mak
Kong Aik Lee
57
2
0
15 Sep 2024
Cross-attention Inspired Selective State Space Models for Target Sound Extraction
Donghang Wu
Yiwen Wang
Xihong Wu
T. Qu
Mamba
123
4
0
07 Sep 2024
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
Bang Zeng
Ming Li
107
5
0
04 Sep 2024
Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement
Tathagata Bandyopadhyay
ViT
85
0
0
02 Sep 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
112
6
0
21 Jul 2024
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling
Vahid Ahmadi Kalkhorani
Cheng Yu
Anurag Kumar
Ke Tan
Buye Xu
DeLiang Wang
87
1
0
17 Jun 2024
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
Guinan Li
Jiajun Deng
Youjun Chen
Mengzhe Geng
Shujie Hu
...
Zengrui Jin
Tianzi Wang
Xurong Xie
Helen Meng
Xunying Liu
VLM
56
0
0
14 Jun 2024
Target Speaker Extraction with Curriculum Learning
Yun Liu
Xuechen Liu
Xiaoxiao Miao
Junichi Yamagishi
56
3
0
12 Jun 2024
A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction
Yue Li
Florian A. Kunneman
Koen V. Hindriks
62
2
0
22 May 2024
Look Once to Hear: Target Speech Hearing with Noisy Examples
Bandhav Veluri
Malek Itani
Tuochao Chen
Takuya Yoshioka
Shyamnath Gollakota
90
17
0
10 May 2024
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Ruijie Tao
Xinyuan Qian
Yidi Jiang
Junjie Li
Jiadong Wang
Haizhou Li
73
2
0
29 Apr 2024
A lightweight dual-stage framework for personalized speech enhancement based on DeepFilterNet2
Thomas Serre
Mathieu Fontaine
Éric Benhaim
Geoffroy Dutour
S. Essid
37
0
0
11 Apr 2024
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
Wenxuan Wu
Xueyuan Chen
Xixin Wu
Haizhou Li
Helen M. Meng
56
3
0
24 Mar 2024
Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Yue Li
Koen V. Hindriks
Florian A. Kunneman
49
2
0
05 Mar 2024
Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience
Xilin Jiang
Cong Han
Yinghao Aaron Li
N. Mesgarani
KELM
91
5
0
06 Feb 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
Jee-weon Jung
Wangyou Zhang
Jiatong Shi
Zakaria Aldeneh
Takuya Higuchi
B. Theobald
Ahmed Hussen Abdelaziz
Shinji Watanabe
161
24
0
30 Jan 2024
Spatial-Temporal Activity-Informed Diarization and Separation
Yicheng Hsu
Ssuhan Chen
Mingsian R. Bai
48
0
0
30 Jan 2024
Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings
He Zhao
Hangting Chen
Jianwei Yu
Yuehai Wang
78
1
0
29 Jan 2024
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
Samuel Pegg
Kai Li
Xiaolin Hu
90
1
0
25 Jan 2024
Online Similarity-and-Independence-Aware Beamformer for Low-latency Target Sound Extraction
Atsuo Hiroe
51
0
0
27 Dec 2023
3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications
Shulin He
Jinjiang Liu
Hao Li
Yang-Rui Yang
Fei Chen
Xueliang Zhang
52
3
0
18 Dec 2023
Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
Zhaoxi Mu
Xinyu Yang
Sining Sun
Qing Yang
SSL
77
10
0
16 Dec 2023
NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection
Zexu Pan
Gordon Wichern
François Germain
Sameer Khurana
Jonathan Le Roux
63
12
0
12 Dec 2023
Audio Prompt Tuning for Universal Sound Separation
Yuzhuo Liu
Xubo Liu
Yan Zhao
Yuanyuan Wang
Rui Xia
Pingchuan Tain
Yuxuan Wang
VLM
79
6
0
30 Nov 2023
Personalizing Keyword Spotting with Speaker Information
Beltrán Labrador
Pai Zhu
Guanlong Zhao
Angelo Scorza Scarpati
Quan Wang
Alicia Lozano-Diez
Alex Park
Ignacio López Moreno
50
2
0
06 Nov 2023
FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning
Jaemin Shin
Hyungjun Yoon
Seungjoo Lee
Sungjoon Park
Yunxin Liu
Jinho D. Choi
Sung-Ju Lee
70
6
0
25 Oct 2023
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
Jinzheng Zhao
Yong-mei Xu
Xinyuan Qian
Davide Berghi
Peipei Wu
Meng Cui
Jianyuan Sun
Philip J. B. Jackson
Wenwu Wang
BDL
128
7
0
23 Oct 2023
LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism
Yu Chen
Xinyuan Qian
Zexu Pan
Kainan Chen
Haizhou Li
40
3
0
16 Oct 2023
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction
Kohei Saijo
Wangyou Zhang
Zhong-Qiu Wang
Shinji Watanabe
Tetsunori Kobayashi
Tetsuji Ogawa
VLM
70
6
0
12 Oct 2023
A Glance is Enough: Extract Target Sentence By Looking at A keyword
Ying Shi
Dong Wang
Lantian Li
Jiqing Han
94
1
0
09 Oct 2023
An Exploration of Task-decoupling on Two-stage Neural Post Filter for Real-time Personalized Acoustic Echo Cancellation
Zihan Zhang
Jiayao Sun
Xianjun Xia
Ziqian Wang
Xiaopeng Yan
Yijian Xiao
Lei Xie
49
0
0
07 Oct 2023
DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction
Jiarui Hai
Helin Wang
Dongchao Yang
Karan Thakkar
Najim Dehak
Mounya Elhilali
DiffM
72
9
0
06 Oct 2023
MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement
Weiming Xu
Zhouxuan Chen
Zhili Tan
Shubo Lv
Ru Han
Wenjiang Zhou
Weifeng Zhao
Lei Xie
46
2
0
06 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM
AuLLM
107
128
0
01 Oct 2023
The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR
Yuhao Liang
Mohan Shi
Fan Yu
Yangze Li
Shiliang Zhang
...
Jian Wu
Zhuo Chen
Kong Aik Lee
Zhijie Yan
Hui Bu
67
5
0
24 Sep 2023
PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network
Qinghua Liu
Meng Ge
Zhizheng Wu
Haizhou Li
73
1
0
13 Sep 2023
ReZero: Region-customizable Sound Extraction
Rongzhi Gu
Yi Luo
66
16
0
31 Aug 2023
Convoifilter: A case study of doing cocktail party speech recognition
Thai-Binh Nguyen
A. Waibel
73
2
0
22 Aug 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
113
86
0
14 Aug 2023