ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1508.04306
  4. Cited By
Deep clustering: Discriminative embeddings for segmentation and
  separation

Deep clustering: Discriminative embeddings for segmentation and separation

18 August 2015
J. Hershey
Zhuo Chen
Jonathan Le Roux
Shinji Watanabe
ArXiv (abs)PDFHTML

Papers citing "Deep clustering: Discriminative embeddings for segmentation and separation"

50 / 357 papers shown
Title
ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior
ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior
Zhongweiyang Xu
Xulin Fan
Zhong-Qiu Wang
Xilin Jiang
Romit Roy Choudhury
DiffM
169
0
0
08 May 2025
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
Zhaoxi Mu
Xinyu Yang
Gang Wang
AuLLMKELMVLM
148
1
0
06 May 2025
A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction
A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction
Xiaoliang Chen
Xin Yu
Le Chang
Yunhe Huang
Jiashuai He
...
Jin Li
Likai Lin
Ziyu Zeng
Xianling Tu
Shuyu Zhang
110
1
0
04 May 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo
Tetsuji Ogawa
85
1
0
28 Apr 2025
EDSep: An Effective Diffusion-Based Method for Speech Source Separation
Jinwei Dong
Xinsheng Wang
Qirong Mao
143
1
0
28 Jan 2025
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing
David Perera
Victor Letzelter
Théo Mariotte
Adrien Cortés
Mickaël Chen
S. Essid
Ga¨el Richard
165
4
0
20 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
131
29
0
02 Jan 2025
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
Kai Li
Wendi Sang
Chang Zeng
Runxuan Yang
Guo Chen
Xiaolin Hu
128
3
0
02 Oct 2024
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Mohan Xu
Kai Li
Guo Chen
Xiaolin Hu
87
2
0
02 Oct 2024
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
Bang Zeng
Ming Li
107
5
0
04 Sep 2024
A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for
  Speech Interruption During Human-Robot Interaction
A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction
Yue Li
Florian A. Kunneman
Koen V. Hindriks
62
2
0
22 May 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for
  Long Sequence Modelling: Methods, Applications, and Challenges
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
113
45
0
24 Apr 2024
Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience
Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience
Xilin Jiang
Cong Han
Yinghao Aaron Li
N. Mesgarani
KELM
91
5
0
06 Feb 2024
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for
  Enhanced Time-Domain Monaural Speech Separation
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
Shengkui Zhao
Yukun Ma
Chongjia Ni
Chong Zhang
Hao Wang
Trung Hieu Nguyen
Kun Zhou
J. Yip
Dianwen Ng
Bin Ma
92
29
0
19 Dec 2023
Remixing-based Unsupervised Source Separation from Scratch
Remixing-based Unsupervised Source Separation from Scratch
Kohei Saijo
Tetsuji Ogawa
47
3
0
01 Sep 2023
A Neural State-Space Model Approach to Efficient Speech Separation
A Neural State-Space Model Approach to Efficient Speech Separation
Chen Chen
Chao-Han Huck Yang
Kai Li
Yuchen Hu
Pin-Jui Ku
Chng Eng Siong
66
11
0
26 May 2023
Martian time-series unraveled: A multi-scale nested approach with
  factorial variational autoencoders
Martian time-series unraveled: A multi-scale nested approach with factorial variational autoencoders
Ali Siahkoohi
Rudy Morel
Randall Balestriero
Erwan Allys
G. Sainton
Taichi Kawamura
Maarten V. de Hoop
141
2
0
25 May 2023
Towards Solving Cocktail-Party: The First Method to Build a Realistic
  Dataset with Ground Truths for Speech Separation
Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation
Rawad Melhem
Assef Jafar
Oumayma Al Dakkak
41
0
0
25 May 2023
ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross
  Attention
ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention
J. Yip
Tuan Truong
Dianwen Ng
Chong Zhang
Yukun Ma
Trung Hieu Nguyen
Chongjia Ni
Shengkui Zhao
Chng Eng Siong
Bin Ma
40
2
0
20 May 2023
Posthoc Interpretation via Quantization
Posthoc Interpretation via Quantization
Francesco Paissan
Cem Subakan
Mirco Ravanelli
MQ
110
7
0
22 Mar 2023
Multi-Dimensional and Multi-Scale Modeling for Speech Separation
  Optimized by Discriminative Learning
Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning
Zhaoxi Mu
Xinyu Yang
Wenjing Zhu
68
5
0
07 Mar 2023
A Multi-Stage Triple-Path Method for Speech Separation in Noisy and
  Reverberant Environments
A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments
Zhaoxi Mu
Xinyu Yang
Xiangyuan Yang
Wenjing Zhu
38
5
0
07 Mar 2023
Multi-Channel Target Speaker Extraction with Refinement: The WavLab
  Submission to the Second Clarity Enhancement Challenge
Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge
Samuele Cornell
Zhongqiu Wang
Yoshiki Masuyama
Shinji Watanabe
Manuel Pariente
Nobutaka Ono
81
12
0
15 Feb 2023
Neural Target Speech Extraction: An Overview
Neural Target Speech Extraction: An Overview
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
JanHonza'' vCernocký
Dong Yu
68
95
0
31 Jan 2023
An Audio-Visual Speech Separation Model Inspired by
  Cortico-Thalamo-Cortical Circuits
An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits
Kai Li
Fenghua Xie
Hang Chen
K. Yuan
Xiaolin Hu
91
16
0
21 Dec 2022
Tackling the Cocktail Fork Problem for Separation and Transcription of
  Real-World Soundtracks
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks
Darius Petermann
Gordon Wichern
Aswin Shanmugam Subramanian
Zhong-Qiu Wang
Jonathan Le Roux
63
10
0
14 Dec 2022
Multi-Scale Feature Fusion Transformer Network for End-to-End Single
  Channel Speech Separation
Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation
Yinhao Xu
Jian Zhou
L. Tao
H. Kwan
104
0
0
14 Dec 2022
Hyperbolic Audio Source Separation
Hyperbolic Audio Source Separation
Darius Petermann
Gordon Wichern
Aswin Shanmugam Subramanian
Jonathan Le Roux
73
10
0
09 Dec 2022
Self-training via Metric Learning for Source-Free Domain Adaptation of
  Semantic Segmentation
Self-training via Metric Learning for Source-Free Domain Adaptation of Semantic Segmentation
Ibrahim Batuhan Akkaya
U. Halici
TTA
85
2
0
08 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
123
29
0
07 Dec 2022
NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer
NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer
Changsheng Quan
Xiaofei Li
62
2
0
05 Dec 2022
Deep neural network techniques for monaural speech enhancement: state of
  the art analysis
Deep neural network techniques for monaural speech enhancement: state of the art analysis
P. Ochieng
99
22
0
01 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
96
52
0
28 Nov 2022
Latent Iterative Refinement for Modular Source Separation
Latent Iterative Refinement for Modular Source Separation
Dimitrios Bralios
Efthymios Tzinis
Gordon Wichern
Paris Smaragdis
Jonathan Le Roux
BDL
60
5
0
22 Nov 2022
Array Configuration-Agnostic Personalized Speech Enhancement using
  Long-Short-Term Spatial Coherence
Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence
Yicheng Hsu
Yonghan Lee
M. Bai
45
3
0
16 Nov 2022
MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation
MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation
Chang-Bin Jeon
Hyeongi Moon
Keunwoo Choi
Ben Sangbae Chon
Kyogu Lee
54
5
0
14 Nov 2022
Speech separation with large-scale self-supervised learning
Speech separation with large-scale self-supervised learning
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yu-Huan Wu
Xiaofei Wang
Takuya Yoshioka
Jinyu Li
S. Sivasankaran
Sefik Emre Eskimez
81
15
0
09 Nov 2022
Real-Time Target Sound Extraction
Real-Time Target Sound Extraction
Bandhav Veluri
Justin Chan
Malek Itani
Tuochao Chen
Takuya Yoshioka
Shyamnath Gollakota
112
33
0
04 Nov 2022
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue
  through Embedding Inpainting
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting
Zexu Pan
Wupeng Wang
Marvin Borsdorf
Haizhou Li
80
12
0
31 Oct 2022
Diffusion-based Generative Speech Source Separation
Diffusion-based Generative Speech Source Separation
Robin Scheibler
Youna Ji
Soo-Whan Chung
J. Byun
Soyeon Choe
Min-Seok Choi
DiffM
120
48
0
31 Oct 2022
CasNet: Investigating Channel Robustness for Speech Separation
CasNet: Investigating Channel Robustness for Speech Separation
Fan Wang
Yao-Fei Cheng
Hung-Shin Lee
Yu Tsao
Hsin-Min Wang
54
2
0
27 Oct 2022
TSUP Speaker Diarization System for Conversational Short-phrase Speaker
  Diarization Challenge
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
Bowen Pang
Huan Zhao
Gaosheng Zhang
Xiaoyue Yang
Yanguo Sun
Li Zhang
Qing Wang
Linfu Xie
BDL
52
2
0
26 Oct 2022
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network
Junjie Li
Meng Ge
Zexu Pan
Longbiao Wang
Jianwu Dang
50
10
0
09 Oct 2022
Music Source Separation with Band-split RNN
Music Source Separation with Band-split RNN
Yi Luo
Jianwei Yu
121
120
0
30 Sep 2022
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural
  Speaker Separation
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation
Zhong-Qiu Wang
Samuele Cornell
Shukjae Choi
Younglo Lee
Byeonghak Kim
Shinji Watanabe
149
108
0
08 Sep 2022
Target Speaker Voice Activity Detection with Transformers and Its
  Integration with End-to-End Neural Diarization
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization
Dongmei Wang
Xiong Xiao
Naoyuki Kanda
Takuya Yoshioka
Jian Wu
77
28
0
27 Aug 2022
Music Separation Enhancement with Generative Modeling
Music Separation Enhancement with Generative Modeling
N. Schaffer
Boaz Cogan
Ethan Manilow
Max Morrison
Prem Seetharaman
Bryan Pardo
58
9
0
26 Aug 2022
Spatial Aware Multi-Task Learning Based Speech Separation
Spatial Aware Multi-Task Learning Based Speech Separation
Wei Sun
Mei Wang
L. Qiu
30
3
0
20 Jul 2022
PodcastMix: A dataset for separating music and speech in podcasts
PodcastMix: A dataset for separating music and speech in podcasts
Nico M. Schmidt
Jordi Pons
M. Miron
46
3
0
15 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning
  to Separate
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate
Nabarun Goswami
Tatsuya Harada
78
5
0
13 Jul 2022
12345678
Next