CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

International Conference on Learning Representations (ICLR), 2022
14 December 2022
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM, CLIP

Papers citing "CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos"

30 / 30 papers shown
PromptSep: Generative Audio Separation via Multimodal Prompting
Yutong Wen
Ke Chen
Prem Seetharaman
Oriol Nieto
Jiaqi Su
Rithesh Kumar
Minje Kim
Paris Smaragdis
Zeyu Jin
Justin Salamon
DiffM
221
0
0
06 Nov 2025
MARS-Sep: Multimodal-Aligned Reinforced Sound Separation
Zihan Zhang
Xize Cheng
Zhennan Jiang
Dongjie Fu
Jingyuan Chen
Zhou Zhao
Tao Jin
54
0
0
12 Oct 2025
MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation
Akira Takahashi
Shusuke Takahashi
Yuki Mitsufuji
VGen
72
0
0
10 Oct 2025
High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
103
0
0
26 Sep 2025
A Modality-Aware Cooperative Co-Evolutionary Framework for Multimodal Graph Neural Architecture Search
Sixuan Wang
Jiao Yin
Jinli Cao
MingJian Tang
Yong-Feng Ge
40
0
0
23 Sep 2025
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot
Luis C. Garcia-Peraza-Herrera
Samet Akcay
Ben Glocker
Tom Vercauteren
UQCV
333
0
0
04 Jun 2025
DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization
Geonyoung Lee
Geonhee Han
Paul Hongsuck Seo
DiffM
213
1
0
03 Jun 2025
MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
Yunkee Chae
Kyogu Lee
260
0
0
29 May 2025
Text-Queried Audio Source Separation via Hierarchical Modeling
Xinlei Yin
Xiulian Peng
Xue Jiang
Zhiwei Xiong
Yan Lu
140
0
0
27 May 2025
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Sooyoung Park
Arda Senocak
Joon Son Chung
VLM
201
0
0
08 May 2025
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment
Hao Zhou
Xiaobao Guo
Yuzhe Zhu
A. Kong
DiffM
334
1
0
13 Mar 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
294
14
0
28 Jan 2025
Beyond Speaker Identity: Text Guided Target Speech Extraction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Mingyue Huo
Abhinav Jain
Cong Phuoc Huynh
Fanjie Kong
Pichao Wang
Zhu Liu
Vimal Bhat
154
6
0
17 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access, 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
350
6
0
10 Jan 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Xize Cheng
Siqi Zheng
Zehan Wang
Minghui Fang
Ziang Zhang
...
Tianhao Shen
Shengpeng Ji
Jialong Zuo
Tao Jin
Zhou Zhao
165
8
0
28 Oct 2024
OpenSep: Leveraging Large Language Models with Textual Inversion for Open World Audio Separation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Tanvir Mahmud
Diana Marculescu
VLM
161
3
0
28 Sep 2024
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Kohei Saijo
Janek Ebbers
François Germain
Sameer Khurana
Gordon Wichern
Jonathan Le Roux
238
4
0
20 Sep 2024
Language-Queried Target Sound Extraction Without Parallel Training Data
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Hao Ma
Zhiyuan Peng
Xu Li
Yukai Li
Mingjie Shao
Qiuqiang Kong
Xuelong Li
VLM
378
5
0
14 Sep 2024
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
Zehan Wang
Ziang Zhang
Hang Zhang
Luping Liu
Rongjie Huang
Xize Cheng
Hengshuang Zhao
Zhou Zhao
245
22
0
16 Jul 2024
A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Xubo Liu
Wenbo Wang
Shuhan Qi
Kejia Zhang
Jianyuan Sun
Wenwu Wang
248
10
0
06 Jul 2024
Weakly-supervised Audio Separation via Bi-modal Semantic Similarity
International Conference on Learning Representations (ICLR), 2024
Tanvir Mahmud
Saeed Amizadeh
K. Koishida
Diana Marculescu
AI4TS
193
4
0
02 Apr 2024
Cacophony: An Improved Contrastive Audio-Text Model
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024
Ge Zhu
Jordan Darefsky
Zhiyao Duan
AuLLM
236
21
0
10 Feb 2024
Online Similarity-and-Independence-Aware Beamformer for Low-latency Target Sound Extraction
Atsuo Hiroe
118
0
0
27 Dec 2023
Can CLIP Help Sound Source Localization?
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Sooyoung Park
Arda Senocak
Joon Son Chung
145
15
0
07 Nov 2023
GASS: Generalizing Audio Source Separation with Large-scale Data
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jordi Pons
Xiaoyu Liu
Santiago Pascual
Joan Serrà
157
20
0
29 Sep 2023
Separate Anything You Describe
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yiitan Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
249
69
0
09 Aug 2023
DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models
Asian Conference on Computer Vision (ACCV), 2023
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
146
4
0
31 Jul 2023
Complete and separate: Conditional separation with missing target source attribute completion
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
Dimitrios Bralios
Efthymios Tzinis
Paris Smaragdis
185
0
0
27 Jul 2023
CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
Hao-Wen Dong
Xiaoyu Liu
Jordi Pons
Gautam Bhattacharya
Santiago Pascual
Joan Serrà
Taylor Berg-Kirkpatrick
Julian McAuley
DiffM
160
24
0
16 Jun 2023
CAPTDURE: Captioned Sound Dataset of Single Sources
Interspeech, 2023
Yuki Okamoto
Kanta Shimonishi
Keisuke Imoto
Kota Dohi
Shota Horiguchi
Yohei Kawaguchi
138
1
0
28 May 2023