AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models

Neural Information Processing Systems (NeurIPS), 2023
3 April 2023
Yuancheng Wang
Zeqian Ju
Xuejiao Tan
Lei He
Zhizheng Wu
Jiang Bian
Sheng Zhao
    DiffM

Papers citing "AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models"

43 / 43 papers shown
MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers
Ali Boudaghi
Hadi Zare
272
0
0
06 Nov 2025
SAO-Instruct: Free-form Audio Editing using Natural Language Instructions
Michael Ungersböck
Florian Grötschla
Luca A. Lanzendörfer
June Young Yi
Changho Choi
Roger Wattenhofer
AuLLM
133
1
0
26 Oct 2025
3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation
International Conference on 3D Vision (3DV), 2025
Balamurugan Thambiraja
Malte Prinzler
S. Aliakbarian
Darren Cosker
Justus Thies
DiffM, VGen
124
1
0
30 Sep 2025
Guiding Audio Editing with Audio Language Model
Zitong Lan
Yiduo Hao
Mingmin Zhao
DiffM, KELM
142
4
0
25 Sep 2025
Audio Super-Resolution with Latent Bridge Models
Chang Li
Zehua Chen
Liyuan Wang
Jun Zhu
284
3
0
22 Sep 2025
FakeSound2: A Benchmark for Explainable and Generalizable Deepfake Sound Detection
Zeyu Xie
Yaoyun Zhang
Xuenan Xu
Yongkang Yin
Chenxing Li
Mengyue Wu
Yuexian Zou
147
0
0
21 Sep 2025
Interpretable Audio Editing Evaluation via Chain-of-Thought Difference-Commonality Reasoning with Multimodal LLMs
Yuhang Jia
Xu Zhang
Yang Chen
Hui Wang
Enzhi Wang
Yong Qin
LRM
88
0
0
21 Sep 2025
RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
Liting Gao
Yi Yuan
Yaru Chen
Yuelan Cheng
Zhenbo Li
Juan Wen
Shubin Zhang
Wenwu Wang
DiffM
104
1
0
17 Sep 2025
DeCodec: Rethinking Audio Codecs as Universal Disentangled Representation Learners
Xiaoxue Luo
Jinwei Huang
Runyan Yang
Yingying Gao
Junlan Feng
Chao Deng
Shilei Zhang
122
2
0
11 Sep 2025
Recomposer: Event-roll-guided generative audio editing
D. Ellis
Eduardo Fonseca
Ron J. Weiss
K. Wilson
Scott Wisdom
Hakan Erdogan
J. Hershey
A. Jansen
R. C. Moore
Manoj Plakal
KELM
78
1
0
05 Sep 2025
WaveLLDM: Design and Development of a Lightweight Latent Diffusion Model for Speech Enhancement and Restoration
Kevin Putra Santoso
Rizka Wakhidatus Sholikah
Raden Venantius Hari Ginardi
139
0
0
28 Aug 2025
Audio-Guided Visual Editing with Complex Multi-Modal Prompts
Hyeonyu Kim
Seokhoon Jeong
Seonghee Han
Chanhyuk Choi
Taehwan Kim
DiffM
77
0
0
28 Aug 2025
ESDD 2026: Environmental Sound Deepfake Detection Challenge Evaluation Plan
Han Yin
Yang Xiao
Rohan Kumar Das
Jisheng Bai
Ting Dang
74
4
0
06 Aug 2025
SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering
J. Melechovský
Ambuj Mehrish
Abhinaba Roy
Dorien Herremans
149
2
0
05 Aug 2025
From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-modal Understanding in Multimodal LLMs
Yuhang Jia
Xu Zhang
Yong Qin
Yang Chen
Shiwan Zhao
VLM
159
0
0
03 Aug 2025
Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance
Akio Hayakawa
Masato Ishii
Takashi Shibuya
Yuki Mitsufuji
DiffM, VGen
229
1
0
26 Jun 2025
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
Huadai Liu
Kaicheng Luo
Jialei Wang
Wen Wang
Qian Chen
Zhou Zhao
Wei Xue
VGen, LRM
353
13
0
26 Jun 2025
Abstract Sound Fusion with Unconditional Inversion Models
Jing Liu
EnQi Lian
Moyao Deng
278
0
0
13 Jun 2025
BNMusic: Blending Environmental Noises into Personalized Music
Chi Zuo
M. B. Møller
Pablo Martínez-Nuevo
Huayang Huang
Yu Wu
Ye Zhu
305
0
0
12 Jun 2025
DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization
Geonyoung Lee
Geonhee Han
Paul Hongsuck Seo
DiffM
221
1
0
03 Jun 2025
MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
Yunkee Chae
Kyogu Lee
329
0
0
29 May 2025
SpeechVerifier: Robust Acoustic Fingerprint against Tampering Attacks via Watermarking
Lingfeng Yao
Chenpei Huang
Shengyao Wang
Junpei Xue
Hanqing Guo
Jiang Liu
Hang Zhang
Miao Pan
210
1
0
28 May 2025
Text-Queried Audio Source Separation via Hierarchical Modeling
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Xinlei Yin
Xiulian Peng
Xue Jiang
Zhiwei Xiong
Yan Lu
148
0
0
27 May 2025
Audio Texture Manipulation by Exemplar-Based Analogy
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Kan Jen Cheng
Tingle Li
Gopala Anumanchipalli
DiffM
120
2
0
21 Jan 2025
Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Shengkui Zhao
Zexu Pan
Kun Zhou
Yukun Ma
Chuxu Zhang
B. Ma
DiffM
115
2
0
20 Jan 2025
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ruben Ciranni
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Giorgio Fabbro
Emanuele Rodolà
Luca Cosmo
582
14
0
10 Jan 2025
FlowSep: Language-Queried Sound Separation with Rectified Flow Matching
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Yi Yuan
Xubo Liu
Haohe Liu
Mark D. Plumbley
Wenwu Wang
363
22
0
10 Jan 2025
UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Yuning Han
Bingyin Zhao
Rui Chu
Feng Luo
Biplab Sikdar
Yingjie Lao
DiffM, AAML
465
4
0
16 Dec 2024
The Evolution and Future Perspectives of Artificial Intelligence Generated Content
Chengzhang Zhu
Luobin Cui
Ying Tang
Jiacun Wang
359
2
0
02 Dec 2024
Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection
Ksheeraja Raghavan
Samiran Gode
Ankit Parag Shah
Surabhi Raghavan
Wolfram Burgard
Bhiksha Raj
Rita Singh
209
0
0
04 Oct 2024
Self-Supervised Audio-Visual Soundscape Stylization
European Conference on Computer Vision (ECCV), 2024
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM, SSL
223
7
0
22 Sep 2024
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Yuhang Jia
Yang Chen
Jinghua Zhao
Shiwan Zhao
Wenjia Zeng
Yong Chen
Yong Qin
DiffM
119
10
0
19 Sep 2024
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation
Ye Bai
Haonan Chen
Jitong Chen
Zhuo Chen
Yi Deng
...
Hang Zhao
Ziyi Zhao
Dejian Zhong
Shicen Zhou
Pei Zou
DiffM
278
17
0
13 Sep 2024
Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Rohit Jena
Ali Taghibakhshi
Sahil Jain
Gerald Shen
Nima Tajbakhsh
Arash Vahdat
360
7
0
09 Sep 2024
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
Yixiao Zhang
Yukara Ikemiya
Woosung Choi
Naoki Murata
Marco A. Martínez-Ramírez
Liwei Lin
Gus Xia
Wei-Hsiang Liao
Yuki Mitsufuji
Simon Dixon
354
22
0
28 May 2024
AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations
David Xu
218
2
0
17 May 2024
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Ge Zhu
Juan-Pablo Caceres
Zhiyao Duan
Nicholas J. Bryan
DiffM
238
8
0
15 Mar 2024
SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion
Liumeng Xue
Chaoren Wang
Mingxuan Wang
Xueyao Zhang
Jun Han
Zhizheng Wu
DiffM
162
6
0
20 Feb 2024
Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2024
Xilin Jiang
Cong Han
Yinghao Aaron Li
N. Mesgarani
KELM
313
6
0
06 Feb 2024
Audio Editing with Non-Rigid Text Prompts
Francesco Paissan
Luca Della Libera
Zhepei Wang
Mirco Ravanelli
Paris Smaragdis
Cem Subakan
DiffM
172
11
0
19 Oct 2023
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Yixiao Zhang
Akira Maezawa
Gus Xia
Kazuhiko Yamamoto
Simon Dixon
141
21
0
19 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM, AuLLM
399
181
0
01 Oct 2023
Vision-Infused Deep Audio Inpainting
IEEE International Conference on Computer Vision (ICCV), 2019
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
286
91
0
24 Oct 2019