Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.05284
Cited By
Simple and Controllable Music Generation
8 June 2023
Jade Copet
Felix Kreuk
Itai Gat
Tal Remez
David Kant
Gabriel Synnaeve
Yossi Adi
Alexandre Défossez
MGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Simple and Controllable Music Generation"
50 / 255 papers shown
Title
Fast Text-to-Audio Generation with Adversarial Post-Training
Zachary Novack
Zach Evans
Zack Zukowski
Josiah Taylor
CJ Carr
...
Adnan Al-Sinan
Gian Marco Iodice
Julian McAuley
Taylor Berg-Kirkpatrick
Jordi Pons
30
0
0
13 May 2025
Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation
Jincheng Zhang
Gyorgy Fazekas
C. Saitis
53
0
0
06 May 2025
SonicRAG : High Fidelity Sound Effects Synthesis Based on Retrival Augmented Generation
Yu-Ren Guo
Wen-Kai Tai
57
0
0
06 May 2025
TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution
Yue Li
Wei Liu
Dongdong Lin
44
0
0
29 Apr 2025
DOSE : Drum One-Shot Extraction from Music Mixture
Suntae Hwang
Seonghyeon Kang
Kyungsu Kim
Semin Ahn
K. Lee
36
0
0
25 Apr 2025
SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation
Yue Li
Weizhi Liu
Dongdong Lin
27
0
0
21 Apr 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
58
0
0
21 Apr 2025
DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Yatong Bai
Jonah Casebeer
Somayeh Sojoudi
Nicholas J. Bryan
DiffM
VLM
48
1
0
21 Apr 2025
A Survey on Cross-Modal Interaction Between Music and Multimodal Data
Sifei Li
Mining Tan
Feier Shen
Minyan Luo
Zijiao Yin
Fan Tang
W. Dong
Changsheng Xu
69
0
0
17 Apr 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
Dongchao Yang
Songxiang Liu
Haohan Guo
Jiankun Zhao
Yuanyuan Wang
...
Xubo Liu
Xueyuan Chen
Xu Tan
Xixin Wu
Helen Meng
117
1
0
14 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
52
2
0
11 Apr 2025
ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models
Seonghwan Park
Jaehyeon Jeong
Yongjun Kim
Jaeho Lee
Namhoon Lee
VLM
49
0
0
09 Apr 2025
STAGE: Stemmed Accompaniment Generation through Prefix-Based Conditioning
Giorgio Strano
Chiara Ballanti
Donato Crisostomi
Michele Mancusi
Luca Cosmo
Emanuele Rodolà
26
0
0
08 Apr 2025
Activation Patching for Interpretable Steering in Music Generation
Simone Facchiano
Giorgio Strano
Donato Crisostomi
Irene Tallini
Tommaso Mencattini
Fabio Galasso
Emanuele Rodolà
LLMSV
24
0
0
06 Apr 2025
LoopGen: Training-Free Loopable Music Generation
Davide Marincione
Giorgio Strano
Donato Crisostomi
Roberto Ribuoli
Emanuele Rodolà
MGen
60
0
0
06 Apr 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Kim Sung-Bin
Jeongsoo Choi
Puyuan Peng
Joon Son Chung
Tae-Hyun Oh
David Harwath
VGen
45
1
0
03 Apr 2025
Watermarking for AI Content Detection: A Review on Text, Visual, and Audio Modalities
Lele Cao
41
0
0
02 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kaipeng Zhang
MGen
VGen
70
1
0
01 Apr 2025
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
Shivam Mehta
Nebojsa Jojic
Hannes Gamper
31
0
0
28 Mar 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
74
1
0
27 Mar 2025
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Prin Phunyaphibarn
Phillip Y. Lee
Jaihoon Kim
Minhyuk Sung
DiffM
89
0
0
26 Mar 2025
Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
Max W. Y. Lam
Yijin Xing
Weiya You
Jingcheng Wu
Zongyu Yin
...
T. Zhao
Chien-Hung Liu
Xuchen Song
Yang Li
Yahui Zhou
LRM
64
2
0
25 Mar 2025
Human Motion Unlearning
Edoardo De Matteis
Matteo Migliarini
Alessio Sampieri
Indro Spinelli
Fabio Galasso
MU
57
0
0
24 Mar 2025
Aligning Text-to-Music Evaluation with Human Preferences
Yichen Huang
Zachary Novack
Koichi Saito
Jiatong Shi
Shinji Watanabe
Yuki Mitsufuji
John Thickstun
Chris Donahue
EGVM
70
1
0
20 Mar 2025
Continual Multimodal Contrastive Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
CLL
57
0
0
19 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
67
3
0
13 Mar 2025
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
Yitian Zhang
Long Mai
Aniruddha Mahapatra
David Bourgin
Yicong Hong
Jonah Casebeer
Feng Liu
Y. Fu
DiffM
VGen
51
0
0
11 Mar 2025
FilmComposer: LLM-Driven Music Production for Silent Film Clips
Zhifeng Xie
Qile He
Youjia Zhu
Qiwei He
Mengtian Li
VGen
95
2
0
11 Mar 2025
ReelWave: A Multi-Agent Framework Toward Professional Movie Sound Generation
Zixuan Wang
Chi-Keung Tang
Yu-Wing Tai
DiffM
VGen
60
0
0
10 Mar 2025
A Multimodal Symphony: Integrating Taste and Sound through Generative AI
Matteo Spanio
Massimiliano Zampini
Antonio Rodà
Franco Pierucci
39
0
0
04 Mar 2025
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation
C. Zhang
Yukun Ma
Qian Chen
Wen Wang
Shengkui Zhao
...
Y. Jiang
Chaohong Tan
Zhifu Gao
Zhihao Du
B. Ma
50
0
0
28 Feb 2025
DGFM: Full Body Dance Generation Driven by Music Foundation Models
Xinran Liu
Zhenhua Feng
Diptesh Kanojia
Wenwu Wang
DiffM
66
1
0
27 Feb 2025
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
Weihao Wu
Zhiwei Lin
Yixuan Zhou
Jingbei Li
Rui Niu
Qinghua Wu
Songjun Cao
Long Ma
Zhiyong Wu
DiffM
44
0
0
27 Feb 2025
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
76
0
0
26 Feb 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Ziqiang Liu
Shuangrui Ding
Zhixiong Zhang
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Y. Cao
Dahua Lin
Jiaqi Wang
76
0
0
18 Feb 2025
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument
Kyungsu Kim
Junghyun Koo
Sungho Lee
Haesun Joung
Kyogu Lee
58
0
0
13 Feb 2025
Hookpad Aria: A Copilot for Songwriters
Chris Donahue
Shih-Lun Wu
Yewon Kim
Dave Carlton
Ryan Miyakawa
John Thickstun
53
1
0
12 Feb 2025
XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
Yong-Jin Liu
Lie Lu
Jihui Jin
Lichao Sun
Andrea Fanelli
98
1
0
06 Feb 2025
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Brandon Woodard
Margarita Geleta
Joseph J. LaViola Jr.
Andrea Fanelli
Rhonda Wilson
57
1
0
05 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
171
0
0
05 Feb 2025
ComplexDec: A Domain-robust High-fidelity Neural Audio Codec with Complex Spectrum Modeling
Yi-Chiao Wu
Dejan Marković
Steven Krenn
I. D. Gebru
Alexander Richard
61
0
0
04 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
99
2
0
28 Jan 2025
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
Siyuan Hou
Shansong Liu
Ruibin Yuan
Wei Xue
Ying Shan
Mangsuo Zhao
Chao Zhang
87
3
0
17 Jan 2025
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
Ruben Ciranni
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Giorgio Fabbro
Emanuele Rodolà
Luca Cosmo
74
7
0
10 Jan 2025
Generative AI for Cel-Animation: A Survey
Yunlong Tang
Junjia Guo
Pinxin Liu
Zhiyuan Wang
Hang Hua
...
Jing Bi
Mingqian Feng
Xuzhao Li
Zeliang Zhang
Chenliang Xu
VGen
88
7
0
08 Jan 2025
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Dongmin Park
Sebin Kim
Taehong Moon
Minkyu Kim
Kangwook Lee
Jaewoong Cho
DiffM
CoGe
64
2
0
08 Jan 2025
Learning the Language of Protein Structure
Benoit Gaujac
Jérémie Donà
Liviu Copoiu
Timothy Atkinson
Thomas Pierrot
Thomas D. Barrett
58
10
0
08 Jan 2025
Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer
Michele Mancusi
Yurii Halychanskyi
K. Cheuk
Eloi Moliner
Chieh-Hsin Lai
...
Junghyun Koo
Marco A. Martínez-Ramírez
Wei-Hsiang Liao
Giorgio Fabbro
Yuki Mitsufuji
DiffM
85
2
0
08 Jan 2025
Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models
Tornike Karchkhadze
M. Izadi
Shlomo Dubnov
DiffM
41
2
0
31 Dec 2024
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
Tan-Hanh Pham
Hoang-Nam Le
Phu-Vinh Nguyen
Chris Ngo
Truong Son-Hy
AuLLM
LRM
81
1
0
21 Dec 2024
1
2
3
4
5
6
Next