Simple and Controllable Music Generation

8 June 2023
Jade Copet
Felix Kreuk
Itai Gat
Tal Remez
David Kant
Gabriel Synnaeve
Yossi Adi
Alexandre Défossez
    MGen
arXiv: 2306.05284

Papers citing "Simple and Controllable Music Generation"

50 / 256 papers shown
Title
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
Tiantian Feng
Dimitrios Dimitriadis
Shrikanth Narayanan
29
4
0
13 Jun 2024
Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models
J. Nistal
Marco Pasini
Cyran Aouameur
M. Grachten
Stefan Lattner
DiffM
47
16
0
12 Jun 2024
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
Yi Lu
Yuankun Xie
Ruibo Fu
Zhengqi Wen
Jianhua Tao
...
Xuefei Liu
Yongwei Li
Yukun Liu
Xiaopeng Wang
Shuchen Shi
48
1
0
12 Jun 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Bing Han
Long Zhou
Shujie Liu
Sanyuan Chen
Lingwei Meng
Yanming Qian
Yanqing Liu
Sheng Zhao
Jinyu Li
Furu Wei
41
14
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
45
15
0
11 Jun 2024
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
Haibin Wu
Yuan Tseng
Hung-yi Lee
AuLLM
30
6
0
11 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
47
15
0
08 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
46
7
0
07 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
39
3
0
06 Jun 2024
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Théodor Lemerle
Nicolas Obin
Axel Roebel
37
6
0
06 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Qifeng Chen
Y. Guo
VGen
102
16
0
06 Jun 2024
Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input
Joachim Ott
Zuowen Wang
Shih-Chii Liu
DiffM
VGen
50
0
0
05 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
38
3
0
05 Jun 2024
An Independence-promoting Loss for Music Generation with Language Models
Jean-Marie Lemercier
Simon Rouard
Jade Copet
Yossi Adi
Alexandre Défossez
30
1
0
04 Jun 2024
MidiCaps: A large-scale MIDI dataset with text captions
J. Melechovský
Abhinaba Roy
Dorien Herremans
24
9
0
04 Jun 2024
MaskSR: Masked Language Model for Full-band Speech Restoration
Xu Li
Qirui Wang
Xiaoyu Liu
47
8
0
04 Jun 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
54
0
0
31 May 2024
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
Zachary Novack
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
30
8
0
30 May 2024
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
Yixiao Zhang
Yukara Ikemiya
Woosung Choi
Naoki Murata
Marco A. Martínez-Ramírez
Liwei Lin
Gus Xia
Wei-Hsiang Liao
Yuki Mitsufuji
Simon Dixon
57
10
0
28 May 2024
QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
Chang Li
Ruoyu Wang
Lijuan Liu
Jun Du
Yixuan Sun
Zilu Guo
Zhenrong Zhang
Yuan Jiang
J. Gao
Feng Ma
41
0
0
24 May 2024
DAC-JAX: A JAX Implementation of the Descript Audio Codec
David Braun
29
0
0
19 May 2024
Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models
Ziyu Wang
Lejun Min
Gus Xia
DiffM
21
10
0
16 May 2024
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
Jianyi Chen
Wei Xue
Xu Tan
Zhen Ye
Qi-fei Liu
Yi-Ting Guo
42
2
0
13 May 2024
Correlation Dimension of Natural Language in a Statistical Manifold
Xin Du
Kumiko Tanaka-Ishii
19
1
0
10 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
47
15
0
08 May 2024
Detecting music deepfakes is easy but actually hard
Darius Afchar
Gabriel Meseguer-Brocal
Romain Hennequin
63
6
0
07 May 2024
Creative Problem Solving in Large Language and Vision Models -- What Would it Take?
Lakshmi Nair
Evana Gizzi
Jivko Sinapov
MLLM
55
2
0
02 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
35
18
0
30 Apr 2024
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
Qixin Deng
Qikai Yang
Ruibin Yuan
Yipeng Huang
Yi Wang
...
Emmanouil Benetos
Wenwu Wang
Guangyu Xia
Wei Xue
Yi-Ting Guo
LLMAG
36
28
0
28 Apr 2024
Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Gehui Chen
Guan’an Wang
Xiaowen Huang
Jitao Sang
VGen
22
8
0
25 Apr 2024
Long-form music generation with latent diffusion
Zach Evans
Julian Parker
CJ Carr
Zack Zukowski
Josiah Taylor
Jordi Pons
MGen
DiffM
44
39
0
16 Apr 2024
MuPT: A Generative Symbolic Music Pretrained Transformer
Xingwei Qu
Yuelin Bai
Yi Ma
Ziya Zhou
Ka Man Lo
...
Xu Tan
Stephen W. Huang
Wenhu Chen
Jie Fu
Ge Zhang
57
10
0
09 Apr 2024
Gull: A Generative Multifunctional Audio Codec
Yi Luo
Jianwei Yu
Hangting Chen
Rongzhi Gu
Chao Weng
AuLLM
38
3
0
07 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
74
39
0
03 Apr 2024
Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery
Christian Limberg
Artur Gonçalves
Bastien Rigault
Helmut Prendinger
32
5
0
02 Apr 2024
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
Junghyun Koo
G. Wichern
François Germain
Sameer Khurana
Jonathan Le Roux
31
3
0
02 Apr 2024
Synthetic training set generation using text-to-audio models for environmental sound classification
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
37
2
0
26 Mar 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David Harwath
74
57
0
25 Mar 2024
Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
Emilian Postolache
Giorgio Mariani
Luca Cosmo
Emmanouil Benetos
Emanuele Rodolà
DiffM
37
9
0
18 Mar 2024
Symbiotic Game and Foundation Models for Cyber Deception Operations in Strategic Cyber Warfare
Tao Li
Quanyan Zhu
AAML
34
5
0
14 Mar 2024
AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
Jiuniu Wang
Zehua Du
Yuyuan Zhao
Bo Yuan
Kexiang Wang
...
Yihen Lu
Gengliang Li
Junlong Gao
Xin Tu
Zhenyu Guo
LLMAG
VGen
36
7
0
12 Mar 2024
MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains
Marvin Zammit
Antonios Liapis
Georgios N. Yannakakis
29
1
0
11 Mar 2024
Beyond Language Models: Byte Models are Digital World Simulators
Shangda Wu
Xu Tan
Zili Wang
Rui Wang
Xiaobing Li
Maosong Sun
33
12
0
29 Feb 2024
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation
Shuangrui Ding
Zihan Liu
Xiao-wen Dong
Pan Zhang
Rui Qian
Conghui He
Dahua Lin
Jiaqi Wang
22
23
0
27 Feb 2024
D-Flow: Differentiating through Flows for Controlled Generation
Heli Ben-Hamu
Omri Puny
Itai Gat
Brian Karrer
Uriel Singer
Y. Lipman
41
24
0
21 Feb 2024
Towards audio language modeling -- an overview
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kai-Wei Chang
Ho-Lam Chung
Alexander H. Liu
Hung-yi Lee
AuLLM
30
28
0
20 Feb 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
27
114
0
19 Feb 2024
MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music
Zihao Wang
Shuyu Li
Tao Zhang
Qi Wang
Pengfei Yu
Jinyang Luo
Yan Liu
Ming Xi
Kejun Zhang
40
4
0
15 Feb 2024
Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls
Liwei Lin
Gus Xia
Yixiao Zhang
Junyan Jiang
19
12
0
14 Feb 2024
Evaluating Co-Creativity using Total Information Flow
V. Gokul
Chris Francis
Shlomo Dubnov
19
0
0
09 Feb 2024