arXiv:2505.22053
AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation
28 May 2025
Yan Rong
Jinting Wang
Shan Yang
Guangzhi Lei
Li Liu
DiffM
VGen
Papers citing "AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation"
Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation
Yan Rong
Shan Yang
Guangzhi Lei
Li Liu
15 Apr 2025
Long-Video Audio Synthesis with Multi-Agent Collaboration
Yehang Zhang
Xinli Xu
Xiaojie Xu
L. Liu
Yuxiao Chen
DiffM
VGen
13 Mar 2025
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
Zhifei Xie
Mingbao Lin
Ziqiang Liu
Pengcheng Wu
Shuicheng Yan
Chunyan Miao
AuLLM
OffRL
LRM
04 Mar 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
20 Feb 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Ziqiang Liu
Shuangrui Ding
Zhixiong Zhang
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Dahua Lin
Jiaqi Wang
18 Feb 2025
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
Andros Tjandra
Yi-Chiao Wu
Baishan Guo
John Hoffman
Brian Ellis
...
Matt Le
Nick Zacharov
Carleigh Wood
Ann Lee
Wei-Ning Hsu
07 Feb 2025
GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions
H. Zuo
W. You
Junxian Wu
Shihong Ren
Pei Chen
Mingxu Zhou
Yaojie Lu
Lingyun Sun
VGen
20 Jan 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Alex Schwing
Yuki Mitsufuji
VGen
19 Dec 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
23 Nov 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
09 Oct 2024
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren
Chenxing Li
Manjie Xu
Wei Liang
Yu Gu
Rilin Chen
Dong Yu
VGen
DiffM
13 Sep 2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo
Kun Liu
Fei-Yu Shen
Yi-Chen Wu
Xu Tang
Kun Xie
Kai-Tuo Xu
05 Sep 2024