AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation

28 May 2025
Yan Rong, Jinting Wang, Shan Yang, Guangzhi Lei, Li Liu
DiffM, VGen
arXiv: 2505.22053

Papers citing "AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation"

12 citing papers

Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation
Yan Rong, Shan Yang, Guangzhi Lei, Li Liu
91 · 2 · 0
15 Apr 2025

Long-Video Audio Synthesis with Multi-Agent Collaboration
Yehang Zhang, Xinli Xu, Xiaojie Xu, L. Liu, Yuxiao Chen
DiffM, VGen
106 · 1 · 0
13 Mar 2025

Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
Zhifei Xie, Mingbao Lin, Ziqiang Liu, Pengcheng Wu, Shuicheng Yan, Chunyan Miao
AuLLM, OffRL, LRM
155 · 17 · 0
04 Mar 2025

Qwen2.5-VL Technical Report
S. Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, ..., Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, Junyang Lin
VLM
422 · 699 · 0
20 Feb 2025

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Ziqiang Liu, Shuangrui Ding, Zhixiong Zhang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
132 · 3 · 0
18 Feb 2025

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
Andros Tjandra, Yi-Chiao Wu, Baishan Guo, John Hoffman, Brian Ellis, ..., Matt Le, Nick Zacharov, Carleigh Wood, Ann Lee, Wei-Ning Hsu
228 · 18 · 0
07 Feb 2025

GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions
H. Zuo, W. You, Junxian Wu, Shihong Ren, Pei Chen, Mingxu Zhou, Yaojie Lu, Lingyun Sun
VGen
73 · 5 · 0
20 Jan 2025

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alex Schwing, Yuki Mitsufuji
VGen
288 · 18 · 0
19 Dec 2024

Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo, Heng Wang, Jianbo Ma, Weidong Cai
DiffM
173 · 5 · 0
23 Nov 2024

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen
138 · 92 · 0
09 Oct 2024

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren, Chenxing Li, Manjie Xu, Wei Liang, Yu Gu, Rilin Chen, Dong Yu
VGen, DiffM
99 · 9 · 0
13 Sep 2024

FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo, Kun Liu, Fei-Yu Shen, Yi-Chen Wu, Xu Tang, Kun Xie, Kai-Tuo Xu
90 · 28 · 0
05 Sep 2024