Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation

29 May 2023
Jia-Bin Huang
Yi Ren
Rongjie Huang
Dongchao Yang
Zhenhui Ye
Chen Zhang
Jinglin Liu
Xiang Yin
Zejun Ma
Zhou Zhao
    DiffM

Papers citing "Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation"

50 / 61 papers shown
'Studies for': A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model
Chihiro Nagashima
Akira Takahashi
Zhi-Wei Zhong
Shusuke Takahashi
Yuki Mitsufuji
84
0
0
29 Oct 2025
AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation
Hui Wang
J. Zhao
Cheng Liu
Yuhang Jia
Haoqin Sun
Jiaming Zhou
Yong Qin
145
1
0
16 Oct 2025
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation
Chetwin Low
Weimin Wang
Calder Katyal
DiffMVGen
156
10
0
30 Sep 2025
Guiding Audio Editing with Audio Language Model
Zitong Lan
Yiduo Hao
Mingmin Zhao
DiffMKELM
175
4
0
25 Sep 2025
RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
Liting Gao
Yi Yuan
Yaru Chen
Yuelan Cheng
Zhenbo Li
Juan Wen
Shubin Zhang
Wenwu Wang
DiffM
159
1
0
17 Sep 2025
DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
Yi Yuan
Xubo Liu
Haohe Liu
Xiyuan Kang
Zhuo Chen
Yuping Wang
Mark D. Plumbley
Wenwu Wang
DiffM
136
1
0
07 Sep 2025
TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models
Hui Wang
Cheng Liu
Junyang Chen
Haoze Liu
Yuhang Jia
Shiwan Zhao
Jiaming Zhou
Haoqin Sun
Hui Bu
Yong Qin
LM&MAELM
136
4
0
02 Sep 2025
PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description
Zihao Zheng
Zeyu Xie
Xuenan Xu
Wen Wu
Chao Zhang
Mengyue Wu
139
0
0
31 Aug 2025
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
Xiquan Li
Junxi Liu
Yuzhe Liang
Zhikang Niu
Wenxi Chen
Xie Chen
259
2
0
08 Aug 2025
How Far Are We from Generating Missing Modalities with Foundation Models?
Guanzhou Ke
Yi Xie
Xiaoli Wang
Guoqing Chao
Bo Wang
VLM
307
0
0
04 Jun 2025
IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
Kuan Po Huang
Shu-Wen Yang
Huy Phan
Bo-Ru Lu
Byeonggeun Kim
...
Qingming Tang
Shalini Ghosh
Hung-yi Lee
Chieh-Chi Kao
Chao Wang
190
2
0
31 May 2025
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
Junqi Zhao
Jinzheng Zhao
Haohe Liu
Yun Chen
Lu Han
Xubo Liu
Mark D. Plumbley
Wenwu Wang
DiffM
236
2
0
28 May 2025
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Zhi-Wei Zhong
Akira Takahashi
Shuyang Cui
Keisuke Toyama
Shusuke Takahashi
Yuki Mitsufuji
VGen
247
4
0
22 May 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zehan Wang
Ke Lei
Chen Zhu
Jiawei Huang
Sashuai Zhou
...
Xize Cheng
Shengpeng Ji
Zhenhui Ye
Tao Jin
Zhou Zhao
237
3
0
15 May 2025
Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
Riccardo Passoni
Francesca Ronchini
Luca Comanducci
Romain Serizel
Fabio Antonacci
DiffM
419
1
0
12 May 2025
Policy Optimization Algorithms in a Unified Framework
Shuang Wu
245
1
0
04 Apr 2025
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization
Haomin Zhang
Siyang Song
Haoyu Wang
Zihao Chen
Xianglong Liu
Chaofan Ding
Xinhan Di
228
0
0
28 Mar 2025
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
Computer Vision and Pattern Recognition (CVPR), 2025
Juncheng Wang
Chao Xu
Cheng Yu
Lei Shang
Zhe Hu
Shujun Wang
Liefeng Bo
DiffMVGen
268
4
0
10 Mar 2025
ReelWave: Multi-Agentic Movie Sound Generation through Multimodal LLM Conversation
Zixuan Wang
Chi-Keung Tang
Yu-Wing Tai
VGenDiffM
513
0
0
10 Mar 2025
A Multimodal Symphony: Integrating Taste and Sound through Generative AI
Matteo Spanio
Massimiliano Zampini
Antonio Rodà
Franco Pierucci
207
1
0
04 Mar 2025
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
Lei Zhao
Sizhou Chen
Linfeng Feng
Ju Liu
Xuelong Li
Fangqiu Yi
DiffMMDE
417
4
0
26 Feb 2025
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
Yi Yuan
Dongya Jia
Xiaobin Zhuang
Yuanzhe Chen
Zhengxi Liu
...
Longji Xu
Xubo Liu
Xiyuan Kang
Mark D. Plumbley
Wenwu Wang
VLM
377
4
0
03 Jan 2025
LoVA: Long-form Video-to-Audio Generation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xin Cheng
Xihua Wang
Yihan Wu
Yuyue Wang
Ruihua Song
VGenDiffM
265
15
0
31 Dec 2024
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Computer Vision and Pattern Recognition (CVPR), 2024
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Alex Schwing
Yuki Mitsufuji
VGen
544
70
0
19 Dec 2024
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
Haohe Liu
Gaël Le Lan
Xinhao Mei
Zhaoheng Ni
Anurag Kumar
Varun K. Nagaraja
Wenwu Wang
Mark D. Plumbley
Yangyang Shi
Vikas Chandra
VGen
347
12
0
03 Dec 2024
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Computer Vision and Pattern Recognition (CVPR), 2024
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
458
26
0
02 Dec 2024
Scaling Concept With Text-Guided Diffusion Models
Chao Huang
Susan Liang
Yunlong Tang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
182
10
0
31 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Neural Information Processing Systems (NeurIPS), 2024
Rory Young
Nicolas Pugeault
AAML
369
22
0
14 Oct 2024
Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation
Asian Conference on Computer Vision (ACCV), 2024
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
296
18
0
09 Oct 2024
SRC-gAudio: Sampling-Rate-Controlled Audio Generation
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2024
Chenxing Li
Manjie Xu
Dong Yu
DiffM
138
1
0
09 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
DiffMVGenLLMAG
309
11
0
04 Oct 2024
MIMII-Gen: Generative Modeling Approach for Simulated Evaluation of Anomalous Sound Detection System
Harsh Purohit
Tomoya Nishida
Kota Dohi
Takashi Endo
Yohei Kawaguchi
DiffM
142
4
0
27 Sep 2024
Video-to-Audio Generation with Fine-grained Temporal Semantics
Yuchen Hu
Yu Gu
Chenxing Li
Rilin Chen
Dong Yu
VGenDiffM
262
4
0
23 Sep 2024
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Yuhang Jia
Yang Chen
Jinghua Zhao
Shiwan Zhao
Wenjia Zeng
Yong Chen
Yong Qin
DiffM
154
10
0
19 Sep 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Yun Wang
Hangting Chen
Dongchao Yang
Zhiyong Wu
Xixin Wu
DiffM
385
8
0
19 Sep 2024
FLUX that Plays Music
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Junshi Huang
319
17
0
01 Sep 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
156
2
0
19 Jul 2024
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Dong Yu
DiffMVGen
284
24
0
10 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGenDiffM
331
36
0
08 Jul 2024
PAGURI: a user experience study of creative interaction with text-to-music models
Francesca Ronchini
Luca Comanducci
Gabriele Perego
Fabio Antonacci
449
7
0
05 Jul 2024
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Zeyu Xie
Xuenan Xu
Zhizheng Wu
Mengyue Wu
278
15
0
03 Jul 2024
AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
Zeyu Xie
Xuenan Xu
Zhizheng Wu
Mengyue Wu
AuLLM
264
14
0
03 Jul 2024
SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond
Marco Comunità
Zhi-Wei Zhong
Akira Takahashi
Shiqi Yang
Mengjie Zhao
Koichi Saito
Yukara Ikemiya
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
297
15
0
25 Jun 2024
FakeSound: Deepfake General Audio Detection
Zeyu Xie
Baihan Li
Xuenan Xu
Zheng Liang
Kai Yu
Mengyue Wu
150
8
0
12 Jun 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo
Ruoyi Du
Han Xiao
Yangguang Li
Dongyang Liu
...
Wanli Ouyang
Ziwei Liu
Ping Luo
Hongsheng Li
Peng Gao
305
107
0
05 Jun 2024
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
Yongqi Wang
Wenxiang Guo
Rongjie Huang
Jia-Bin Huang
Zehan Wang
Fuming You
Ruiqi Li
Zhou Zhao
VGenDiffM
539
4
0
01 Jun 2024
Creative Text-to-Audio Generation via Synthesizer Programming
Manuel Cherep
Nikhil Singh
Jessica Shand
192
9
0
01 Jun 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLMMedIm
306
9
0
31 May 2024
SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation
Xinlei Niu
Jing Zhang
Christian J. Walder
Charles Patrick Martin
215
3
0
24 May 2024
Prompt-guided Precise Audio Editing with Diffusion Models
Manjie Xu
Chenxing Li
Duzhen Zhang
Dan Su
Weihan Liang
Dong Yu
DiffM
179
14
0
11 May 2024