ResearchTrend.AI

© 2026 ResearchTrend.AI, All rights reserved.

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
Versions: v1, v2, v3 (latest)

26 June 2025
Huadai Liu
Kaicheng Luo
Jialei Wang
Wen Wang
Qian Chen
Zhou Zhao
Wei Xue
    VGen, LRM
Links: arXiv (abs), PDF, HTML, HuggingFace (6 upvotes), GitHub (3487★)

Papers citing "ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing"

11 / 11 papers shown
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
Mengchen Zhang
Qi Chen
Tong Wu
Zihan Liu
Dahua Lin
VGen
169
0
0
02 Dec 2025
AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs
Shuhan Xia
Peipei Li
Xuannan Liu
Dongsen Zhang
Xinyu Guo
Zekun Li
AAML
223
0
0
26 Nov 2025
3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation
Y. Li
Heyu Si
Federico Landi
Pilar Oplustil Gallegos
Ioannis Koutsoumpas
...
Ruiju Fu
Qi Guo
Xin Jin
Shunyu Liu
Mingli Song
DiffMVGen
193
0
0
26 Nov 2025
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Z. Liang
D. Zhang
Huichi Zhou
Rui Huang
Bobo Li
...
Shengqiong Wu
X. Wang
Jiebo Luo
Lizi Liao
Hao Fei
VGen
204
0
0
11 Nov 2025
Foley Control: Aligning a Frozen Latent Text-to-Audio Model to Video
Ciara Rowles
Varun Jampani
Simon Donné
Shimon Vainer
Julian Parker
Zach Evans
VGen
156
1
0
24 Oct 2025
Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
Haomiao Chen
K. Jamison
M. Sabuncu
Amy Kuceyeski
151
1
0
07 Oct 2025
SoundReactor: Frame-level Online Video-to-Audio Generation
Koichi Saito
Julian Tanke
Christian Simon
Masato Ishii
Kazuki Shimada
Zachary Novack
Zhi-Wei Zhong
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
DiffMVGen
241
0
0
02 Oct 2025
StereoFoley: Object-Aware Stereo Audio Generation from Video
Tornike Karchkhadze
Kuan-Lin Chen
Mojtaba Heydari
Robert Henzel
Alessandro Toso
Mehrez Souden
DiffMVGen, AuLLM
248
1
0
22 Sep 2025
SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models
Qiaolin Wang
Xilin Jiang
Linyang He
Junkai Wu
Nima Mesgarani
AuLLM, LRM, VLM
168
0
0
19 Sep 2025
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation
Sizhe Shan
Qiulin Li
Yutao Cui
Miles Yang
Y. Wang
Qun Yang
Jin Zhou
Zhao Zhong
DiffMVGen
85
16
0
23 Aug 2025
FoleySpace: Vision-Aligned Binaural Spatial Audio Generation
Lei Zhao
Rujin Chen
Chi Zhang
Xiao-Lei Zhang
Xuelong Li
159
1
0
18 Aug 2025