arXiv: 2506.21448 (v3, latest)
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
26 June 2025
Huadai Liu, Kaicheng Luo, Jialei Wang, Wen Wang, Qian Chen, Zhou Zhao, Wei Xue
Tags: VGen, LRM
Links: arXiv (abs), PDF, HTML, Hugging Face (6 upvotes), GitHub (3487★)
Papers citing "ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing"
11 of 11 papers shown
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
Mengchen Zhang, Qi Chen, Tong Wu, Zihan Liu, Dahua Lin
Tags: VGen
02 Dec 2025
AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs
Shuhan Xia, Peipei Li, Xuannan Liu, Dongsen Zhang, Xinyu Guo, Zekun Li
Tags: AAML
26 Nov 2025
3MDiT: Unified Tri-Modal Diffusion Transformer for Text-Driven Synchronized Audio-Video Generation
Y. Li, Heyu Si, Federico Landi, Pilar Oplustil Gallegos, Ioannis Koutsoumpas, ..., Ruiju Fu, Qi Guo, Xin Jin, Shunyu Liu, Mingli Song
Tags: DiffM, VGen
26 Nov 2025
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Z. Liang, D. Zhang, Huichi Zhou, Rui Huang, Bobo Li, ..., Shengqiong Wu, X. Wang, Jiebo Luo, Lizi Liao, Hao Fei
Tags: VGen
11 Nov 2025
Foley Control: Aligning a Frozen Latent Text-to-Audio Model to Video
Ciara Rowles, Varun Jampani, Simon Donné, Shimon Vainer, Julian Parker, Zach Evans
Tags: VGen
24 Oct 2025
Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
Haomiao Chen, K. Jamison, M. Sabuncu, Amy Kuceyeski
07 Oct 2025
SoundReactor: Frame-level Online Video-to-Audio Generation
Koichi Saito, Julian Tanke, Christian Simon, Masato Ishii, Kazuki Shimada, Zachary Novack, Zhi-Wei Zhong, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji
Tags: DiffM, VGen
02 Oct 2025
StereoFoley: Object-Aware Stereo Audio Generation from Video
Tornike Karchkhadze, Kuan-Lin Chen, Mojtaba Heydari, Robert Henzel, Alessandro Toso, Mehrez Souden
Tags: DiffM, VGen, AuLLM
22 Sep 2025
SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models
Qiaolin Wang, Xilin Jiang, Linyang He, Junkai Wu, Nima Mesgarani
Tags: AuLLM, LRM, VLM
19 Sep 2025
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation
Sizhe Shan, Qiulin Li, Yutao Cui, Miles Yang, Y. Wang, Qun Yang, Jin Zhou, Zhao Zhong
Tags: DiffM, VGen
23 Aug 2025
FoleySpace: Vision-Aligned Binaural Spatial Audio Generation
Lei Zhao, Rujin Chen, Chi Zhang, Xiao-Lei Zhang, Xuelong Li
18 Aug 2025