2410.14101
Cited By
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech
18 October 2024
Shuwei He
Rui Liu
Hong Li
Papers citing
"Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech"
7 / 7 papers shown
Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech
Xinlei Niu
Jianbo Ma
Dylan Harper-Harris
Xiangyu Zhang
Charles Patrick Martin
Jing Zhang
DiffM
VGen
146
0
0
19 Sep 2025
MEAN-RIR: Multi-Modal Environment-Aware Network for Robust Room Impulse Response Estimation
Jiajian Chen
Jiakang Chen
Hang Chen
Qing Wang
Yu Gao
Jun Du
136
2
0
05 Sep 2025
VS-Singer: Vision-Guided Stereo Singing Voice Synthesis with Consistency Schrödinger Bridge
Zijing Zhao
Kai Wang
Hao-Ming Huang
Ying Hu
Liang He
J. Yang
205
0
0
19 Jun 2025
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Wataru Nakata
Yuma Koizumi
Shigeki Karita
Robin Scheibler
Haruko Ishikawa
Adriana Guevara-Rukoz
Heiga Zen
M. Bacchiani
481
2
0
08 May 2025
HELPNet: Hierarchical Perturbations Consistency and Entropy-guided Ensemble for Scribble Supervised Medical Image Segmentation
Xiao Zhang
Shaoxuan Wu
Peilin Zhang
Zhuo Jin
Xiaosong Xiong
Qirong Bu
Jingkun Chen
Jun Feng
310
9
0
25 Dec 2024
Intra- and Inter-modal Context Interaction Modeling for Conversational Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Zhenqi Jia
Rui Liu
215
6
0
25 Dec 2024
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
AAAI Conference on Artificial Intelligence (AAAI), 2024
Rui Liu
Shuwei He
Yifan Hu
Hong Li
VLM
513
8
0
16 Dec 2024