Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

23 October 2019

Papers citing "Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis"

26 / 26 papers shown

Title
Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks Chaoyi Wang Junjie Zheng Zihao Chen Shiyu Xia Chaofan Ding Xiaohao Zhang Xi Tao Xiaoming He Xinhan Di AuLLM 120 0 0 30 Apr 2025
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing Gaoxiang Cong Jiadong Pan Liang-Sheng Li Yuankai Qi Yuxin Peng A. Hengel Jian Yang Qingming Huang 92 6 0 12 Dec 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset Anton Firc K. Malinka P. Hanáček DiffM 36 0 0 09 Oct 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing Philip Anastassiou Zhenyu Tang Kainan Peng Dongya Jia Jiaxin Li Ming Tu Yuping Wang Yuxuan Wang Mingbo Ma 42 4 0 10 Apr 2024
Learning to Dub Movies via Hierarchical Prosody Models Gaoxiang Cong Liang Li Yuankai Qi Zhengjun Zha Qi Wu Wen-yu Wang Bin Jiang Ming Yang Qin Huang 72 25 0 08 Dec 2022
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models Perry Lam Huayun Zhang Nancy F. Chen Berrak Sisman 11 2 0 22 Sep 2022
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS Liumeng Xue Frank Soong Shaofei Zhang Linfu Xie 19 23 0 14 Sep 2022
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS Kyle Kastner Aaron Courville 32 0 0 30 Jun 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 20 73 0 17 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios Qicong Xie Tao Li Xinsheng Wang Zhichao Wang Lei Xie Guoqiao Yu Guanglu Wan 19 11 0 23 Dec 2021
V2C: Visual Voice Cloning Qi Chen Yuanqing Li Yuankai Qi Jiaqiu Zhou Mingkui Tan Qi Wu VGen 33 23 0 25 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 11 11 0 19 Nov 2021
Integrated Speech and Gesture Synthesis Siyang Wang Simon Alexanderson Joakim Gustafson Jonas Beskow G. Henter Éva Székely 34 19 0 25 Aug 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation Ye Jia Michelle Tadmor Ramanovich Tal Remez Roi Pomerantz 26 67 0 19 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
Controllable Context-aware Conversational Speech Synthesis Jian Cong Shan Yang Na Hu Guangzhi Li Lei Xie Dan Su 18 30 0 21 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 21 24 0 20 Apr 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention Peng Liu Yuewen Cao Songxiang Liu Na Hu Guangzhi Li Chao Weng Dan Su 34 22 0 12 Feb 2021
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis Yinjiao Lei Shan Yang Lei Xie 25 55 0 17 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis Ron J. Weiss RJ Skerry-Ryan Eric Battenberg Soroosh Mariooryad Diederik P. Kingma 19 97 0 06 Nov 2020
PPG-based singing voice conversion with adversarial representation learning Zhonghao Li Benlai Tang Xiang Yin Yuan Wan Linjia Xu Chen Shen Zejun Ma 11 37 0 28 Oct 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines Yao Shi Hui Bu Xin Xu Shaojing Zhang Ming Li 16 218 0 22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS Isaac Elias Heiga Zen Jonathan Shen Yu Zhang Ye Jia Ron J. Weiss Yonghui Wu DRL 17 102 0 22 Oct 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis Fengyu Yang Shan Yang Qinghua Wu Yujun Wang Lei Xie 11 5 0 03 Aug 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search Jaehyeon Kim Sungwon Kim Jungil Kong Sungroh Yoon 31 473 0 22 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation A. Laptev Roman Korostik A. Svischev A. Andrusenko Ivan Medennikov S. Rybin 16 61 0 14 May 2020