Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

3 June 2019

Papers citing "Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS"

33 / 33 papers shown

Title
Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism Yukiya Hono Kei Hashimoto Yoshihiko Nankaku K. Tokuda 62 2 0 28 Dec 2022
OverFlow: Putting flows on top of neural transducers for better TTS Shivam Mehta Ambika Kirkland Harm Lameris Jonas Beskow Éva Székely G. Henter AI4TS 107 13 0 13 Nov 2022
Structured State Space Decoder for Speech Recognition and Synthesis Koichi Miyazaki Masato Murata Tomoki Koriyama 104 13 0 31 Oct 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS Haohan Guo Fenglong Xie Frank Soong Xixin Wu Helen M. Meng 78 12 0 22 Sep 2022
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention Artem Gorodetskii Ivan Ozhiganov 115 2 0 25 Jan 2022
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Michael Hassid Michelle Tadmor Ramanovich Brendan Shillingford Miaosen Wang Ye Jia Tal Remez DiffM 72 18 0 19 Nov 2021
A study on the efficacy of model pre-training in developing neural text-to-speech system Guangyan Zhang Yichong Leng Daxin Tan Ying Qin Kaitao Song Xu Tan Sheng Zhao Tan Lee 58 2 0 08 Oct 2021
On-device neural speech synthesis Sivanand Achanta Albert Antony L. Golipour Jiangchuan Li T. Raitio ... Francesco Rossi Jennifer Shi Jaimin Upadhyay David Winarsky Hepeng Zhang 108 17 0 17 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS) Shivam Mehta Éva Székely Jonas Beskow G. Henter 102 18 0 30 Aug 2021
Combining speakers of multiple languages to improve quality of neural voices Javier Latorre Charlotte Bailleul Tuuli H. Morrill Alistair Conkie Y. Stylianou 64 8 0 17 Aug 2021
Enhancing audio quality for expressive Neural Text-to-Speech Abdelhamid Ezzerg Adam Gabryś Bartosz Putrycz Daniel Korzekwa Daniel Sáez-Trigueros David McHardy Kamil Pokora Jakub Lachowicz Jaime Lorenzo-Trueba V. Klimkov 130 6 0 13 Aug 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation Ye Jia Michelle Tadmor Ramanovich Tal Remez Roi Pomerantz 105 73 0 19 Jul 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech Ammar Abbas Bajibabu Bollepalli Alexis Moinet Arnaud Joly Penny Karanasou Peter Makarov Simon Slangens S. Karlapati Thomas Drugman 67 0 0 29 Jun 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 133 359 0 29 Jun 2021
Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech Raahil Shah Kamil Pokora Abdelhamid Ezzerg V. Klimkov Goeric Huybrechts Bartosz Putrycz Daniel Korzekwa Thomas Merritt 64 26 0 24 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 94 25 0 20 Apr 2021
Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation Fengpeng Yue Yan Deng Lei He Tom Ko 70 8 0 08 Apr 2021
Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition Hirofumi Inaguma Tatsuya Kawahara 125 14 0 28 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention Peng Liu Yuewen Cao Songxiang Liu Na Hu Guangzhi Li Chao Weng Jane Polak Scowcroft 95 22 0 12 Feb 2021
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis Xi Wang Huaiping Ming Lei He Frank Soong 43 5 0 17 Nov 2020
FeatherTTS: Robust and Efficient attention based Neural TTS Qiao Tian Zewang Zhang Chao-Jung Liu Heng Lu Linghui Chen Bin Wei P. He Shan Liu 69 4 0 02 Nov 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS Isaac Elias Heiga Zen Jonathan Shen Yu Zhang Ye Jia Ron J. Weiss Yonghui Wu DRL 76 103 0 22 Oct 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling Jonathan Shen Ye Jia Mike Chrzanowski Yu Zhang Isaac Elias Heiga Zen Yonghui Wu 106 112 0 08 Oct 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features T. Raitio Ramya Rasipuram D. Castellani 78 66 0 14 Sep 2020
Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS Rui Liu Berrak Sisman F. Bao Guanglai Gao Haizhou Li 41 18 0 11 Aug 2020
Expressive TTS Training with Frame and Style Reconstruction Loss Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 112 73 0 04 Aug 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer Mingjian Chen Xu Tan Yi Ren Jin Xu Hao Sun Sheng Zhao Tao Qin Tie-Yan Liu 65 110 0 08 Jun 2020
End-to-End Adversarial Text-to-Speech Jeff Donahue Sander Dieleman Mikolaj Binkowski Erich Elsen Karen Simonyan 85 187 0 05 Jun 2020
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis Yusuke Yasuda Xin Wang Junichi Yamagishi AI4TS 76 31 0 20 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding Seungwoo Choi Seungju Han Dongyoung Kim S. Ha 91 67 0 18 May 2020
Teacher-Student Training for Robust Tacotron-based TTS Rui Liu Berrak Sisman Jingdong Li F. Bao Guanglai Gao Haizhou Li 109 38 0 07 Nov 2019
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment Yusuke Yasuda Xin Wang Junichi Yamagishi 25 2 0 28 Oct 2019
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis Eric Battenberg RJ Skerry-Ryan Soroosh Mariooryad Daisy Stanton David Kao Matt Shannon Tom Bagby 106 114 0 23 Oct 2019