1710.08969
Cited By
Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
24 October 2017
Hideyuki Tachibana
Katsuya Uenoyama
Shunsuke Aihara
Papers citing
"Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention"
42 / 42 papers shown
Title
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
52
11
0
25 Jun 2024
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi
J. Duret
Salah Zaiem
Luca Della Libera
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
36
9
0
15 Jun 2024
A Statistical Analysis of Wasserstein Autoencoders for Intrinsically Low-dimensional Data
Saptarshi Chakraborty
Peter L. Bartlett
44
1
0
24 Feb 2024
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
Ruiqi Li
Rongjie Huang
Lichao Zhang
Jinglin Liu
Zhou Zhao
25
4
0
08 May 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
32
17
0
30 Jan 2023
Deliberation Networks and How to Train Them
Qingyun Dou
Mark J. F. Gales
19
0
0
06 Nov 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
19
4
0
03 Aug 2022
Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss
Riccardo Franceschini
Enrico Fini
Cigdem Beyan
Alessandro Conti
F. Arrigoni
Elisa Ricci
SSL
OffRL
34
16
0
23 Jul 2022
Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis
Tae-Woo Kim
Minguk Kang
Gyeong-Hoon Lee
AAML
14
6
0
23 Jun 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
38
211
0
09 May 2022
Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch
Hanbin Bae
Young-Sun Joo
19
2
0
12 Apr 2022
Karaoker: Alignment-free singing voice synthesis with speech training data
Panos Kakoulidis
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
June Sig Sung
Gunu Jho
Pirros Tsiakoulis
Aimilios Chalamandaris
8
3
0
08 Apr 2022
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
47
60
0
15 Oct 2021
Improving Time Series Classification Algorithms Using Octave-Convolutional Layers
Samuel Harford
Fazle Karim
H. Darabi
AI4TS
22
1
0
28 Sep 2021
Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification
Bidisha Sharma
Maulik C. Madhavi
Xuehao Zhou
Haizhou Li
15
2
0
28 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
19
18
0
30 Aug 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
23
3
0
21 Jun 2021
Speaker disentanglement in video-to-speech conversion
Dan Oneaţă
Adriana Stan
H. Cucu
16
9
0
20 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
21
24
0
20 Apr 2021
A study of latent monotonic attention variants
Albert Zeyer
Ralf Schluter
Hermann Ney
18
5
0
30 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
34
22
0
12 Feb 2021
Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks
B. Yousaf
Muhammad Usama
Waqas Sultani
Arif Mahmood
Junaid Qadir
17
8
0
03 Jan 2021
Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition
Jean-Benoit Delbrouck
Noé Tits
Stéphane Dupont
22
20
0
05 Oct 2020
Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems
Ravichander Vipperla
Sangjun Park
Kihyun Choo
Samin S. Ishtiaq
Kyoungbo Min
S. Bhattacharya
Abhinav Mehrotra
Alberto Gil C. P. Ramos
Nicholas D. Lane
16
26
0
11 Aug 2020
SpeedySpeech: Efficient Neural Speech Synthesis
Jan Vainer
Ondrej Dusek
11
42
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
27
316
0
09 Aug 2020
DeepSinger: Singing Voice Synthesis with Data Mined From the Web
Yi Ren
Xu Tan
Tao Qin
Jian Luan
Zhou Zhao
Tie-Yan Liu
28
73
0
09 Jul 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
19
109
0
08 Jun 2020
Many-to-Many Voice Transformer Network
Hirokazu Kameoka
Wen-Chin Huang
Kou Tanaka
Takuhiro Kaneko
Nobukatsu Hojo
T. Toda
ViT
22
30
0
18 May 2020
Towards Automatic Face-to-Face Translation
Prajwal K R
Rudrabha Mukhopadhyay
Jerin Philip
Abhishek Jha
Vinay P. Namboodiri
C. V. Jawahar
CVBM
31
172
0
01 Mar 2020
Deep Long Audio Inpainting
Ya-Liang Chang
Kuan-Ying Lee
Po-Yu Wu
Hung-yi Lee
Winston H. Hsu
30
33
0
15 Nov 2019
Teacher-Student Training for Robust Tacotron-based TTS
Rui Liu
Berrak Sisman
Jingdong Li
F. Bao
Guanglai Gao
Haizhou Li
16
38
0
07 Nov 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
T. Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
16
201
0
24 Oct 2019
Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer
Merlijn Blaauw
J. Bonada
19
55
0
22 Oct 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
H. Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
23
716
0
13 Sep 2019
Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
17
8
0
30 Aug 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach
Noé Tits
12
10
0
05 Jul 2019
FPETS : Fully Parallel End-to-End Text-to-Speech System
Dabiao Ma
Zhiba Su
Wenxuan Wang
Yuhao Lu
19
6
0
12 Dec 2018
Sequence-to-Sequence Acoustic Modeling for Voice Conversion
Jing-Xuan Zhang
Zhenhua Ling
Li-Juan Liu
Yuan Jiang
Lirong Dai
11
129
0
16 Oct 2018
Convolutional Neural Networks for Sentence Classification
Yoon Kim
AILaw
VLM
255
13,364
0
25 Aug 2014