Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis

8 June 2019
Eric Battenberg
Soroosh Mariooryad
Daisy Stanton
RJ Skerry-Ryan
Matt Shannon
David Kao
Tom Bagby
    BDL
arXiv: 1906.03402

Papers citing "Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis"

26 / 26 papers shown
MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
IEEE International Joint Conference on Neural Networks (IJCNN), 2024
Nicolas Müller
Piotr Kawa
Wei Herng Choong
Edresson Casanova
Eren Golge
Thorsten Muller
P. Syga
Philip Sperl
Konstantin Böttinger
449
124
0
17 Jan 2024
Controllable Speaking Styles Using a Large Language Model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
A. Sigurgeirsson
Simon King
241
9
0
17 May 2023
Do Prosody Transfer Models Transfer Prosody?
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
A. Sigurgeirsson
Simon King
DiffM
212
14
0
07 Mar 2023
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Speech Communication (Speech Commun.), 2022
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
194
10
0
29 Nov 2022
Into-TTS : Intonation Template Based Prosody Control System
Jihwan Lee
Joun Yeop Lee
Heejin Choi
Seongkyu Mun
Sangjun Park
Jae-Sung Bae
Chanwoo Kim
316
5
0
04 Apr 2022
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention
Artem Gorodetskii
Ivan Ozhiganov
337
4
0
25 Jan 2022
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Alexandra Vioni
Myrsini Christidou
Nikolaos Ellinas
G. Vamvoukakis
Panos Kakoulidis
Taehoon Kim
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
208
12
0
19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
International Conference on Speech and Computer (SPECOM), 2021
Myrsini Christidou
Alexandra Vioni
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Panos Kakoulidis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
204
4
0
19 Nov 2021
Speaker Generation
Daisy Stanton
Matt Shannon
Soroosh Mariooryad
RJ Skerry-Ryan
Eric Battenberg
Tom Bagby
David Kao
293
39
0
07 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
249
21
0
07 Nov 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
113
2
0
06 Oct 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis
Interspeech, 2021
Julian Zaïdi
Hugo Seuté
Benjamin van Niekerk
M. Carbonneau
156
30
0
04 Aug 2021
On Prosody Modeling for ASR+TTS based Voice Conversion
Automatic Speech Recognition & Understanding (ASRU), 2021
Wen-Chin Huang
Tomoki Hayashi
Xinjian Li
Shinji Watanabe
Tomoki Toda
288
11
0
20 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio
International Conference on Machine Learning (ICML), 2021
J. Weston
R. Lenain
U. Meepegama
E. Fristed
SSL
281
18
0
17 Jul 2021
Fast DCTTS: Efficient Deep Convolutional Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
M. Kang
Jihyun Lee
Simin Kim
Injung Kim
205
6
0
01 Apr 2021
GAN Vocoder: Multi-Resolution Discriminator Is All You Need
Interspeech, 2021
J. You
Dalhyun Kim
Gyuhyeon Nam
Geumbyeol Hwang
Gyeongsu Chae
288
33
0
09 Mar 2021
FeatherTTS: Robust and Efficient attention based Neural TTS
Speech Synthesis Workshop (SSW), 2020
Qiao Tian
Zewang Zhang
Chao-Jung Liu
Heng Lu
Linghui Chen
Bin Wei
P. He
Shan Liu
182
4
0
02 Nov 2020
Multi-speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network
Interspeech, 2020
Ravi Shankar
Hsi-Wei Hsieh
N. Charon
A. Venkataraman
253
11
0
25 Jul 2020
Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator
Interspeech, 2020
Ravi Shankar
Jacob Sager
A. Venkataraman
GAN
300
19
0
25 Jul 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
Seungwoo Choi
Seungju Han
Dongyoung Kim
S. Ha
402
66
0
18 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
310
67
0
14 May 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuanbin Cao
Heiga Zen
Yonghui Wu
223
130
0
06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
270
96
0
06 Feb 2020
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Junjie Pan
Xiang Yin
Zhiling Zhang
Shichao Liu
Yang Zhang
Zejun Ma
Yuxuan Wang
177
28
0
11 Nov 2019
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Eric Battenberg
RJ Skerry-Ryan
Soroosh Mariooryad
Daisy Stanton
David Kao
Matt Shannon
Tom Bagby
297
122
0
23 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis
International Conference on Learning Representations (ICLR), 2019
Raza Habib
Soroosh Mariooryad
Matt Shannon
Eric Battenberg
RJ Skerry-Ryan
Daisy Stanton
David Kao
Tom Bagby
BDL
223
48
0
03 Oct 2019