ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1703.10135
  4. Cited By
Tacotron: Towards End-to-End Speech Synthesis

Tacotron: Towards End-to-End Speech Synthesis

29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Z. Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
ArXivPDFHTML

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 266 papers shown
Title
PITS: Variational Pitch Inference without Fundamental Frequency for
  End-to-End Pitch-controllable TTS
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
17
3
0
24 Feb 2023
Hello Me, Meet the Real Me: Audio Deepfake Attacks on Voice Assistants
Hello Me, Meet the Real Me: Audio Deepfake Attacks on Voice Assistants
Domna Bilika
Nikoletta Michopoulou
E. Alepis
Constantinos Patsakis
18
8
0
20 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with
  Natural Language Style Prompt
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
H. Meng
DiffM
VLM
31
85
0
31 Jan 2023
A Comprehensive Review of Data-Driven Co-Speech Gesture Generation
A Comprehensive Review of Data-Driven Co-Speech Gesture Generation
Simbarashe Nyatsanga
Taras Kucherenko
Chaitanya Ahuja
G. Henter
Michael Neff
SLR
33
90
0
13 Jan 2023
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis
  Dataset
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang
Bin Liu
Yifan Hu
Rui Liu
F. Bao
Guanglai Gao
15
1
0
11 Dec 2022
GreenEyes: An Air Quality Evaluating Model based on WaveNet
GreenEyes: An Air Quality Evaluating Model based on WaveNet
Kan Huang
Kai Zhang
Ming-de Liu
17
2
0
08 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming Yang
Qin Huang
67
25
0
08 Dec 2022
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and
  Transfer Learning
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning
Ankur Debnath
Shridevi S Patil
Gangotri Nadiger
R. Ganesan
19
20
0
07 Dec 2022
Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice
  Source Features
Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice Source Features
T. U. K. Reddy
Sahukari Chaitanya Varun
Kota Pranav Kumar Sankala Sreekanth
K. Murty
18
0
0
05 Dec 2022
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice
  Synthesis
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Qicong Xie
Jixun Yao
Linfu Xie
Dan Su
DiffM
13
8
0
03 Dec 2022
Deep Fake Detection, Deterrence and Response: Challenges and
  Opportunities
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
32
2
0
26 Nov 2022
Efficient Incremental Text-to-Speech on GPUs
Efficient Incremental Text-to-Speech on GPUs
Muyang Du
Chuan Liu
Jiaxing Qi
Junjie Lai
16
1
0
25 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural
  MIDI-to-Audio Synthesis Systems?
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?
Xuan Shi
Erica Cooper
Xin Wang
Junichi Yamagishi
Shrikanth Narayanan
25
1
0
25 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
36
18
0
17 Nov 2022
Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone
  Disambiguation
Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation
Chunyu Qiang
Peng Yang
Hao Che
Jinba Xiao
Xiaorui Wang
Zhongyuan Wang
13
3
0
17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
18
46
0
17 Nov 2022
Delivering Speaking Style in Low-resource Voice Conversion with
  Multi-factor Constraints
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints
Zhichao Wang
Xinsheng Wang
Linfu Xie
Yuan-Jui Chen
Qiao Tian
Yuping Wang
25
5
0
16 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
32
12
0
13 Nov 2022
Online Phase Reconstruction via DNN-based Phase Differences Estimation
Online Phase Reconstruction via DNN-based Phase Differences Estimation
Yoshiki Masuyama
Kohei Yatabe
Kento Nagatomo
Yasuhiro Oikawa
3DV
8
7
0
12 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual
  Multi-Speaker Text-to-Speech
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Xiaoran Fan
Chao Pang
Tian Yuan
Richard He Bai
Renjie Zheng
...
Junkun Chen
Zeyu Chen
Liang Huang
Yu Sun
Hua-Hong Wu
35
0
0
07 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational
  Autoencoder
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
21
6
0
07 Nov 2022
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems
  via Vowel Space
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Jihwan Lee
Jaesung Bae
Seongkyu Mun
Heejin Choi
Joun Yeop Lee
Hoon-Young Cho
Chanwoo Kim
24
2
0
06 Nov 2022
Distinguishable Speaker Anonymization based on Formant and Fundamental
  Frequency Scaling
Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling
Jixun Yao
Qing Wang
Yi Lei
Pengcheng Guo
Linfu Xie
Namin Wang
Jie Liu
25
13
0
06 Nov 2022
Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered
  Speech
Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech
Xin Zhang
Iván Vallés-Pérez
A. Stolcke
Chengzhu Yu
J. Droppo
Olabanji Shonibare
Roberto Barra-Chicote
Venkatesh Ravichandran
25
6
0
04 Nov 2022
CCATMos: Convolutional Context-aware Transformer Network for
  Non-intrusive Speech Quality Assessment
CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment
Yuchen Liu
Li-Chia Yang
Alex Pawlicki
Marko Stamenovic
22
6
0
04 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New
  Speakers
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh
Subhankar Ghosh
Boris Ginsburg
41
18
0
01 Nov 2022
Combining Automatic Speaker Verification and Prosody Analysis for
  Synthetic Speech Detection
Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection
L. Attorresi
Davide Salvi
Clara Borrelli
Paolo Bestagini
Stefano Tubaro
18
21
0
31 Oct 2022
Towards zero-shot Text-based voice editing using acoustic context
  conditioning, utterance embeddings, and reference encoders
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
13
0
0
28 Oct 2022
FCTalker: Fine and Coarse Grained Context Modeling for Expressive
  Conversational Speech Synthesis
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis
Yifan Hu
Rui Liu
Guanglai Gao
Haizhou Li
72
7
0
27 Oct 2022
RedPen: Region- and Reason-Annotated Dataset of Unnatural Speech
RedPen: Region- and Reason-Annotated Dataset of Unnatural Speech
Kyumin Park
Keon Lee
Daeyoung Kim
Dongyeop Kang
21
0
0
26 Oct 2022
Visual onoma-to-wave: environmental sound synthesis from visual
  onomatopoeias and sound-source images
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images
Hien Ohnaka
Shinnosuke Takamichi
Keisuke Imoto
Yuki Okamoto
Kazuki Fujii
Hiroshi Saruwatari
DiffM
19
8
0
17 Oct 2022
Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score
  Fusion
Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion
Yuxiang Zhang
Jingze Lu
Xingming Wang
Zhuo Li
Runqiu Xiao
Wenchao Wang
Ming Li
Pengyuan Zhang
43
5
0
13 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
24
14
0
12 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data
  for Zero-Shot Multi-Speaker Text-to-Speech
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech
Byoung Jin Choi
Myeonghun Jeong
Minchan Kim
Sung Hwan Mun
N. Kim
DiffM
24
5
0
12 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep
  Learning Era
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos
Björn W. Schuller
Gokcce .Iymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
15
53
0
06 Oct 2022
The Sound of Silence: Efficiency of First Digit Features in Synthetic
  Audio Detection
The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection
Daniele Mari
Federica Latora
Simone Milani
11
11
0
06 Oct 2022
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and
  Accompanied Baseline
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline
Yifan Hu
Pengkai Yin
Rui Liu
F. Bao
Guanglai Gao
11
5
0
22 Sep 2022
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in
  Paragraph-based TTS
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS
Liumeng Xue
Frank Soong
Shaofei Zhang
Linfu Xie
19
23
0
14 Sep 2022
Visualising Model Training via Vowel Space for Text-To-Speech Systems
Visualising Model Training via Vowel Space for Text-To-Speech Systems
Binu Abeysinghe
Jesin James
C. Watson
Felix Marattukalam
18
2
0
21 Aug 2022
Fully Automated End-to-End Fake Audio Detection
Fully Automated End-to-End Fake Audio Detection
Chenglong Wang
Jiangyan Yi
J. Tao
Haiyang Sun
Xun Chen
Zhengkun Tian
Haoxin Ma
Cunhang Fan
Ruibo Fu
24
28
0
20 Aug 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Xiang Li
Changhe Song
X. Wei
Zhiyong Wu
Jia Jia
H. Meng
14
4
0
10 Aug 2022
AdaCat: Adaptive Categorical Discretization for Autoregressive Models
AdaCat: Adaptive Categorical Discretization for Autoregressive Models
Qiyang Li
Ajay Jain
Pieter Abbeel
OffRL
45
4
0
03 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech
  Synthesis
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
19
4
0
03 Aug 2022
End-to-End Spoken Language Understanding: Performance analyses of a
  voice command task in a low resource setting
End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting
Thierry Desot
François Portet
Michel Vacher
25
12
0
17 Jul 2022
Data Augmentation for Low-Resource Quechua ASR Improvement
Data Augmentation for Low-Resource Quechua ASR Improvement
Rodolfo Zevallos
Núria Bel
Guillermo Cámbara
Mireia Farrús
Jordi Luque
VLM
SyDa
14
6
0
14 Jul 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality
  Text-to-Speech
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
DiffM
44
193
0
13 Jul 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and
  Any-to-any Voice Conversion
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion
Yinjiao Lei
Shan Yang
Jian Cong
Linfu Xie
Dan Su
DiffM
48
12
0
05 Jul 2022
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS
Kyle Kastner
Aaron Courville
27
0
0
30 Jun 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis
  using ranking support vector machine with variational autoencoder
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder
Eunwoo Song
Ryuichi Yamamoto
Ohsung Kwon
Chan Song
Min-Jae Hwang
Suhyeon Oh
Hyun-Wook Yoon
Jin-Seob Kim
Jae-Min Kim
35
7
0
30 Jun 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for
  Speech Synthesis based on Disentanglement between Prosody and Timbre
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
Guangyan Zhang
Ying Qin
W. Zhang
Jialun Wu
Mei Li
Yu Gai
Feijun Jiang
Tan Lee
48
26
0
29 Jun 2022
Previous
123456
Next