Parallel Tacotron: Non-Autoregressive and Controllable TTS

22 October 2020
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
    DRL

Papers citing "Parallel Tacotron: Non-Autoregressive and Controllable TTS"

50 / 64 papers shown
KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
Kangxiang Xia
Xinfa Zhu
Lei Xie
WenJie Tian
W. Li
VLM
485
8
0
22 Dec 2024
Zero-shot Cross-lingual Voice Transfer for TTS
Fadi Biadsy
Youzheng Chen
Isaac Elias
Kyle Kastner
Gary Wang
Andrew Rosenberg
Bhuvana Ramabhadran
235
2
0
20 Sep 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
L. Wang
Jianwu Dang
Jianhua Tao
AI4TS
436
8
0
11 Aug 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Yakun Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Guanrou Yang
Xie Chen
AuLLM
153
6
0
22 Jun 2024
ASTRA: Aligning Speech and Text Representations for Asr without Sampling
Neeraj Gaur
Rohan Agrawal
Gary Wang
Parisa Haghani
Andrew Rosenberg
Bhuvana Ramabhadran
391
2
0
10 Jun 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
565
325
0
05 Mar 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
Takaaki Saeki
Gary Wang
Nobuyuki Morioka
Isaac Elias
Kyle Kastner
...
Andrew Rosenberg
Bhuvana Ramabhadran
Heiga Zen
Francoise Beaufays
Hadar Shemtov
383
20
0
29 Feb 2024
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
Kenichi Fujita
Atsushi Ando
Yusuke Ijima
102
4
0
11 Feb 2024
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
361
48
0
02 Nov 2023
Prosody Analysis of Audiobooks
International Computer Science Conference (ICSC), 2023
Charuta Pethe
Yunting Yin
Felix D Childress
Steven Skiena
368
2
0
10 Oct 2023
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chunyu Qiang
Hao Li
Yixin Tian
Yi Zhao
Ying Zhang
Longbiao Wang
Jianwu Dang
DiffM
310
7
0
27 Sep 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
371
11
0
29 Aug 2023
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Interspeech (Interspeech), 2023
Yochai Blau
Rohan Agrawal
Lior Madmony
Gary Wang
Andrew Rosenberg
Zhehuai Chen
Zorik Gekhman
Genady Beryozkin
Parisa Haghani
Bhuvana Ramabhadran
165
3
0
14 Aug 2023
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech
Interspeech (Interspeech), 2023
Guangyan Zhang
Thomas Merritt
M. Ribeiro
Biel Tura Vecino
K. Yanagisawa
...
Ammar Abbas
Piotr Bilinski
Roberto Barra-Chicote
Daniel Korzekwa
Jaime Lorenzo-Trueba
DiffM
256
3
0
31 Jul 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
Jianwu Dang
DiffM
281
18
0
28 Jul 2023
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech
Interspeech (Interspeech), 2023
Yahuan Cong
Haoyu Zhang
Hao-Ping Lin
Shichao Liu
Chunfeng Wang
Yi Ren
Xiang Yin
Zejun Ma
151
1
0
27 Jun 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Interspeech (Interspeech), 2023
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
274
161
0
30 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhe Ye
Rongjie Huang
Yi Ren
Ziyue Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
CLIP
183
29
0
18 May 2023
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chunyu Qiang
Peng Yang
Hao Che
Ying Zhang
Xiaorui Wang
Zhong-ming Wang
253
12
0
14 Mar 2023
An End-to-End Neural Network for Image-to-Audio Transformation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Liu Chen
Michael Deisher
Munir Georges
193
5
0
10 Mar 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
AAAI Conference on Artificial Intelligence (AAAI), 2023
Minsu Kim
Chae Won Kim
Y. Ro
CVBM
DiffM
204
4
0
27 Feb 2023
On granularity of prosodic representations in expressive text-to-speech
Spoken Language Technology Workshop (SLT), 2023
Mikolaj Babianski
Kamil Pokora
Raahil Shah
Rafał Sienkiewicz
Daniel Korzekwa
V. Klimkov
188
10
0
26 Jan 2023
Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
K. Tokuda
244
2
0
28 Dec 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Chunyu Qiang
Peng Yang
Hao Che
Xiaorui Wang
Zhongyuan Wang
BDL
231
9
0
13 Dec 2022
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features
Junhui Zhang
Junjie Pan
Xiang Yin
Zejun Ma
170
1
0
12 Dec 2022
Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022
Chunyu Qiang
Peng Yang
Hao Che
Jinba Xiao
Xiaorui Wang
Zhongyuan Wang
162
6
0
17 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
262
1
0
01 Nov 2022
The Importance of Accurate Alignments in End-to-End Speech Synthesis
Automatic Speech Recognition & Understanding (ASRU), 2022
Anusha Prakash
H. Murthy
147
0
0
31 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yuma Shirahata
Ryuichi Yamamoto
Eunwoo Song
Ryo Terashima
Jae-Min Kim
Kentaro Tachibana
305
19
0
28 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
327
25
0
27 Oct 2022
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
Spoken Language Technology Workshop (SLT), 2022
Zhehuai Chen
Ankur Bapna
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Pedro J. Moreno
Nanxin Chen
290
17
0
18 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Spoken Language Technology Workshop (SLT), 2022
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
255
36
0
03 Oct 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
213
12
0
13 Jul 2022
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate
Interspeech (Interspeech), 2022
Nabarun Goswami
Tatsuya Harada
208
5
0
13 Jul 2022
DeepGraviLens: a Multi-Modal Architecture for Classifying Gravitational Lensing Data
Nicolò Oreste Pinciroli Vago
Piero Fraternali
404
5
0
02 May 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Interspeech (Interspeech), 2022
Jaesung Bae
Jinhyeok Yang
Taejun Bak
Young-Sun Joo
DiffM
349
6
0
08 Apr 2022
MAESTRO: Matched Speech Text Representations through Modality Matching
Interspeech (Interspeech), 2022
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Pedro J. Moreno
Ankur Bapna
Heiga Zen
301
120
0
07 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
Interspeech (Interspeech), 2022
Hyungchan Yoon
Seyun Um
Changwhan Kim
Hong-Goo Kang
225
0
0
05 Apr 2022
Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis
Fan Wang
Po-Chun Hsu
Da-Rong Liu
Hung-yi Lee
335
0
0
01 Apr 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Bac Nguyen
Fabien Cardinaux
Stefan Uhlich
184
4
0
21 Mar 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS
Interspeech (Interspeech), 2022
Haohan Guo
Hui Lu
Xixin Wu
Helen Meng
928
10
0
02 Mar 2022
A Review on Methods and Applications in Multimodal Deep Learning
Summaira Jabeen
Xi Li
Muhammad Shoib Amin
Abdul Jabbar
VLM
HAI
317
175
0
18 Feb 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Songxiang Liu
Jane Polak Scowcroft
Dong Yu
DiffM
390
79
0
28 Jan 2022
Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yi Liu
Zhiyuan Guo
Chao-Hong Tan
Ya-Jun Hu
Yuan Jiang
Zhenhua Ling
220
12
0
26 Jan 2022
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Computer Vision and Pattern Recognition (CVPR), 2021
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
252
21
0
19 Nov 2021
VRAIN-UPV MLLP's system for the Blizzard Challenge 2021
A. P. D. Martos
Albert Sanchis
Alfons Juan-Císcar
299
6
0
29 Oct 2021
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
Yanqing Liu
Rui Shao
G. Wang
Kuan Chen
Bohan Li
Pong C. Yuen
Jinzhu Li
Lei He
Sheng Zhao
306
61
0
25 Oct 2021
PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Yunchao He
Jian Luan
Yujun Wang
362
3
0
09 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Pengfei Wu
Junjie Pan
Chenchang Xu
Junhui Zhang
Lin Wu
Xiang Yin
Zejun Ma
197
19
0
08 Oct 2021
A study on the efficacy of model pre-training in developing neural text-to-speech system
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Guangyan Zhang
Yichong Leng
Daxin Tan
Ying Qin
Kaitao Song
Xu Tan
Sheng Zhao
Tan Lee
161
2
0
08 Oct 2021