Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 1,276 papers shown
Enhancing Speech-to-Speech Translation with Multiple TTS Targets
Jiatong Shi
Yun Tang
Ann Lee
Hirofumi Inaguma
Changhan Wang
J. Pino
Shinji Watanabe
77
9
0
10 Apr 2023
ArmanTTS single-speaker Persian dataset
Mohammd Hasan Shamgholi
Vahid Saeedi
J. Peymanfard
Leila Alhabib
Hossein Zeinali
48
2
0
07 Apr 2023
AraSpot: Arabic Spoken Command Spotting
Mahmoud Salhab
H. Harmanani
56
0
0
29 Mar 2023
Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages
Seong-Hyun Park
Myungseo Song
Bohyung Kim
Tae-Hyun Oh
40
1
0
28 Mar 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
52
9
0
24 Mar 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM MedIm
132
73
0
23 Mar 2023
Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning
Sung-Feng Huang
Chia-Ping Chen
Zhi-Sheng Chen
Yu-Pao Tsai
Hung-yi Lee
83
3
0
21 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
186
170
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
169
48
0
21 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
Hauret Julien
Joubaud Thomas
V. Zimpfer
Bavu Éric
61
7
0
17 Mar 2023
Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022
Taras Kucherenko
Pieter Wolfert
Youngwoo Yoon
Carla Viegas
Teodor Nikolov
Mihail Tsakov
G. Henter
69
24
0
15 Mar 2023
Text-to-ECG: 12-Lead Electrocardiogram Synthesis conditioned on Clinical Text Reports
Hyunseung Chung
Jiho Kim
Joon-Myoung Kwon
K. Jeon
Min Sung Lee
Edward Choi
MedIm
80
16
0
09 Mar 2023
Do Prosody Transfer Models Transfer Prosody?
A. Sigurgeirsson
Simon King
DiffM
65
8
0
07 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
118
7
0
06 Mar 2023
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
62
5
0
05 Mar 2023
A General Framework for Learning Procedural Audio Models of Environmental Sounds
Danzel Serrano
M. Cartwright
DiffM DRL
63
1
0
04 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
94
29
0
03 Mar 2023
Speaker-Aware Anti-Spoofing
Xuechen Liu
Md. Sahidullah
Kong Aik Lee
Tomi Kinnunen
81
3
0
02 Mar 2023
Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities
Shijun Wang
Jón Guðnason
Damian Borth
83
10
0
02 Mar 2023
Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding
Yingting Li
Ambuj Mehrish
Shuaijiang Zhao
Rishabh Bhardwaj
Amir Zadeh
Navonil Majumder
Rada Mihalcea
Soujanya Poria
AAML
66
18
0
02 Mar 2023
Leveraging Large Text Corpora for End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
A. Ogawa
Marc Delcroix
Ryo Masumura
49
14
0
02 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
74
8
0
01 Mar 2023
DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction
R. Anantha
Kriti Bhasin
Daniela Aguilar
Prabal Vashisht
Becci Williamson
Srinivas Chappidi
63
0
0
01 Mar 2023
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus
Ajinkya Kulkarni
Atharva Kulkarni
Sara Shatnawi
Hanan Aldarmaki
37
9
0
28 Feb 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
Ji-Hoon Kim
Hongying Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
81
9
0
28 Feb 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech
Kentaro Mitsui
Yukiya Hono
Kei Sawada
CVBM
54
7
0
28 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
101
31
0
27 Feb 2023
A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion
Brendan O'Connor
S. Dixon
43
0
0
27 Feb 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow
Yoonhyung Lee
Jinhyeok Yang
Kyomin Jung
67
6
0
27 Feb 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
86
3
0
24 Feb 2023
Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition
Leyuan Qu
C. Weber
S. Wermter
60
10
0
20 Feb 2023
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
Houjian Guo
Chaoran Liu
C. Ishi
H. Ishiguro
BDL
100
13
0
16 Feb 2023
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages
Sudhanshu Srivastava
Ishika Gupta
Anusha Prakash
Jom Kuriakose
H. Murthy
VLM
70
1
0
13 Feb 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
84
37
0
08 Feb 2023
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
Eugene Kharitonov
Damien Vincent
Zalan Borsos
Raphaël Marinier
Sertan Girgin
Olivier Pietquin
Matthew Sharifi
Marco Tagliasacchi
Neil Zeghidour
101
206
0
07 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
104
32
0
01 Feb 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
109
18
0
30 Jan 2023
On granularity of prosodic representations in expressive text-to-speech
Mikolaj Babianski
Kamil Pokora
Raahil Shah
Rafał Sienkiewicz
Daniel Korzekwa
V. Klimkov
66
6
0
26 Jan 2023
On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems
Philippe Gonzalez
T. S. Alstrøm
Tobias May
77
9
0
25 Jan 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Siddharth Gururani
Bryan Catanzaro
66
6
0
24 Jan 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali
Tomoki Hayashi
Hamdy Mubarak
Soumi Maiti
Shinji Watanabe
W. El-Hajj
Ahmed M. Ali
49
11
0
22 Jan 2023
Regeneration Learning: A Learning Paradigm for Data Generation
Xu Tan
Tao Qin
Jiang Bian
Tie-Yan Liu
Yoshua Bengio
GAN
64
15
0
21 Jan 2023
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
64
24
0
20 Jan 2023
Msanii: High Fidelity Music Synthesis on a Shoestring Budget
Kinyugo Maina
85
7
0
16 Jan 2023
Modelling low-resource accents without accent-specific TTS frontend
Georgi Tinchev
Marta Czarnowska
Kamil Deja
K. Yanagisawa
Marius Cotescu
80
4
0
11 Jan 2023
Dual Learning for Large Vocabulary On-Device ASR
Cal Peyser
Ronny Huang
Tara N. Sainath
Rohit Prabhavalkar
M. Picheny
K. Cho
SSL
56
1
0
11 Jan 2023
Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation
Abdullah Shahid
S. Latif
Junaid Qadir
64
23
0
10 Jan 2023
Introducing Model Inversion Attacks on Automatic Speaker Recognition
Karla Pizzi
Franziska Boenisch
U. Sahin
Konstantin Böttinger
117
3
0
09 Jan 2023
SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain
Heli Qi
Sashi Novitasari
Andros Tjandra
S. Sakti
Satoshi Nakamura
77
3
0
08 Jan 2023
Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation
Miku Nishihara
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
K. Tokuda
107
1
0
05 Jan 2023