ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.


Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
arXiv:1712.05884 (abs, PDF, HTML)

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 1,276 papers shown
Title
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
Jiangzong Wang
Pengcheng Li
Xulong Zhang
Ning Cheng
Jing Xiao
81
0
0
14 Nov 2023
CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion
Anders Vestergaard Norskov
Alexander Neergaard Zahid
Morten Morup
65
3
0
13 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu
Hermann Kumbong
Eric N. D. Nguyen
Christopher Ré
VLM
100
30
0
10 Nov 2023
Synthetic Speaking Children -- Why We Need Them and How to Make Them
Muhammad Ali Farooq
Dan Bigioi
Rishabh Jain
Wang Yao
Mariam Yiwere
Peter Corcoran
86
0
0
08 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
53
0
0
07 Nov 2023
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Dongjune Lee
N. Kim
AI4TS
95
12
0
06 Nov 2023
Are cascade dialogue state tracking models speaking out of turn in spoken dialogues?
Lucas Druart
Léo Jacqmin
Benoit Favre
L. Rojas-Barahona
Valentin Vielzeuf
67
0
0
03 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
69
0
0
27 Oct 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors
Marek Kubis
Pawel Skórzewski
Marcin Sowański
Tomasz Ziętkiewicz
56
6
0
25 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
104
36
0
25 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
95
0
0
23 Oct 2023
Acoustic BPE for Speech Generation with Discrete Tokens
Feiyu Shen
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
95
13
0
23 Oct 2023
An overview of text-to-speech systems and media applications
Mohammad Reza Hasanabadi
28
3
0
22 Oct 2023
Energy-Based Models For Speech Synthesis
Wanli Sun
Zehai Tu
Anton Ragni
DiffM
67
1
0
19 Oct 2023
On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition
Nick Rossenbach
Benedikt Hilmes
Ralf Schluter
59
3
0
12 Oct 2023
Sound-skwatter (Did You Mean: Sound-squatter?) AI-powered Generator for Phishing Prevention
R. Valentim
Idilio Drago
Marco Mellia
Federico Cerutti
16
1
0
10 Oct 2023
Prosody Analysis of Audiobooks
Charuta Pethe
Yunting Yin
Felix D Childress
Yunting Yin
Steven Skiena
89
1
0
10 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
119
16
0
08 Oct 2023
Unified speech and gesture synthesis using flow matching
Shivam Mehta
Ruibo Tu
Simon Alexanderson
Jonas Beskow
Éva Székely
G. Henter
100
3
0
08 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Ze Liu
53
1
0
08 Oct 2023
Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation
Aman Khullar
Daniel K. Nkemelu
Cuong V. Nguyen
Michael L. Best
80
5
0
04 Oct 2023
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
Roi Benita
Michael Elad
Joseph Keshet
DiffM
115
8
0
02 Oct 2023
Towards human-like spoken dialogue generation between AI agents from written dialogue
Kentaro Mitsui
Yukiya Hono
Kei Sawada
88
14
0
02 Oct 2023
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS
Xin Wang
Taein Kwon
Wei-Ning Hsu
Yossi Adi
Tu Nguyen
D. Bohus
Emmanuel Dupoux
Neel Joshi
Abdelrahman Mohamed
42
4
0
29 Sep 2023
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey
Yuchen Liu
Apu Kapadia
Donald Williamson
AAML
76
0
0
26 Sep 2023
Deepfake audio as a data augmentation technique for training automatic speech to text transcription models
Alexandre R. Ferreira
Cláudio E. C. Campelo
34
1
0
22 Sep 2023
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis
Yu Gu
Yianrao Bian
Guangzhi Lei
Chao Weng
Jane Polak Scowcroft
DiffM
55
2
0
22 Sep 2023
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers
Xintong Wang
Chang Zeng
Jun Chen
Chunhui Wang
71
6
0
22 Sep 2023
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Shunwei Lei
Yixuan Zhou
Liyang Chen
Dan Luo
Zhiyong Wu
...
Shiyin Kang
Tao Jiang
Yahui Zhou
Yuxing Han
Helen M. Meng
VLM
90
2
0
21 Sep 2023
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis
X. Wei
Jia Jia
Xiang Li
Zhiyong Wu
Ziyi Wang
69
1
0
21 Sep 2023
The Impact of Silence on Speech Anti-Spoofing
Yuxiang Zhang
Zhuo Li
Jingze Lu
Hua Hua
Wenchao Wang
Pengyuan Zhang
80
21
0
21 Sep 2023
Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech
Rui Liu
Bin Liu
Haizhou Li
53
3
0
21 Sep 2023
SpeechAlign: a Framework for Speech Translation Alignment Evaluation
Belen Alastruey
Aleix Sant
Gerard I. Gállego
David Dale
Marta R. Costa-jussà
AuLLM
56
3
0
20 Sep 2023
Speak While You Think: Streaming Speech Synthesis During Text Generation
Avihu Dekel
Slava Shechtman
Raul Fernandez
David Haws
Zvi Kons
R. Hoory
64
9
0
20 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
101
4
0
18 Sep 2023
Speech Synthesis By Unrolling Diffusion Process using Neural Network Layers
Peter Ochieng
DiffM
56
0
0
18 Sep 2023
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech
Dariusz Piotrowski
Renard Korzeniowski
Alessio Falai
Sebastian Cygert
Kamil Pokora
Georgi Tinchev
Ziyao Zhang
K. Yanagisawa
72
1
0
15 Sep 2023
Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs
Md Awsafur Rahman
Bishmoy Paul
Najibul Haque Sarker
Zaber Ibn Abdul Hakim
S. Fattah
Mohammad Saquib
59
3
0
15 Sep 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions
Reo Shimizu
Ryuichi Yamamoto
Masaya Kawamura
Yuma Shirahata
Hironori Doi
Tatsuya Komatsu
Kentaro Tachibana
DiffM
95
25
0
15 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
76
4
0
15 Sep 2023
Direct Text to Speech Translation System using Acoustic Units
Victoria Mingote
Pablo Gimeno
Luis Vicente
Sameer Khurana
Antoine Laurent
J. Duret
55
4
0
14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
125
69
0
14 Sep 2023
MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems
Hanqing Guo
Xun Chen
Junfeng Guo
Li Xiao
Qiben Yan
84
13
0
13 Sep 2023
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
Zhichao Wu
Qiulin Li
Sixing Liu
Qun Yang
70
3
0
13 Sep 2023
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
Chu Yuan Zhang
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Xinrui Yan
87
8
0
13 Sep 2023
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram
Zhifeng Kong
Ming-Yu Liu
Ambrish Dantrey
Bryan Catanzaro
51
7
0
12 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
66
2
0
08 Sep 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Takashi Shibuya
Yuhta Takida
Yuki Mitsufuji
71
11
0
06 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Zhihang Xu
Shaofei Zhang
Xi Wang
Jiajun Zhang
Wenning Wei
Lei He
Sheng Zhao
81
2
0
06 Sep 2023
Voice Morphing: Two Identities in One Voice
Sushant Pani
Anurag Chowdhury
Morgan Sandler
Arun Ross
79
1
0
05 Sep 2023