Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 1,276 papers shown
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis
Suparna De
Ionut Bostan
Nishanth Sastry
122
0
0
24 Oct 2024
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS
T. Nguyen
Seymanur Akti
Ngoc-Quan Pham
A. Waibel
114
2
0
19 Oct 2024
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
57
2
0
17 Oct 2024
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis
Yu Gu
Qiushi Zhu
Guangzhi Lei
Chao Weng
Jane Polak Scowcroft
DiffM
67
0
0
17 Oct 2024
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization
Xingqi Wang
Xiaoyuan Yi
Xing Xie
Jia Jia
60
1
0
16 Oct 2024
Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks
Friedrich Wolf-Monheim
59
4
0
09 Oct 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Anton Firc
K. Malinka
P. Hanáček
DiffM
69
1
0
09 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
145
92
0
09 Oct 2024
Stage-Wise and Prior-Aware Neural Speech Phase Prediction
Fei Liu
Yang Ai
Hui-Peng Du
Ye-Xin Lu
Rui Zheng
Zhen-Hua Ling
53
0
0
07 Oct 2024
Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation
Ivan Rinaldi
Nicola Fanelli
Giovanna Castellano
G. Vessio
62
2
0
07 Oct 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
67
0
0
07 Oct 2024
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
Haozhe Chen
Run Chen
Julia Hirschberg
82
3
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
206
26
0
01 Oct 2024
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
68
1
0
25 Sep 2024
HiFi-Glot: Neural Formant Synthesis with Differentiable Resonant Filters
Lauri Juvela
Pablo Pérez Zarazaga
G. Henter
Zofia Malisz
56
0
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
129
5
0
23 Sep 2024
DiffSSD: A Diffusion-Based Dataset For Speech Forensics
Kratika Bhagtani
Amit Kumar Singh Yadav
Paolo Bestagini
Edward J. Delp
DiffM
55
2
0
19 Sep 2024
A quest through interconnected datasets: lessons from highly-cited ICASSP papers
Cynthia C. S. Liem
Doğa Taşcılar
Andrew M. Demetriou
54
0
0
19 Sep 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Sijing Chen
Yuan Feng
Laipeng He
Tianwei He
Wendi He
...
Huimin Zhang
Xiang Zhang
Guangcheng Zhao
Hongbin Zhou
Pengpeng Zou
82
8
0
18 Sep 2024
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Xin Qi
Ruibo Fu
Zhengqi Wen
Tao Wang
Chunyu Qiang
...
Xiaopeng Wang
Yuankun Xie
Yukun Liu
Xuefei Liu
Guanjun Li
DiffM
108
0
0
18 Sep 2024
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning
Siqi Sun
Korin Richmond
136
0
0
15 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
66
2
0
13 Sep 2024
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
Tianchi Liu
Ivan Kukanov
Zihan Pan
Qiongqiong Wang
Hardik B. Sailor
K. Lee
110
2
0
12 Sep 2024
Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking
Jihyun Lee
Solee Im
Wonjun Lee
Gary Geunbae Lee
75
0
0
10 Sep 2024
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach
Abdulhady Abas Abdullah
Sabat Salih Muhamad
Hadi Veisi
68
0
0
10 Sep 2024
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion
Wei Chen
Xintao Zhao
Jun Chen
Binzhu Sha
Zhiwei Lin
Zhiyong Wu
95
1
0
10 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
119
54
0
10 Sep 2024
SongCreator: Lyrics-based Universal Song Generation
Shun Lei
Yixuan Zhou
Boshi Tang
Max W. Y. Lam
Feng Liu
Hangyu Liu
Jingcheng Wu
Shiyin Kang
Zhiyong Wu
Helen Meng
101
8
0
09 Sep 2024
Exploratory Visual Analysis for Increasing Data Readiness in Artificial Intelligence Projects
Mattias Tiger
Daniel Jakobsson
Anders Ynnerman
Fredrik Heintz
Daniel Jonsson
53
0
0
05 Sep 2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo
Kun Liu
Fei-Yu Shen
Yi-Chen Wu
Xu Tang
Kun Xie
Kai-Tuo Xu
90
28
0
05 Sep 2024
A multilingual training strategy for low resource Text to Speech
Asma Amalas
Mounir Ghogho
Mohamed Chetouani
Rachid Oulad Haj Thami
76
2
0
02 Sep 2024
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Octavian Pascu
Dan Oneaţă
H. Cucu
Nicolas M. Muller
94
1
0
28 Aug 2024
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech
Haowei Lou
Helen Paik
Wen Hu
Lina Yao
VLM
94
0
0
27 Aug 2024
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description
Zeyu Jin
Jia Jia
Qixin Wang
Kehan Li
Shuoyi Zhou
Songtao Zhou
Xiaoyu Qin
Zhiyong Wu
100
14
0
24 Aug 2024
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
Kai-Wei Chang
Haibin Wu
Yu-Kai Wang
Yuan-Kuei Wu
Hua Shen
Wei-Cheng Tseng
Iu-thing Kang
Shang-Wen Li
Hung-yi Lee
93
3
0
23 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
79
3
0
20 Aug 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
76
1
0
20 Aug 2024
Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language
Manjil Karki
Pratik Shakya
Sandesh Acharya
Ravi Pandit
Dinesh Gothe
122
0
0
19 Aug 2024
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision
Zhijun Jia
Huaying Xue
Xiulian Peng
Yan Lu
152
3
0
19 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
96
50
0
16 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD, DiffM, AI4TS
110
6
0
14 Aug 2024
Content and Style Aware Audio-Driven Facial Animation
Qingju Liu
Hyeongwoo Kim
Gaurav Bharaj
DiffM
84
1
0
13 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
93
0
0
08 Aug 2024
Synchronous Multi-modal Semantic Communication System with Packet-level Coding
Yun Tian
Jingkai Ying
Zhijin Qin
Ye Jin
Xiaoming Tao
79
6
0
08 Aug 2024
Human Speech Perception in Noise: Can Large Language Models Paraphrase to Improve It?
Anupama Chingacham
Miaoran Zhang
Vera Demberg
Dietrich Klakow
71
0
0
07 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
136
0
0
06 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier
Marie Tahon
Anthony Larcher
Yannick Esteve
65
0
0
05 Aug 2024
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features
Peng Cheng
Yuwei Wang
Peng Huang
Zhongjie Ba
Xiaodong Lin
Feng Lin
Liwang Lu
Kui Ren
AAML
74
9
0
03 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
91
1
0
01 Aug 2024
Temporal Feature Matters: A Framework for Diffusion Model Quantization
Yushi Huang
Ruihao Gong
Xianglong Liu
Jing Liu
Yuhang Li
Jiwen Lu
Dacheng Tao
DiffM, MQ
119
0
0
28 Jul 2024