2005.07799
JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment
15 May 2020
D. Lim
Won Jang
Gyeonghwan O
Heayoung Park
Bongwan Kim
Jaesam Yoon
Papers citing "JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment"
21 / 21 papers shown
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Spoken Language Technology Workshop (SLT), 2024
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
305
23
0
13 Sep 2024
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024
Yubing Cao
Yongming Li
Liejun Wang
Yinfeng Yu
177
2
0
13 Aug 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
351
9
0
31 May 2024
Intelli-Z: Toward Intelligible Zero-Shot TTS
Sunghee Jung
Won Jang
Jaesam Yoon
Bongwan Kim
269
1
0
25 Jan 2024
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Interspeech, 2023
Won Jang
D. Lim
Heayoung Park
253
1
0
18 May 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
515
76
0
21 Mar 2023
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
324
23
0
27 Oct 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
Interspeech, 2022
D. Lim
Sunghee Jung
Eesung Kim
433
71
0
31 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Bac Nguyen
Fabien Cardinaux
Stefan Uhlich
184
4
0
21 Mar 2022
PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Yunchao He
Jian Luan
Yujun Wang
357
3
0
09 Oct 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren
Jinglin Liu
Zhou Zhao
436
93
0
30 Sep 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
470
446
0
29 Jun 2021
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
Interspeech, 2021
Won Jang
D. Lim
Jaesam Yoon
Bongwan Kim
Juntae Kim
363
190
0
15 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
236
31
0
20 Apr 2021
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Renqian Luo
Xu Tan
Rui Wang
Tao Qin
Jinzhu Li
Sheng Zhao
Enhong Chen
Tie-Yan Liu
164
72
0
08 Feb 2021
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet
Shilu Lin
Fenglong Xie
Li Meng
Xinhui Li
Li Lu
358
0
0
30 Jan 2021
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
Won Jang
D. Lim
Jaesam Yoon
293
41
0
19 Nov 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
288
109
0
22 Oct 2020
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
212
17
0
19 Oct 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
Jonathan Shen
Ye Jia
Mike Chrzanowski
Yu Zhang
Isaac Elias
Heiga Zen
Yonghui Wu
370
114
0
08 Oct 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
735
1,710
0
08 Jun 2020