JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment

15 May 2020
D. Lim
Won Jang
Gyeonghwan O
Heayoung Park
Bongwan Kim
Jaesam Yoon

Papers citing "JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment"

21 / 21 papers shown
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Spoken Language Technology Workshop (SLT), 2024
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
305
23
0
13 Sep 2024
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024
Yubing Cao
Yongming Li
Liejun Wang
Yinfeng Yu
177
2
0
13 Aug 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
351
9
0
31 May 2024
Intelli-Z: Toward Intelligible Zero-Shot TTS
Sunghee Jung
Won Jang
Jaesam Yoon
Bongwan Kim
269
1
0
25 Jan 2024
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Interspeech, 2023
Won Jang
D. Lim
Heayoung Park
253
1
0
18 May 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
515
76
0
21 Mar 2023
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
324
23
0
27 Oct 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
Interspeech, 2022
D. Lim
Sunghee Jung
Eesung Kim
433
71
0
31 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Bac Nguyen
Fabien Cardinaux
Stefan Uhlich
184
4
0
21 Mar 2022
PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Yunchao He
Jian Luan
Yujun Wang
357
3
0
09 Oct 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren
Jinglin Liu
Zhou Zhao
436
93
0
30 Sep 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
470
446
0
29 Jun 2021
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
Interspeech, 2021
Won Jang
D. Lim
Jaesam Yoon
Bongwan Kim
Juntae Kim
363
190
0
15 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
236
31
0
20 Apr 2021
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Renqian Luo
Xu Tan
Rui Wang
Tao Qin
Jinzhu Li
Sheng Zhao
Enhong Chen
Tie-Yan Liu
164
72
0
08 Feb 2021
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet
Shilu Lin
Fenglong Xie
Li Meng
Xinhui Li
Li Lu
358
0
0
30 Jan 2021
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
Won Jang
D. Lim
Jaesam Yoon
293
41
0
19 Nov 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
288
109
0
22 Oct 2020
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
212
17
0
19 Oct 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
Jonathan Shen
Ye Jia
Mike Chrzanowski
Yu Zhang
Isaac Elias
Heiga Zen
Yonghui Wu
370
114
0
08 Oct 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
735
1,710
0
08 Jun 2020