AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
4 March 2020
Zhen Zeng
Jianzong Wang
Ning Cheng
Tian Xia
Jing Xiao
    VLM

Papers citing "AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment"

32 papers
AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs
Shuhan Xia
Peipei Li
Xuannan Liu
Dongsen Zhang
Xinyu Guo
Zekun Li
AAML
297
1
0
26 Nov 2025
Eliminating stability hallucinations in llm-based tts models via attention guidance
ShiMing Wang
Zhihao Du
Yang Xiang
Tianyu Zhao
Han Zhao
Xinyuan Wei
Xiangang Li
HanJie Guo
Zhenhua Ling
218
0
0
24 Sep 2025
Marco-Voice Technical Report
Fengping Tian
Chenyang Lyu
Xuanfan Ni
Haoqin Sun
Qingjuan Li
...
Haijun Li
L. Wang
Zhao Xu
Weihua Luo
Kaifu Zhang
263
3
0
04 Aug 2025
MORE-3S:Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces
Tianyu Zheng
Ge Zhang
Xingwei Qu
Ming Kuang
Stephen W. Huang
Zhaofeng He
OffRL
285
2
0
20 Feb 2024
Cross-Utterance Conditioned VAE for Speech Generation
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
251
4
0
08 Sep 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model
IEEE International Joint Conference on Neural Networks (IJCNN), 2023
Jianzong Wang
Xulong Zhang
Haobin Tang
Aolan Sun
Ning Cheng
Jing Xiao
340
1
0
23 Apr 2023
Semi-Supervised Learning Based on Reference Model for Low-resource TTS
International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
AI4TS
302
6
0
25 Oct 2022
Expressive, Variable, and Controllable Duration Modelling in TTS
Interspeech, 2022
Ammar Abbas
Thomas Merritt
Alexis Moinet
S. Karlapati
Ewa Muszyńska
Simon Slangen
Elia Gatti
Thomas Drugman
207
12
0
28 Jun 2022
TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS
IEEE International Joint Conference on Neural Networks (IJCNN), 2022
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
226
14
0
24 May 2022
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yongqian Li
Cheng Yu
Guangzhi Sun
Hua Jiang
Fanglei Sun
Weiqin Zu
Ying Wen
Yang Yang
Jun Wang
200
7
0
09 May 2022
Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss
Spoken Language Technology Workshop (SLT), 2022
Efthymios Georgiou
Kosmas Kritsis
Georgios Paraskevopoulos
Athanasios Katsamanis
Vassilis Katsouros
Alexandros Potamianos
380
4
0
28 Apr 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Bac Nguyen
Fabien Cardinaux
Stefan Uhlich
182
4
0
21 Mar 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Songxiang Liu
Jane Polak Scowcroft
Dong Yu
DiffM
378
79
0
28 Jan 2022
PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Yunchao He
Jian Luan
Yujun Wang
353
3
0
09 Oct 2021
Neural HMMs are all you need (for high-quality attention-free TTS)
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
455
22
0
30 Aug 2021
Federated Learning with Dynamic Transformer for Text to Speech
Interspeech, 2021
Zhenhou Hong
Jianzong Wang
Xiaoyang Qu
Jie Liu
Chendong Zhao
Jing Xiao
FedML
148
16
0
09 Jul 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech
Speech Synthesis Workshop (SSW), 2021
Ammar Abbas
Bajibabu Bollepalli
Alexis Moinet
Arnaud Joly
Penny Karanasou
Peter Makarov
Simon Slangen
S. Karlapati
Thomas Drugman
200
0
0
29 Jun 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
453
446
0
29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
365
4
0
21 Jun 2021
Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache
René Peinl
173
0
0
11 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
International Conference on Machine Learning (ICML), 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
384
1,242
0
11 Jun 2021
SpeechNet: A Universal Modularized Model for Speech Processing Tasks
Yi-Chen Chen
Po-Han Chi
Shu-Wen Yang
Kai-Wei Chang
Jheng-hao Lin
Sung-Feng Huang
Da-Rong Liu
Chi-Liang Liu
Cheng-Kuang Lee
Hung-yi Lee
MoE
358
19
0
07 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLMALM
234
31
0
20 Apr 2021
Fast DCTTS: Efficient Deep Convolutional Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
M. Kang
Jihyun Lee
Simin Kim
Injung Kim
196
6
0
01 Apr 2021
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution
Spoken Language Technology Workshop (SLT), 2020
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
283
8
0
03 Dec 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
286
109
0
22 Oct 2020
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
210
17
0
19 Oct 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
Jonathan Shen
Ye Jia
Mike Chrzanowski
Yu Zhang
Isaac Elias
Heiga Zen
Yonghui Wu
368
114
0
08 Oct 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Adrian Łańcucki
361
404
0
11 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
728
1,710
0
08 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
385
602
0
22 May 2020
JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment
D. Lim
Won Jang
Gyeonghwan O
Heayoung Park
Bongwan Kim
Jaesam Yoon
248
39
0
15 May 2020