DurIAN: Duration Informed Attention Network For Multimodal Synthesis

4 September 2019
Chengzhu Yu
Heng Lu
Na Hu
Meng Yu
Chao Weng
Kun Xu
Peng Liu
Deyi Tuo
Shiyin Kang
Guangzhi Lei
Jane Polak Scowcroft
Dong Yu
    CVBM

Papers citing "DurIAN: Duration Informed Attention Network For Multimodal Synthesis"

44 / 44 papers shown
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino
Adam Gabryś
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
199
3
0
12 May 2025
KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario
Huali Zhou
Yuke Lin
Dongxi Liu
Ming Li
57
0
0
20 Mar 2024
Prosody Analysis of Audiobooks
Charuta Pethe
Bach Pham
Felix D Childress
Yunting Yin
Steven Skiena
89
1
0
10 Oct 2023
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information
Jing Chen
Chang Song
Deyi Tuo
Xixin Wu
Shiyin Kang
Zhiyong Wu
Helen Meng
68
1
0
31 Aug 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
169
48
0
21 Mar 2023
Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody
Peter Makarov
Ammar Abbas
Mateusz Lajszczak
Arnaud Joly
S. Karlapati
Alexis Moinet
Thomas Drugman
Penny Karanasou
89
16
0
29 Jun 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
Takuhiro Kaneko
Kou Tanaka
Hirokazu Kameoka
Shogo Seki
89
62
0
04 Mar 2022
Deep Performer: Score-to-Audio Music Performance Synthesis
Hao-Wen Dong
Cong Zhou
Taylor Berg-Kirkpatrick
Julian McAuley
83
17
0
12 Feb 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
Yu Wang
Xinsheng Wang
Pengcheng Zhu
Jie Wu
Hanzhao Li
Heyang Xue
Yongmao Zhang
Lei Xie
Mengxiao Bi
109
103
0
19 Jan 2022
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus
Rongjie Huang
Feiyang Chen
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
94
104
0
20 Dec 2021
Transformer-S2A: Robust and Efficient Speech-to-Animation
Liyang Chen
Zhiyong Wu
Jun Ling
Runnan Li
Xu Tan
Sheng Zhao
96
19
0
18 Nov 2021
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning
Songxiang Liu
Jane Polak Scowcroft
Dong Yu
56
10
0
14 Nov 2021
RAVE: A variational autoencoder for fast and high-quality neural audio synthesis
Antoine Caillon
P. Esling
DRL
68
112
0
09 Nov 2021
VoiceFixer: Toward General Speech Restoration with Neural Vocoder
Haohe Liu
Qiuqiang Kong
Qiao Tian
Yan Zhao
DeLiang Wang
Chuanzeng Huang
Yuxuan Wang
87
58
0
28 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
67
47
0
14 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis
Songxiang Liu
Shan Yang
Jane Polak Scowcroft
Dong Yu
AI4TS
62
10
0
08 Sep 2021
Integrated Speech and Gesture Synthesis
Siyang Wang
Simon Alexanderson
Joakim Gustafson
Jonas Beskow
G. Henter
Éva Székely
88
19
0
25 Aug 2021
Parallel and High-Fidelity Text-to-Lip Generation
Jinglin Liu
Zhiying Zhu
Yi Ren
Wencan Huang
Baoxing Huai
N. Yuan
Zhou Zhao
55
10
0
14 Jul 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion
Daxin Tan
Liqun Deng
Y. Yeung
Xin Jiang
Xiao Chen
Tan Lee
94
41
0
04 Jul 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech
Ammar Abbas
Bajibabu Bollepalli
Alexis Moinet
Arnaud Joly
Penny Karanasou
Peter Makarov
Simon Slangens
S. Karlapati
Thomas Drugman
67
0
0
29 Jun 2021
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition
Zhengxi Liu
Y. Qian
DRL
49
10
0
25 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
Najim Dehak
William Chan
DiffM
99
88
0
17 Jun 2021
DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement
Shubo Lv
Yanxin Hu
Shimin Zhang
Lei Xie
61
94
0
16 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLMALM
94
25
0
20 Apr 2021
Attention Forcing for Machine Translation
Qingyun Dou
Yiting Lu
Potsawee Manakul
Xixin Wu
Mark Gales
60
7
0
02 Apr 2021
Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN
Cong Wang
Yu Chen
Bin Wang
Yi Shi
146
1
0
26 Mar 2021
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
Won Jang
D. Lim
Jaesam Yoon
60
34
0
19 Nov 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
79
74
0
17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yinjiao Lei
Shan Yang
Lei Xie
88
56
0
17 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
116
21
0
08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
99
101
0
06 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech
S. Karlapati
Ammar Abbas
Zack Hodari
Alexis Moinet
Arnaud Joly
Panagiota Karanasou
Thomas Drugman
66
19
0
04 Nov 2020
FeatherTTS: Robust and Efficient attention based Neural TTS
Qiao Tian
Zewang Zhang
Chao-Jung Liu
Heng Lu
Linghui Chen
Bin Wei
P. He
Shan Liu
69
4
0
02 Nov 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
76
103
0
22 Oct 2020
High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies
Linchao Bao
Xiangkai Lin
Yajing Chen
Haoxian Zhang
Sheng Wang
...
Haozhi Huang
Xinwei Jiang
Jue Wang
Dong Yu
Zhengyou Zhang
3DH
93
62
0
12 Oct 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
Jonathan Shen
Ye Jia
Mike Chrzanowski
Yu Zhang
Isaac Elias
Heiga Zen
Yonghui Wu
106
112
0
08 Oct 2020
Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music
Haohe Liu
Lei Xie
Jian Wu
Geng Yang
83
31
0
12 Aug 2020
Peking Opera Synthesis via Duration Informed Attention Network
Yusong Wu
Shengchen Li
Chengzhu Yu
Heng Lu
Chao Weng
Liqiang Zhang
Dong Yu
48
11
0
07 Aug 2020
DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System
Liqiang Zhang
Chengzhu Yu
Heng Lu
Chao Weng
Chunlei Zhang
Yusong Wu
Xiang Xie
Zijin Li
Dong Yu
60
34
0
07 Aug 2020
JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment
D. Lim
Won Jang
Gyeonghwan O
Heayoung Park
Bongwan Kim
Jaesam Yoon
71
37
0
15 May 2020
FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction
Qiao Tian
Zewang Zhang
Heng Lu
Linghui Chen
Shan Liu
69
22
0
12 May 2020
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
Geng Yang
Shan Yang
Kai-Chun Liu
Peng Fang
Wei Chen
Lei Xie
153
200
0
11 May 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint
Zexin Cai
Chuxiong Zhang
Ming Li
73
42
0
10 May 2020
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders
Yu Gu
Xiang Yin
Yonghui Rao
Yuan Wan
Benlai Tang
Yang Zhang
Jitong Chen
Yuxuan Wang
Zejun Ma
91
70
0
23 Apr 2020