ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1810.07217
  4. Cited By
Hierarchical Generative Modeling for Controllable Speech Synthesis
v1v2 (latest)

Hierarchical Generative Modeling for Controllable Speech Synthesis

16 October 2018
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Yuxuan Wang
Yuan Cao
Ye Jia
Zhiwen Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
    BDL
ArXiv (abs)PDFHTML

Papers citing "Hierarchical Generative Modeling for Controllable Speech Synthesis"

50 / 178 papers shown
Title
Voice Cloning: Comprehensive Survey
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
114
0
0
01 May 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
80
1
0
21 Mar 2025
Learning disentangled representations for instrument-based music similarity
Learning disentangled representations for instrument-based music similarity
Yuka Hashizume
Li Li
Atsushi Miyashita
Tomoki Toda
155
0
0
21 Mar 2025
Equivariant Blurring Diffusion for Hierarchical Molecular Conformer
  Generation
Equivariant Blurring Diffusion for Hierarchical Molecular Conformer Generation
Jiwoong Park
Yang Shen
DiffM
103
1
0
26 Oct 2024
DART: Disentanglement of Accent and Speaker Representation in
  Multispeaker Text-to-Speech
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
57
2
0
17 Oct 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for
  Prosody Modeling
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
Sotirios Karapiperis
Nikolaos Ellinas
Alexandra Vioni
Junkwang Oh
Gunu Jho
Inchul Hwang
S. Raptis
153
0
0
13 Sep 2024
Disentangling segmental and prosodic factors to non-native speech
  comprehensibility
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
78
1
0
20 Aug 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody
  Modeling
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
Yuepeng Jiang
Tao Li
Fengyu Yang
Lei Xie
Meng Meng
Yujun Wang
73
2
0
09 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for
  Noise-Robust Expressive Speech-to-Speech Translation
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
63
3
0
04 Jun 2024
Accent Conversion in Text-To-Speech Using Multi-Level VAE and
  Adversarial Training
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
67
2
0
03 Jun 2024
Learning Multidimensional Disentangled Representations of Instrumental
  Sounds for Musical Similarity Assessment
Learning Multidimensional Disentangled Representations of Instrumental Sounds for Musical Similarity Assessment
Yuka Hashizume
Li Li
Atsushi Miyashita
Tomoki Toda
53
3
0
10 Apr 2024
Natural language guidance of high-fidelity text-to-speech with synthetic
  annotations
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Daniel Lyth
Simon King
100
49
0
02 Feb 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided
  Sequence Reordering
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
113
42
0
14 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
133
94
0
25 Dec 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech
  Synthesis
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong
Junmo Lee
Jeongmin Kim
Beomjeong Kim
Jihoon Park
Dohee Kong
Changheon Lee
Sangjin Kim
94
1
0
20 Nov 2023
Controllable Generation of Artificial Speaker Embeddings through
  Discovery of Principal Directions
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
49
1
0
26 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal
  point processes
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
95
0
0
23 Oct 2023
Prosody Analysis of Audiobooks
Prosody Analysis of Audiobooks
Charuta Pethe
Yunting Yin
Felix D Childress
Yunting Yin
Steven Skiena
89
1
0
10 Oct 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling
  for Zero-Shot Voice Cloning
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
77
4
0
06 Oct 2023
Cross-Utterance Conditioned VAE for Speech Generation
Cross-Utterance Conditioned VAE for Speech Generation
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
66
2
0
08 Sep 2023
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in
  Music Audio Signals
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals
Yiming Wu
CoGeDRL
113
0
0
06 Sep 2023
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice
  Conversion by Multi-scale Style Modeling
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling
Zhichao Wang
Xinsheng Wang
Qicong Xie
Tao Li
Linfu Xie
Qiao Tian
Yuping Wang
114
4
0
03 Sep 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised
  representations
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
81
3
0
28 Aug 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context
  Information for Expressive Speech Synthesis
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
Helen Meng
87
7
0
29 Jul 2023
The Ethical Implications of Generative Audio Models: A Systematic
  Literature Review
The Ethical Implications of Generative Audio Models: A Systematic Literature Review
J. Barnett
86
32
0
07 Jul 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
132
306
0
23 Jun 2023
The Age of Synthetic Realities: Challenges and Opportunities
The Age of Synthetic Realities: Challenges and Opportunities
J. P. Cardenuto
Jing Yang
Rafael Padilha
Renjie Wan
Daniel Moreira
Haoliang Li
Shiqi Wang
Fernanda A. Andaló
Sébastien Marcel
Anderson de Rezende Rocha
DeLMO
115
30
0
09 Jun 2023
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech
  Translation
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
Kun Song
Yi Ren
Yinjiao Lei
Chunfeng Wang
Kun Wei
Linfu Xie
Xiang Yin
Zejun Ma
82
9
0
28 May 2023
Controllable Speaking Styles Using a Large Language Model
Controllable Speaking Styles Using a Large Language Model
A. Sigurgeirsson
Simon King
55
3
0
17 May 2023
Using Deepfake Technologies for Word Emphasis Detection
Using Deepfake Technologies for Word Emphasis Detection
Eran Kaufman
Lee-Ad Gottlieb
59
0
0
12 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by
  Unsupervised Learning from Voice Recordings
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
73
1
0
09 May 2023
Scientists' Perspectives on the Potential for Generative AI in their
  Fields
Scientists' Perspectives on the Potential for Generative AI in their Fields
Meredith Ringel Morris
AI4CE
71
43
0
04 Apr 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative
  Language Model
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
118
7
0
06 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised
  Speech and Text Representations
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
94
29
0
03 Mar 2023
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for
  Expressive Speech-to-Speech Translation
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation
Wen-Chin Huang
Benjamin Peloquin
Justine T. Kao
Changhan Wang
Hongyu Gong
Elizabeth Salesky
Yossi Adi
Ann Lee
Peng-Jen Chen
81
16
0
25 Jan 2023
Long-horizon video prediction using a dynamic latent hierarchy
Long-horizon video prediction using a dynamic latent hierarchy
Alexey Zakharov
Qinghai Guo
Zafeirios Fountas
77
4
0
29 Dec 2022
Text-to-speech synthesis based on latent variable conversion using
  diffusion probabilistic model and variational autoencoder
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
Yusuke Yasuda
Tomoki Toda
DiffM
79
8
0
16 Dec 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and
  Speaker-wise Normalization in Speech Synthesis
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Xiaorui Wang
Zhongyuan Wang
BDL
71
6
0
13 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level
  prosodic representations
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
66
7
0
29 Nov 2022
Evaluating and reducing the distance between synthetic and real speech
  distributions
Evaluating and reducing the distance between synthetic and real speech distributions
Christoph Minixhofer
Ondˇrej Klejch
P. Bell
82
8
0
29 Nov 2022
Disentangled Representation Learning
Disentangled Representation Learning
Xin Eric Wang
Hong Chen
Siao Tang
Zihao Wu
Wenwu Zhu
DRL
174
87
0
21 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational
  Autoencoder
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
83
6
0
07 Nov 2022
Learning utterance-level representations through token-level acoustic
  latents prediction for Expressive Speech Synthesis
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
60
1
0
01 Nov 2022
Towards zero-shot Text-based voice editing using acoustic context
  conditioning, utterance embeddings, and reference encoders
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
50
0
0
28 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised
  Learning for Text-To-Speech
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
130
20
0
27 Oct 2022
Controllable Accented Text-to-Speech Synthesis
Controllable Accented Text-to-Speech Synthesis
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
79
6
0
22 Sep 2022
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
Saeed Ghorbani
Ylva Ferstl
Daniel Holden
N. Troje
M. Carbonneau
123
83
0
15 Sep 2022
The Role of Vocal Persona in Natural and Synthesized Speech
The Role of Vocal Persona in Natural and Synthesized Speech
Camille Noufi
Lloyd May
J. Berger
56
2
0
06 Sep 2022
Training Text-To-Speech Systems From Synthetic Data: A Practical
  Approach For Accent Transfer Tasks
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks
L. Finkelstein
Heiga Zen
Norman Casagrande
Chun-an Chan
Ye Jia
...
Jonathan Shen
V. Wan
Yu Zhang
Yonghui Wu
R. Clark
55
9
0
28 Aug 2022
Pathway to Future Symbiotic Creativity
Pathway to Future Symbiotic Creativity
Yi-Ting Guo
Qi-fei Liu
Jie Chen
Wei Xue
Jie Fu
...
Fernando Rosas
Jeffrey Shaw
Xing Wu
Jiji Zhang
Jianliang Xu
66
0
0
18 Aug 2022
1234
Next