Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1810.07217
Cited By
v1
v2 (latest)
Hierarchical Generative Modeling for Controllable Speech Synthesis
16 October 2018
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Yuxuan Wang
Yuan Cao
Ye Jia
Zhiwen Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
BDL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Hierarchical Generative Modeling for Controllable Speech Synthesis"
50 / 178 papers shown
Title
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
114
0
0
01 May 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
80
1
0
21 Mar 2025
Learning disentangled representations for instrument-based music similarity
Yuka Hashizume
Li Li
Atsushi Miyashita
Tomoki Toda
155
0
0
21 Mar 2025
Equivariant Blurring Diffusion for Hierarchical Molecular Conformer Generation
Jiwoong Park
Yang Shen
DiffM
103
1
0
26 Oct 2024
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
57
2
0
17 Oct 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
Sotirios Karapiperis
Nikolaos Ellinas
Alexandra Vioni
Junkwang Oh
Gunu Jho
Inchul Hwang
S. Raptis
153
0
0
13 Sep 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
78
1
0
20 Aug 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
Yuepeng Jiang
Tao Li
Fengyu Yang
Lei Xie
Meng Meng
Yujun Wang
73
2
0
09 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
63
3
0
04 Jun 2024
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
67
2
0
03 Jun 2024
Learning Multidimensional Disentangled Representations of Instrumental Sounds for Musical Similarity Assessment
Yuka Hashizume
Li Li
Atsushi Miyashita
Tomoki Toda
53
3
0
10 Apr 2024
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Daniel Lyth
Simon King
100
49
0
02 Feb 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
113
42
0
14 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
133
94
0
25 Dec 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong
Junmo Lee
Jeongmin Kim
Beomjeong Kim
Jihoon Park
Dohee Kong
Changheon Lee
Sangjin Kim
94
1
0
20 Nov 2023
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux
Pascal Tilli
Sarina Meyer
Ngoc Thang Vu
49
1
0
26 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
95
0
0
23 Oct 2023
Prosody Analysis of Audiobooks
Charuta Pethe
Yunting Yin
Felix D Childress
Yunting Yin
Steven Skiena
89
1
0
10 Oct 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
77
4
0
06 Oct 2023
Cross-Utterance Conditioned VAE for Speech Generation
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
66
2
0
08 Sep 2023
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals
Yiming Wu
CoGe
DRL
113
0
0
06 Sep 2023
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling
Zhichao Wang
Xinsheng Wang
Qicong Xie
Tao Li
Linfu Xie
Qiao Tian
Yuping Wang
114
4
0
03 Sep 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
81
3
0
28 Aug 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
Helen Meng
87
7
0
29 Jul 2023
The Ethical Implications of Generative Audio Models: A Systematic Literature Review
J. Barnett
86
32
0
07 Jul 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
132
306
0
23 Jun 2023
The Age of Synthetic Realities: Challenges and Opportunities
J. P. Cardenuto
Jing Yang
Rafael Padilha
Renjie Wan
Daniel Moreira
Haoliang Li
Shiqi Wang
Fernanda A. Andaló
Sébastien Marcel
Anderson de Rezende Rocha
DeLMO
115
30
0
09 Jun 2023
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
Kun Song
Yi Ren
Yinjiao Lei
Chunfeng Wang
Kun Wei
Linfu Xie
Xiang Yin
Zejun Ma
82
9
0
28 May 2023
Controllable Speaking Styles Using a Large Language Model
A. Sigurgeirsson
Simon King
55
3
0
17 May 2023
Using Deepfake Technologies for Word Emphasis Detection
Eran Kaufman
Lee-Ad Gottlieb
59
0
0
12 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
73
1
0
09 May 2023
Scientists' Perspectives on the Potential for Generative AI in their Fields
Meredith Ringel Morris
AI4CE
71
43
0
04 Apr 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
118
7
0
06 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
94
29
0
03 Mar 2023
A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation
Wen-Chin Huang
Benjamin Peloquin
Justine T. Kao
Changhan Wang
Hongyu Gong
Elizabeth Salesky
Yossi Adi
Ann Lee
Peng-Jen Chen
81
16
0
25 Jan 2023
Long-horizon video prediction using a dynamic latent hierarchy
Alexey Zakharov
Qinghai Guo
Zafeirios Fountas
77
4
0
29 Dec 2022
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
Yusuke Yasuda
Tomoki Toda
DiffM
79
8
0
16 Dec 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Xiaorui Wang
Zhongyuan Wang
BDL
71
6
0
13 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
66
7
0
29 Nov 2022
Evaluating and reducing the distance between synthetic and real speech distributions
Christoph Minixhofer
Ondˇrej Klejch
P. Bell
82
8
0
29 Nov 2022
Disentangled Representation Learning
Xin Eric Wang
Hong Chen
Siao Tang
Zihao Wu
Wenwu Zhu
DRL
174
87
0
21 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
83
6
0
07 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
60
1
0
01 Nov 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
50
0
0
28 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
130
20
0
27 Oct 2022
Controllable Accented Text-to-Speech Synthesis
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
79
6
0
22 Sep 2022
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
Saeed Ghorbani
Ylva Ferstl
Daniel Holden
N. Troje
M. Carbonneau
123
83
0
15 Sep 2022
The Role of Vocal Persona in Natural and Synthesized Speech
Camille Noufi
Lloyd May
J. Berger
56
2
0
06 Sep 2022
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks
L. Finkelstein
Heiga Zen
Norman Casagrande
Chun-an Chan
Ye Jia
...
Jonathan Shen
V. Wan
Yu Zhang
Yonghui Wu
R. Clark
55
9
0
28 Aug 2022
Pathway to Future Symbiotic Creativity
Yi-Ting Guo
Qi-fei Liu
Jie Chen
Wei Xue
Jie Fu
...
Fernando Rosas
Jeffrey Shaw
Xing Wu
Jiji Zhang
Jianliang Xu
66
0
0
18 Aug 2022
1
2
3
4
Next