2106.06103
Cited By
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
11 June 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
Papers citing
"Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"
50 / 491 papers shown
Title
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
25
30
0
10 Sep 2024
SongCreator: Lyrics-based Universal Song Generation
Shun Lei
Yixuan Zhou
Boshi Tang
Max W. Y. Lam
Feng Liu
Hangyu Liu
Jingcheng Wu
Shiyin Kang
Zhiyong Wu
Helen Meng
44
4
0
09 Sep 2024
USTC-KXDIGIT System Description for ASVspoof5 Challenge
Y. Chen
Haochen Wu
Nan Jiang
Xiang Xia
Qing Gu
...
Sian Fang
Yan Song
Wu Guo
Lin Liu
Minqiang Xu
36
1
0
03 Sep 2024
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka
Li-Wei Chen
Hung-Shin Lee
Chen-Chi Chang
VLM
27
0
0
03 Sep 2024
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Justin Lovelace
Soham Ray
Kwangyoun Kim
Kilian Q. Weinberger
Felix Wu
34
2
0
01 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Zhizheng Wu
34
38
0
01 Sep 2024
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
Yan Rong
Li Liu
19
3
0
01 Sep 2024
User-Driven Voice Generation and Editing through Latent Space Navigation
Yusheng Tian
Junbin Liu
Tan Lee
DiffM
36
2
0
30 Aug 2024
WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding
Mohan Li
Cong-Thanh Do
Simon Keizer
Youmna Farag
Svetlana Stoyanchev
R. Doddipatla
35
2
0
29 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
AuLLM
32
1
0
29 Aug 2024
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Octavian Pascu
Dan Oneaţă
H. Cucu
Nicolas M. Muller
40
1
0
28 Aug 2024
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection
Xuanru Zhou
Anshul Kashyap
Steve Li
Ayati Sharma
Brittany Morin
...
Z. Ezzes
Zachary Miller
M. G. Tempini
Jiachen Lian
Gopala Krishna Anumanchipalli
24
6
0
27 Aug 2024
Multi-faceted Sensory Substitution for Curb Alerting: A Pilot Investigation in Persons with Blindness and Low Vision
Ligao Ruan
Giles Hamilton-Fletcher
Mahya Beheshti
Todd E. Hudson
Maurizio Porfiri
JR Rizzo
38
0
0
26 Aug 2024
Anonymization of Voices in Spaces for Civic Dialogue: Measuring Impact on Empathy, Trust, and Feeling Heard
Wonjune Kang
Margaret Hughes
Deb Roy
31
1
0
26 Aug 2024
LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation
Shihao Chen
Yu Gu
Jianwei Cui
Jie Zhang
Rilin Chen
Lirong Dai
34
2
0
22 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
36
1
0
20 Aug 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
32
1
0
20 Aug 2024
EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
Xin Qi
Ruibo Fu
Zhengqi Wen
Jianhua Tao
Shuchen Shi
...
Yuankun Xie
Yukun Liu
Guanjun Li
Xuefei Liu
Yongwei Li
27
1
0
20 Aug 2024
Hear Your Face: Face-based voice conversion with F0 estimation
Jaejun Lee
Yoori Oh
Injune Hwang
Kyogu Lee
CVBM
23
1
0
19 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
45
38
0
16 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD
DiffM
AI4TS
43
5
0
14 Aug 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
Jiangyan Yi
Chu Yuan Zhang
Jianhua Tao
Chenglong Wang
Xinrui Yan
Yong Ren
Hao Gu
Junzuo Zhou
50
1
0
09 Aug 2024
MaskAnyone Toolkit: Offering Strategies for Minimizing Privacy Risks and Maximizing Utility in Audio-Visual Data Archiving
B. Owoyele
Martin Schilling
Rohan Sawahn
Niklas Kaemer
Pavel Zherebenkov
Bhuvanesh Verma
Wim Pouw
Gerard de Melo
27
0
0
06 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
33
0
0
06 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier
Marie Tahon
Anthony Larcher
Yannick Esteve
40
0
0
05 Aug 2024
Are Bigger Encoders Always Better in Vision Large Models?
Bozhou Li
Hao Liang
Zimo Meng
Wentao Zhang
VLM
38
3
0
01 Aug 2024
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Masato Mimura
Takatomo Kano
A. Ogawa
Marc Delcroix
19
2
0
01 Aug 2024
Generative Expressive Conversational Speech Synthesis
Rui Liu
Yifan Hu
Yi Ren
Xiang Yin
Haizhou Li
56
5
0
31 Jul 2024
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
Junyan Wu
Wei Lu
Xiangyang Luo
Rui Yang
Qian Wang
Xiaochun Cao
34
3
0
23 Jul 2024
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
35
4
0
22 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
34
4
0
21 Jul 2024
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies
Srija Anand
Praveena Varadhan
Ashwin Sankar
Giri Raju
Mitesh M. Khapra
40
1
0
18 Jul 2024
Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems
Daniel Platnick
Bishoy Abdelnour
Eamon Earl
Rahul Kumar
Zahra Rezaei
Thomas Tsangaris
Faraj Lagum
26
0
0
18 Jul 2024
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Ruibo Fu
Xin Qi
Zhengqi Wen
Jianhua Tao
Tao Wang
...
Xiaopeng Wang
Shuchen Shi
Yukun Liu
Xuefei Liu
Shuai Zhang
49
0
0
07 Jul 2024
Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
Cong-Thanh Do
Shuhei Imai
R. Doddipatla
Thomas Hain
20
2
0
04 Jul 2024
VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features
Tomoki Koriyama
39
0
0
03 Jul 2024
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
Yinlin Guo
Yening Lv
Jinqiao Dou
Yan Zhang
Yuehai Wang
18
0
0
30 Jun 2024
When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration
Philipp Allgeuer
Hassan Ali
Stefan Wermter
LM&Ro
31
9
0
29 Jun 2024
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
Wenbin Wang
Yang Song
Sanjay Jha
36
5
0
21 Jun 2024
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems
Kentaro Mitsui
Koh Mitsuda
Toshiaki Wakatsuki
Yukiya Hono
Kei Sawada
36
4
0
18 Jun 2024
1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis
Sewade Ogun
A. Owodunni
Tobi Olatunji
Eniola Alese
Babatunde Oladimeji
Tejumade Afonja
Kayode Olaleye
Naome A. Etori
Tosin P. Adewumi
36
4
0
17 Jun 2024
Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis
Taewoo Kim
Choongsang Cho
Young Han Lee
AI4TS
33
0
0
14 Jun 2024
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
Dehua Tao
Daxin Tan
Y. Yeung
Xiao Chen
Tan Lee
35
3
0
13 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
35
2
0
13 Jun 2024
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
Zhengyang Chen
Xuechen Liu
Erica Cooper
Junichi Yamagishi
Yanmin Qian
43
2
0
13 Jun 2024
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
Jihwan Lee
Aditya Kommineni
Tiantian Feng
Kleanthis Avramidis
Xuan Shi
Sudarsana Reddy Kadiri
Shrikanth Narayanan
31
0
0
12 Jun 2024
Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding
Rui Wang
Liping Chen
Kong Aik Lee
Zhen-Hua Ling
23
2
0
12 Jun 2024
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
Ashishkumar Gudmalwar
Nirmesh Shah
Sai Akarsh
Pankaj Wasnik
R. Shah
32
1
0
12 Jun 2024
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
Runyan Yang
Huibao Yang
Xiqing Zhang
Tiantian Ye
Ying Liu
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
34
0
0
12 Jun 2024
MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance
Semin Kim
Myeonghun Jeong
Hyeonseung Lee
Minchan Kim
Byoung Jin Choi
Nam Soo Kim
VLM
DiffM
45
1
0
10 Jun 2024