1704.04222
Learning Latent Representations for Speech Generation and Transformation
13 April 2017
Wei-Ning Hsu
Yu Zhang
James R. Glass
DRL
BDL
SSL
Papers citing "Learning Latent Representations for Speech Generation and Transformation"
50 / 76 papers shown
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
450
10
0
21 Apr 2025
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Varun Krishna
Sriram Ganapathy
SSL
261
2
0
02 Jul 2024
Interference Motion Removal for Doppler Radar Vital Sign Detection Using Variational Encoder-Decoder Neural Network
Mikolaj Czerkawski
C. Ilioudis
C. Clemente
C. Michie
I. Andonovic
Christos Tachtatzis
99
11
0
12 Apr 2024
Cross-Utterance Conditioned VAE for Speech Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
192
3
0
08 Sep 2023
Deep networks for system identification: a Survey
G. Pillonetto
Aleksandr Aravkin
Daniel Gedon
L. Ljung
Antônio H. Ribeiro
Thomas B. Schon
OOD
323
89
0
30 Jan 2023
An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms
International Conference on Signal-Image Technology and Internet-Based Systems (SITIS), 2022
Anastasia Natsiou
Luca Longo
Seán O'Leary
81
0
0
18 Jan 2023
A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Yang Xiang
Jesper Lisby Højvang
M. Rasmussen
M. G. Christensen
DRL
180
7
0
16 Nov 2022
Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples
Meng Chen
Liwang Lu
Jiadi Yu
Ying Chen
Zhongjie Ba
Feng Lin
Kui Ren
AAML
169
2
0
10 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Neural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
344
10
0
02 Nov 2022
Local Connection Reinforcement Learning Method for Efficient Control of Robotic Peg-in-Hole Assembly
Yuhang Gai
Jiwen Zhang
Dan Wu
Ken Chen
OffRL
163
1
0
24 Oct 2022
Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation
Spoken Language Technology Workshop (SLT), 2022
Chendong Zhao
Jianzong Wang
Xiaoyang Qu
Haoqian Wang
Jing Xiao
SSL
207
1
0
15 Oct 2022
Learning Multivariate CDFs and Copulas using Tensor Factorization
Magda Amiridi
N. Sidiropoulos
173
2
0
13 Oct 2022
Gromov-Wasserstein Autoencoders
International Conference on Learning Representations (ICLR), 2022
Nao Nakagawa
Ren Togo
Takahiro Ogawa
Miki Haseyama
GAN
DRL
241
16
0
15 Sep 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
647
442
0
21 May 2022
Improved far-field speech recognition using Joint Variational Autoencoder
Shashi Kumar
S. Rath
Abhishek Pandey
DRL
113
0
0
24 Apr 2022
Learning and controlling the source-filter representation of speech with a variational autoencoder
Speech Communication (Speech Commun.), 2022
Samir Sadok
Simon Leglaive
Laurent Girin
Xavier Alameda-Pineda
Renaud Séguier
SSL
DRL
BDL
285
14
0
14 Apr 2022
A Sparsity-promoting Dictionary Model for Variational Autoencoders
Interspeech (Interspeech), 2022
M. Sadeghi
P. Magron
224
3
0
29 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data
Interspeech (Interspeech), 2022
Gašper Beguš
Alan Zhou
SSL
247
6
0
22 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
230
13
0
01 Mar 2022
A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yang Xiang
Jesper Lisby Højvang
M. Rasmussen
M. G. Christensen
BDL
DRL
151
7
0
24 Jan 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xiaochun An
Frank Soong
Lei Xie
286
22
0
24 Jan 2022
Towards Cross-Cultural Analysis using Music Information Dynamics
Shlomo Dubnov
Kevin Huang
Cheng-i Wang
116
1
0
24 Nov 2021
How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition
Haoran Sun
Lantian Li
Tianshi Zheng
Dong Wang
CVBM
99
0
0
24 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
1.1K
2,642
0
26 Oct 2021
Emphasis control for parallel neural TTS
Shreyas Seshadri
T. Raitio
D. Castellani
Jiangchuan Li
243
16
0
06 Oct 2021
Improving robustness of one-shot voice conversion with deep discriminative speaker encoder
Interspeech (Interspeech), 2021
Hongqiang Du
Lei Xie
92
6
0
19 Jun 2021
Pathological voice adaptation with autoencoder-based voice conversion
M. Illa
B. Halpern
Rob van Son
Laureano Moro-Velazquez
O. Scharenborg
121
15
0
15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system
Interspeech (Interspeech), 2021
Panagiota Karanasou
S. Karlapati
Alexis Moinet
Arnaud Joly
Ammar Abbas
Simon Slangen
Jaime Lorenzo-Trueba
Thomas Drugman
141
7
0
14 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Wei-Ning Hsu
Benjamin Bolte
Yifan Hao
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
532
3,993
0
14 Jun 2021
A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling
Interspeech (Interspeech), 2021
Xiaoyu Bie
Laurent Girin
Simon Leglaive
Thomas Hueber
Xavier Alameda-Pineda
219
12
0
11 Jun 2021
An Attribute-Aligned Strategy for Learning Speech Representation
Interspeech (Interspeech), 2021
Yu-Lin Huang
Bo-Hao Su
Y.-W. Peter Hong
Chi-Chun Lee
195
5
0
05 Jun 2021
Learning robust speech representation with an articulatory-regularized variational autoencoder
Interspeech (Interspeech), 2021
Marc-Antoine Georges
Laurent Girin
J. Schwartz
Thomas Hueber
DRL
110
4
0
07 Apr 2021
Generative Spoken Language Modeling from Raw Audio
Transactions of the Association for Computational Linguistics (TACL), 2021
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
595
433
0
01 Feb 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
Artificial Intelligence Review (AIR), 2021
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Xiaoshi Zhong
OffRL
335
88
0
01 Jan 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
167
74
0
31 Dec 2020
AudioViewer: Learning to Visualize Sounds
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Chunjin Song
Yuchi Zhang
Willis Peng
Parmis Mohaghegh
Bastian Wandt
Helge Rhodin
269
3
0
22 Dec 2020
End-To-End Dilated Variational Autoencoder with Bottleneck Discriminative Loss for Sound Morphing -- A Preliminary Study
Matteo Lionello
Hendrik Purwins
147
0
0
19 Nov 2020
The CUHK-TUDELFT System for The SLT 2021 Children Speech Recognition Challenge
Si-Ioi Ng
W. Liu
Zhiyuan Peng
Siyuan Feng
Hingpang Huang
O. Scharenborg
Tan Lee
3DV
126
8
0
12 Nov 2020
Deep generative factorization for speech signal
Haoran Sun
Lantian Li
Yunqi Cai
Yang Zhang
Tianshi Zheng
Dong Wang
84
0
0
27 Oct 2020
Dynamical Variational Autoencoders: A Comprehensive Review
Laurent Girin
Simon Leglaive
Xiaoyu Bie
Julien Diard
Thomas Hueber
Xavier Alameda-Pineda
BDL
480
266
0
28 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
435
388
0
09 Aug 2020
Nonlinear ISA with Auxiliary Variables for Learning Speech Representations
Interspeech (Interspeech), 2020
Amrith Rajagopal Setlur
Barnabás Póczós
A. Black
73
1
0
25 Jul 2020
Attribute-based Regularization of Latent Spaces for Variational Auto-Encoders
Ashis Pati
Alexander Lerch
DRL
235
3
0
11 Apr 2020
Deep Autotuner: a Pitch Correcting Network for Singing Performances
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Sanna Wager
George Tzanetakis
Cheng-i Wang
Minje Kim
109
12
0
12 Feb 2020
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
International Conference on Learning Representations (ICLR), 2019
David Harwath
Wei-Ning Hsu
James R. Glass
170
88
0
21 Nov 2019
Contextual Joint Factor Acoustic Embeddings
Spoken Language Technology Workshop (SLT), 2019
Yanpei Shi
Thomas Hain
104
3
0
16 Oct 2019
Improving Noise Robustness In Speaker Identification Using A Two-Stage Attention Model
Yanpei Shi
Qiang Huang
Thomas Hain
163
1
0
24 Sep 2019
Probabilistic Models with Deep Neural Networks
A. Masegosa
Rafael Cabañas
H. Langseth
Thomas D. Nielsen
Antonio Salmerón
BDL
217
16
0
09 Aug 2019
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
Interspeech (Interspeech), 2019
Patrick Lumban Tobing
Yi-Chiao Wu
Tomoki Hayashi
Kazuhiro Kobayashi
Tomoki Toda
150
72
0
24 Jul 2019
Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder
Speech Synthesis Workshop (SSW), 2019
Yi-Chiao Wu
Patrick Lumban Tobing
Tomoki Hayashi
Kazuhiro Kobayashi
Tomoki Toda
201
2
0
21 Jul 2019