Learning Latent Representations for Speech Generation and Transformation

13 April 2017
Wei-Ning Hsu
Yu Zhang
James R. Glass
DRL, BDL, SSL

Papers citing "Learning Latent Representations for Speech Generation and Transformation"

50 / 76 papers shown
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
450
10
0
21 Apr 2025
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Varun Krishna
Sriram Ganapathy
SSL
261
2
0
02 Jul 2024
Interference Motion Removal for Doppler Radar Vital Sign Detection Using Variational Encoder-Decoder Neural Network
Mikolaj Czerkawski
C. Ilioudis
C. Clemente
C. Michie
I. Andonovic
Christos Tachtatzis
99
11
0
12 Apr 2024
Cross-Utterance Conditioned VAE for Speech Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
192
3
0
08 Sep 2023
Deep networks for system identification: a Survey
G. Pillonetto
Aleksandr Aravkin
Daniel Gedon
L. Ljung
Antônio H. Ribeiro
Thomas B. Schön
OOD
323
89
0
30 Jan 2023
An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms
International Conference on Signal-Image Technology and Internet-Based Systems (SITIS), 2022
Anastasia Natsiou
Luca Longo
Seán O'Leary
81
0
0
18 Jan 2023
A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Yang Xiang
Jesper Lisby Højvang
M. Rasmussen
M. G. Christensen
DRL
180
7
0
16 Nov 2022
Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples
Meng Chen
Liwang Lu
Jiadi Yu
Ying Chen
Zhongjie Ba
Feng Lin
Kui Ren
AAML
169
2
0
10 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Neural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
344
10
0
02 Nov 2022
Local Connection Reinforcement Learning Method for Efficient Control of Robotic Peg-in-Hole Assembly
Yuhang Gai
Jiwen Zhang
Dan Wu
Ken Chen
OffRL
163
1
0
24 Oct 2022
Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation
Spoken Language Technology Workshop (SLT), 2022
Chendong Zhao
Jianzong Wang
Xiaoyang Qu
Haoqian Wang
Jing Xiao
SSL
207
1
0
15 Oct 2022
Learning Multivariate CDFs and Copulas using Tensor Factorization
Magda Amiridi
N. Sidiropoulos
173
2
0
13 Oct 2022
Gromov-Wasserstein Autoencoders
International Conference on Learning Representations (ICLR), 2022
Nao Nakagawa
Ren Togo
Takahiro Ogawa
Miki Haseyama
GAN, DRL
241
16
0
15 Sep 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL, AI4TS
647
442
0
21 May 2022
Improved far-field speech recognition using Joint Variational Autoencoder
Shashi Kumar
S. Rath
Abhishek Pandey
DRL
113
0
0
24 Apr 2022
Learning and controlling the source-filter representation of speech with a variational autoencoder
Speech Communication (Speech Commun.), 2022
Samir Sadok
Simon Leglaive
Laurent Girin
Xavier Alameda-Pineda
Renaud Séguier
SSL, DRL, BDL
285
14
0
14 Apr 2022
A Sparsity-promoting Dictionary Model for Variational Autoencoders
Interspeech (Interspeech), 2022
M. Sadeghi
P. Magron
224
3
0
29 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data
Interspeech (Interspeech), 2022
Gašper Beguš
Alan Zhou
SSL
247
6
0
22 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL, AI4TS, SSL
230
13
0
01 Mar 2022
A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yang Xiang
Jesper Lisby Højvang
M. Rasmussen
M. G. Christensen
BDL, DRL
151
7
0
24 Jan 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xiaochun An
Frank Soong
Lei Xie
286
22
0
24 Jan 2022
Towards Cross-Cultural Analysis using Music Information Dynamics
Shlomo Dubnov
Kevin Huang
Cheng-i Wang
116
1
0
24 Nov 2021
How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition
Haoran Sun
Lantian Li
Tianshi Zheng
Dong Wang
CVBM
99
0
0
24 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
1.1K
2,642
0
26 Oct 2021
Emphasis control for parallel neural TTS
Shreyas Seshadri
T. Raitio
D. Castellani
Jiangchuan Li
243
16
0
06 Oct 2021
Improving robustness of one-shot voice conversion with deep discriminative speaker encoder
Interspeech (Interspeech), 2021
Hongqiang Du
Lei Xie
92
6
0
19 Jun 2021
Pathological voice adaptation with autoencoder-based voice conversion
M. Illa
B. Halpern
Rob van Son
Laureano Moro-Velazquez
O. Scharenborg
121
15
0
15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system
Interspeech (Interspeech), 2021
Panagiota Karanasou
S. Karlapati
Alexis Moinet
Arnaud Joly
Ammar Abbas
Simon Slangen
Jaime Lorenzo-Trueba
Thomas Drugman
141
7
0
14 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Wei-Ning Hsu
Benjamin Bolte
Yifan Hao
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
532
3,993
0
14 Jun 2021
A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling
Interspeech (Interspeech), 2021
Xiaoyu Bie
Laurent Girin
Simon Leglaive
Thomas Hueber
Xavier Alameda-Pineda
219
12
0
11 Jun 2021
An Attribute-Aligned Strategy for Learning Speech Representation
Interspeech (Interspeech), 2021
Yu-Lin Huang
Bo-Hao Su
Y.-W. Peter Hong
Chi-Chun Lee
195
5
0
05 Jun 2021
Learning robust speech representation with an articulatory-regularized variational autoencoder
Interspeech (Interspeech), 2021
Marc-Antoine Georges
Laurent Girin
J. Schwartz
Thomas Hueber
DRL
110
4
0
07 Apr 2021
Generative Spoken Language Modeling from Raw Audio
Transactions of the Association for Computational Linguistics (TACL), 2021
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
595
433
0
01 Feb 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
Artificial Intelligence Review (AIR), 2021
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Xiaoshi Zhong
OffRL
335
88
0
01 Jan 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
167
74
0
31 Dec 2020
AudioViewer: Learning to Visualize Sounds
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Chunjin Song
Yuchi Zhang
Willis Peng
Parmis Mohaghegh
Bastian Wandt
Helge Rhodin
269
3
0
22 Dec 2020
End-To-End Dilated Variational Autoencoder with Bottleneck Discriminative Loss for Sound Morphing -- A Preliminary Study
Matteo Lionello
Hendrik Purwins
147
0
0
19 Nov 2020
The CUHK-TUDELFT System for The SLT 2021 Children Speech Recognition Challenge
Si-Ioi Ng
W. Liu
Zhiyuan Peng
Siyuan Feng
Hingpang Huang
O. Scharenborg
Tan Lee
3DV
126
8
0
12 Nov 2020
Deep generative factorization for speech signal
Haoran Sun
Lantian Li
Yunqi Cai
Yang Zhang
Tianshi Zheng
Dong Wang
84
0
0
27 Oct 2020
Dynamical Variational Autoencoders: A Comprehensive Review
Laurent Girin
Simon Leglaive
Xiaoyu Bie
Julien Diard
Thomas Hueber
Xavier Alameda-Pineda
BDL
480
266
0
28 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
435
388
0
09 Aug 2020
Nonlinear ISA with Auxiliary Variables for Learning Speech Representations
Interspeech (Interspeech), 2020
Amrith Rajagopal Setlur
Barnabás Póczós
A. Black
73
1
0
25 Jul 2020
Attribute-based Regularization of Latent Spaces for Variational Auto-Encoders
Ashis Pati
Alexander Lerch
DRL
235
3
0
11 Apr 2020
Deep Autotuner: a Pitch Correcting Network for Singing Performances
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Sanna Wager
George Tzanetakis
Cheng-i Wang
Minje Kim
109
12
0
12 Feb 2020
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
International Conference on Learning Representations (ICLR), 2019
David Harwath
Wei-Ning Hsu
James R. Glass
170
88
0
21 Nov 2019
Contextual Joint Factor Acoustic Embeddings
Spoken Language Technology Workshop (SLT), 2019
Yanpei Shi
Thomas Hain
104
3
0
16 Oct 2019
Improving Noise Robustness In Speaker Identification Using A Two-Stage Attention Model
Yanpei Shi
Qiang Huang
Thomas Hain
163
1
0
24 Sep 2019
Probabilistic Models with Deep Neural Networks
A. Masegosa
Rafael Cabañas
H. Langseth
Thomas D. Nielsen
Antonio Salmerón
BDL
217
16
0
09 Aug 2019
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
Interspeech (Interspeech), 2019
Patrick Lumban Tobing
Yi-Chiao Wu
Tomoki Hayashi
Kazuhiro Kobayashi
Tomoki Toda
150
72
0
24 Jul 2019
Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder
Speech Synthesis Workshop (SSW), 2019
Yi-Chiao Wu
Patrick Lumban Tobing
Tomoki Hayashi
Kazuhiro Kobayashi
Tomoki Toda
201
2
0
21 Jul 2019