
Learning Latent Representations for Speech Generation and Transformation

13 April 2017

Papers citing "Learning Latent Representations for Speech Generation and Transformation"

50 / 76 papers shown

OmniAudio: Generating Spatial Audio from 360-Degree Video

...

450

21 Apr 2025

Towards the Next Frontier in Speech Representation Learning Using Disentanglement

Varun Krishna

Sriram Ganapathy

SSL

261

02 Jul 2024

Interference Motion Removal for Doppler Radar Vital Sign Detection Using Variational Encoder-Decoder Neural Network

12 Apr 2024

Cross-Utterance Conditioned VAE for Speech Generation

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Guangzhi Sun

...

Wei Pan

192

08 Sep 2023

Deep networks for system identification: a Survey

323

30 Jan 2023

An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms

International Conference on Signal-Image Technology and Internet-Based Systems (SITIS), 2022

Anastasia Natsiou

Luca Longo

Seán O'Leary

18 Jan 2023

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

180

16 Nov 2022

Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples

169

10 Nov 2022

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Neural Information Processing Systems (NeurIPS), 2022

Kaizhi Qian

344

02 Nov 2022

Local Connection Reinforcement Learning Method for Efficient Control of Robotic Peg-in-Hole Assembly

163

24 Oct 2022

Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation

Spoken Language Technology Workshop (SLT), 2022

207

15 Oct 2022

Learning Multivariate CDFs and Copulas using Tensor Factorization

Magda Amiridi

N. Sidiropoulos

173

13 Oct 2022

Gromov-Wasserstein Autoencoders

International Conference on Learning Representations (ICLR), 2022

Nao Nakagawa

Ren Togo

Takahiro Ogawa

Miki Haseyama

GAN DRL

241

15 Sep 2022

Self-Supervised Speech Representation Learning: A Review

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

Abdel-rahman Mohamed

Hung-yi Lee

Lasse Borgholt

Jakob Drachmann Havtorn

...

647

442

21 May 2022

Improved far-field speech recognition using Joint Variational Autoencoder

113

24 Apr 2022

Learning and controlling the source-filter representation of speech with a variational autoencoder

Speech Communication (Speech Commun.), 2022

Samir Sadok

Simon Leglaive

Laurent Girin

Xavier Alameda-Pineda

Renaud Séguier

SSL DRL BDL

285

14 Apr 2022

A Sparsity-promoting Dictionary Model for Variational Autoencoders

Interspeech, 2022

M. Sadeghi

P. Magron

224

29 Mar 2022

Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data

Interspeech, 2022

Gašper Beguš

Alan Zhou

SSL

247

22 Mar 2022

A Brief Overview of Unsupervised Neural Speech Representation Learning

Lasse Borgholt

Jakob Drachmann Havtorn

230

01 Mar 2022

A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

151

24 Jan 2022

Disentangling Style and Speaker Attributes for TTS Style Transfer

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

Xiaochun An

Frank Soong

Lei Xie

286

24 Jan 2022

Towards Cross-Cultural Analysis using Music Information Dynamics

Shlomo Dubnov

Kevin Huang

Cheng-i Wang

116

24 Nov 2021

How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition

Dong Wang

24 Nov 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

...

Jian Wu

1.1K

2,642

26 Oct 2021

Emphasis control for parallel neural TTS

243

06 Oct 2021

Improving robustness of one-shot voice conversion with deep discriminative speaker encoder

Interspeech, 2021

Hongqiang Du

Lei Xie

19 Jun 2021

Pathological voice adaptation with autoencoder-based voice conversion

M. Illa

B. Halpern

Rob van Son

Laureano Moro-Velazquez

O. Scharenborg

121

15 Jun 2021

A learned conditional prior for the VAE acoustic space of a TTS system

Interspeech, 2021

141

14 Jun 2021

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021

532

3,993

14 Jun 2021

A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

Interspeech, 2021

Xavier Alameda-Pineda

219

11 Jun 2021

An Attribute-Aligned Strategy for Learning Speech Representation

Interspeech, 2021

195

05 Jun 2021

Learning robust speech representation with an articulatory-regularized variational autoencoder

Interspeech, 2021

110

07 Apr 2021

Generative Spoken Language Modeling from Raw Audio

Transactions of the Association for Computational Linguistics (TACL), 2021

Yossi Adi

...

595

433

01 Feb 2021

A Survey on Deep Reinforcement Learning for Audio-Based Applications

Artificial Intelligence Review (AIR), 2021

335

01 Jan 2021

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units

Annual Meeting of the Association for Computational Linguistics (ACL), 2020

167

31 Dec 2020

AudioViewer: Learning to Visualize Sounds

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020

269

22 Dec 2020

End-To-End Dilated Variational Autoencoder with Bottleneck Discriminative Loss for Sound Morphing -- A Preliminary Study

Matteo Lionello

Hendrik Purwins

147

19 Nov 2020

The CUHK-TUDELFT System for The SLT 2021 Children Speech Recognition Challenge

126

12 Nov 2020

Deep generative factorization for speech signal

Yang Zhang

Dong Wang

27 Oct 2020

Dynamical Variational Autoencoders: A Comprehensive Review

Xavier Alameda-Pineda

BDL

480

266

28 Aug 2020

An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020

Haizhou Li

435

388

09 Aug 2020

Nonlinear ISA with Auxiliary Variables for Learning Speech Representations

Interspeech, 2020

Amrith Rajagopal Setlur

Barnabás Póczós

A. Black

25 Jul 2020

Attribute-based Regularization of Latent Spaces for Variational Auto-Encoders

Ashis Pati

Alexander Lerch

DRL

235

11 Apr 2020

Deep Autotuner: a Pitch Correcting Network for Singing Performances

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

109

12 Feb 2020

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

International Conference on Learning Representations (ICLR), 2019

David Harwath

Wei-Ning Hsu

James R. Glass

170

21 Nov 2019

Contextual Joint Factor Acoustic Embeddings

Spoken Language Technology Workshop (SLT), 2019

Yanpei Shi

Thomas Hain

104

16 Oct 2019

Improving Noise Robustness In Speaker Identification Using A Two-Stage Attention Model

Yanpei Shi

Qiang Huang

Thomas Hain

163

24 Sep 2019

Probabilistic Models with Deep Neural Networks

217

09 Aug 2019

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

Interspeech, 2019

Patrick Lumban Tobing

150

24 Jul 2019

Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder

Speech Synthesis Workshop (SSW), 2019

Yi-Chiao Wu

Patrick Lumban Tobing

Tomoki Hayashi

Kazuhiro Kobayashi

Tomoki Toda

201

21 Jul 2019