Unsupervised Speech Decomposition via Triple Information Bottleneck

23 April 2020

Kaizhi Qian

Papers citing "Unsupervised Speech Decomposition via Triple Information Bottleneck"

Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning Runwu Shi Katsutoshi Itoyama K. Nakadai SSL DRL 39 1 0 31 Dec 2024
Voice Conversion-based Privacy through Adversarial Information Hiding J. Webber O. Watts G. Henter Jennifer Williams Simon King 45 0 0 23 Sep 2024
Prosody-Driven Privacy-Preserving Dementia Detection Dominika Woszczyk Ranya Aloufi Soteris Demetriou 34 2 0 03 Jul 2024
Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition Wenhan Yao Jiangkun Yang Yongqiang He Jia Liu Weiping Wen 44 1 0 16 Jun 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy Linhan Ma Xinfa Zhu Yuanjun Lv Zhichao Wang Ziqian Wang Wendi He Hongbin Zhou Lei Xie 42 2 0 14 Jun 2024
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion Pengcheng Li Jianzong Wang Xulong Zhang Yong Zhang Jing Xiao Ning Cheng DRL 33 1 0 02 May 2024
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention Junjie Li Yiwei Guo Xie Chen Kai Yu 38 13 0 14 Dec 2023
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation Haram Choi Sang-Hoon Lee Seong-Whan Lee DiffM 21 24 0 08 Nov 2023
VaSAB: The variable size adaptive information bottleneck for disentanglement on speech and singing voice F. Bous Axel Roebel 16 0 0 05 Oct 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment Ruiqi Li Rongjie Huang Lichao Zhang Jinglin Liu Zhou Zhao 27 4 0 08 May 2023
Label Information Bottleneck for Label Enhancement Qinghai Zheng Jihua Zhu Haoyu Tang 31 6 0 13 Mar 2023
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units Gallil Maimon Yossi Adi 21 13 0 19 Dec 2022
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units Li-Wei Chen Shinji Watanabe Alexander I. Rudnicky 22 6 0 12 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS Dongchao Yang Songxiang Liu Jianwei Yu Helin Wang Chao Weng Yuexian Zou DiffM VLM 33 18 0 04 Nov 2022
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse Xulong Zhang Jianzong Wang Ning Cheng Jing Xiao 19 1 0 25 Oct 2022
DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion Chihiro Watanabe Hirokazu Kameoka DRL 24 0 0 20 Oct 2022
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed Mei-Shuo Chen Z. Duan 22 10 0 23 Sep 2022
Non-Parallel Voice Conversion for ASR Augmentation Gary Wang Andrew Rosenberg Bhuvana Ramabhadran Fadi Biadsy Yinghui Huang Jesse Emond P. M. Mengibar 13 2 0 15 Sep 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre Guangyan Zhang Ying Qin W. Zhang Jialun Wu Mei Li Yu Gai Feijun Jiang Tan Lee 48 26 0 29 Jun 2022
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers Kaizhi Qian Yang Zhang Heting Gao Junrui Ni Cheng-I Jeff Lai David D. Cox M. Hasegawa-Johnson Shiyu Chang DRL 24 110 0 20 Apr 2022
Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment Tobias Weise P. Klumpp Kubilay Can Demir Andreas K. Maier E. Noeth B.J. Heismann Maria Schuster S. Yang 11 3 0 08 Apr 2022
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion Weida Liang Lantian Li Wenqiang Du Dong Wang 43 0 0 08 Apr 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion Xintao Zhao Feng Liu Changhe Song Zhiyong Wu Shiyin Kang Deyi Tuo H. Meng 16 20 0 24 Mar 2022
CGIBNet: Bandwidth-constrained Communication with Graph Information Bottleneck in Multi-Agent Reinforcement Learning Qi Tian Kun Kuang Baoxiang Wang Furui Liu Fei Wu 26 0 0 20 Dec 2021
Improving Subgraph Recognition with Variational Graph Information Bottleneck Junchi Yu Jie Cao Ran He 22 53 0 18 Dec 2021
How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition Haoran Sun Lantian Li T. Zheng Dong Wang CVBM 14 0 0 24 Nov 2021
Zero-shot Singing Technique Conversion Brendan T. O'Connor S. Dixon Georgy Fazekas 27 5 0 16 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning Shijun Wang Dimche Kostadinov Damian Borth 19 10 0 27 Oct 2021
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion Zongyang Du Berrak Sisman Kun Zhou Haizhou Li 11 24 0 20 Oct 2021
Towards Universal Neural Vocoding with a Multi-band Excited WaveNet Axel Roebel F. Bous 22 2 0 07 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models Jen-Hao Rick Chang A. Shrivastava H. Koppula Xiaoshuai Zhang Oncel Tuzel DiffM 51 16 0 06 Oct 2021
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation Yuanxun Lu Jinxiang Chai Xun Cao 29 82 0 22 Sep 2021
Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model Zhongwei Teng Quchen Fu Jules White Maria E. Powell Douglas C. Schmidt 13 5 0 06 Sep 2021
Learning De-identified Representations of Prosody from Raw Audio J. Weston R. Lenain U. Meepegama E. Fristed SSL 24 15 0 17 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 23 3 0 21 Jun 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion Disong Wang Liqun Deng Y. Yeung Xiao Chen Xunying Liu H. Meng DRL 14 136 0 18 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 21 24 0 20 Apr 2021
Semi-supervised Learning for Singing Synthesis Timbre J. Bonada Merlijn Blaauw 19 4 0 05 Nov 2020
AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization Yen-Hao Chen Da-Yi Wu Tsung-Han Wu Hung-yi Lee 23 107 0 31 Oct 2020
Graph Information Bottleneck for Subgraph Recognition Junchi Yu Tingyang Xu Yu Rong Yatao Bian Junzhou Huang R. He 17 153 0 12 Oct 2020
Contrastive Predictive Coding Supported Factorized Variational Autoencoder for Unsupervised Learning of Disentangled Speech Representations Janek Ebbers Michael Kuhlmann Tobias Cord-Landwehr Reinhold Haeb-Umbach DRL CoGe SSL 23 4 0 26 May 2020
Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization Jen-Yu Liu Yu-Hua Chen Yin-Cheng Yeh Yi-Hsuan Yang GAN 32 35 0 18 May 2020