ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.02392
  4. Cited By
A Comparison of Discrete and Soft Speech Units for Improved Voice
  Conversion

A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

3 November 2021
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
    DRL
ArXivPDFHTML

Papers citing "A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion"

50 / 61 papers shown
Title
Quantifying Source Speaker Leakage in One-to-One Voice Conversion
Quantifying Source Speaker Leakage in One-to-One Voice Conversion
Scott Wellington
Xuechen Liu
Junichi Yamagishi
35
0
0
22 Apr 2025
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
54
0
0
11 Apr 2025
AVENet: Disentangling Features by Approximating Average Features for Voice Conversion
AVENet: Disentangling Features by Approximating Average Features for Voice Conversion
Wenyu Wang
Yiquan Zhou
Jihua Zhu
Hongwu Ding
Jiacheng Xu
Shihao Li
DRL
30
0
0
08 Apr 2025
kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization
kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization
Keren Shao
K. Chen
Matthew Baas
Shlomo Dubnov
20
0
0
08 Apr 2025
LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect
LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect
Hedi Naouara
Jean-Pierre Lorré
Jérôme Louradour
49
0
0
03 Apr 2025
EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters
EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters
Xuli Shen
Hua Cai
Dingding Yu
Weilin Shen
Qing-Song Xu
Xiangyang Xue
32
0
0
25 Mar 2025
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
Yifan Liu
Yu Fang
Zhouhan Lin
38
0
0
07 Mar 2025
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
Jialong Zuo
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Xize Cheng
...
Wenrui Liu
Guangyan Zhang
Zehai Tu
Yiwen Guo
Zhou Zhao
49
0
0
08 Feb 2025
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
Jiaxing Zhao
Q. Yang
Yixing Peng
Detao Bai
Shimin Yao
...
Xiang Chen
Shenghao Fu
Weixuan chen
Xihan Wei
Liefeng Bo
VGen
AuLLM
50
5
0
28 Jan 2025
Why disentanglement-based speaker anonymization systems fail at preserving emotions?
Why disentanglement-based speaker anonymization systems fail at preserving emotions?
Ünal Ege Gaznepoglu
Nils Peters
83
0
0
22 Jan 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
43
0
0
11 Jan 2025
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
Samir Sadok
Simon Leglaive
Laurent Girin
Gaël Richard
Xavier Alameda-Pineda
55
1
0
10 Jan 2025
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a
  Self-Supervised Learning Model
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model
Joonyong Park
Daisuke Saito
N. Minematsu
67
0
0
04 Dec 2024
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in
  Any-to-One Voice Conversion
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
21
0
0
25 Sep 2024
Exploring synthetic data for cross-speaker style transfer in style
  representation based TTS
Exploring synthetic data for cross-speaker style transfer in style representation based TTS
Lucas Ueda
Leonardo B. de M. M. Marques
Flávio O. Simões
Mário Uliani Neto
Fernando Runstein
Bianca Dal Bó
Paula D. P. Costa
21
0
0
25 Sep 2024
Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming
Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming
Simon Malan
Benjamin van Niekerk
Herman Kamper
25
0
0
22 Sep 2024
Discrete Unit based Masking for Improving Disentanglement in Voice
  Conversion
Discrete Unit based Masking for Improving Disentanglement in Voice Conversion
Philip H. Lee
Ismail Rasim Ulgen
Berrak Sisman
23
0
0
17 Sep 2024
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for
  Robust Singing Voice Conversion
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion
Wei Chen
Xintao Zhao
Jun Chen
Binzhu Sha
Zhiwei Lin
Zhiyong Wu
37
0
0
10 Sep 2024
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
A. R. Bargum
Simon Lajboschitz
Cumhur Erkut
27
1
0
29 Aug 2024
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion
  of Whispered and Regular Speech
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
Anastasia Avdeeva
Aleksei Gusev
22
0
0
21 Aug 2024
Hear Your Face: Face-based voice conversion with F0 estimation
Hear Your Face: Face-based voice conversion with F0 estimation
Jaejun Lee
Yoori Oh
Injune Hwang
Kyogu Lee
CVBM
21
1
0
19 Aug 2024
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
Xiaoxiao Miao
Yuxiang Zhang
Xin Wang
N. Tomashenko
D. Soh
Ian Mcloughlin
42
1
0
12 Aug 2024
Distortion Recovery: A Two-Stage Method for Guitar Effect Removal
Distortion Recovery: A Two-Stage Method for Guitar Effect Removal
Ying-Shuo Lee
Yueh-Po Peng
Jui-Te Wu
Ming Cheng
Li Su
Yi-Hsuan Yang
33
0
0
23 Jul 2024
A Preliminary Investigation on Flexible Singing Voice Synthesis Through
  Decomposed Framework with Inferrable Features
A Preliminary Investigation on Flexible Singing Voice Synthesis Through Decomposed Framework with Inferrable Features
Lester Phillip Violeta
Taketo Akama
27
0
0
12 Jul 2024
A Benchmark for Multi-speaker Anonymization
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
44
1
0
08 Jul 2024
DASB -- Discrete Audio and Speech Benchmark
DASB -- Discrete Audio and Speech Benchmark
Pooneh Mousavi
Luca Della Libera
J. Duret
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
35
12
0
20 Jun 2024
End-to-end Streaming model for Low-Latency Speech Anonymization
End-to-end Streaming model for Low-Latency Speech Anonymization
Waris Quamer
Ricardo Gutierrez-Osuna
18
0
0
13 Jun 2024
Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a
  Conditional Diffusion Model
Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
Zongyang Du
Junchen Lu
Kun Zhou
Lakshmish Kaushik
Berrak Sisman
36
1
0
02 May 2024
Learning Expressive Disentangled Speech Representations with Soft Speech
  Units and Adversarial Style Augmentation
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Yimin Deng
Jianzong Wang
Xulong Zhang
Ning Cheng
Jing Xiao
24
0
0
01 May 2024
Learning Disentangled Speech Representations with Contrastive Learning
  and Time-Invariant Retrieval
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
Yimin Deng
Huaizhen Tang
Xulong Zhang
Ning Cheng
Jing Xiao
Jianzong Wang
DRL
31
1
0
16 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
39
1
0
16 Jan 2024
StreamVC: Real-Time Low-Latency Voice Conversion
StreamVC: Real-Time Low-Latency Voice Conversion
Yang Yang
Y. Kartynnik
Yunpeng Li
Jiuqiang Tang
Xing Li
George Sung
Matthias Grundmann
28
12
0
05 Jan 2024
OpenVoice: Versatile Instant Voice Cloning
OpenVoice: Versatile Instant Voice Cloning
Zengyi Qin
Wenliang Zhao
Xumin Yu
Xin Sun
VLM
27
19
0
03 Dec 2023
Low-latency Real-time Voice Conversion on CPU
Low-latency Real-time Voice Conversion on CPU
Konstantine Sadov
Matthew Hutter
Asara Near
VLM
23
1
0
01 Nov 2023
Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and
  Textually Described Voices
Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices
Matthew Baas
Herman Kamper
13
3
0
12 Oct 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech
  and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge
  2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023
Ryuichi Yamamoto
Reo Yoneyama
Lester Phillip Violeta
Wen-Chin Huang
T. Toda
19
7
0
08 Oct 2023
VITS-based Singing Voice Conversion System with DSPGAN post-processing
  for SVCC2023
VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023
Yi-Hua Zhou
Meng Chen
Yi Lei
Jihua Zhu
Weifeng Zhao
16
5
0
08 Oct 2023
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Chun-Yi Kuan
Chen An Li
Tsung-Yuan Hsu
T. Lin
Ho-Lam Chung
Kai-Wei Chang
Shuo-yiin Chang
Hung-yi Lee
18
5
0
25 Sep 2023
Electrolaryngeal Speech Intelligibility Enhancement Through Robust
  Linguistic Encoders
Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
Lester Phillip Violeta
Wen-Chin Huang
D. Ma
Ryuichi Yamamoto
Kazuhiro Kobayashi
T. Toda
14
3
0
18 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any
  Voice Conversion using Only Speech Data
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
26
2
0
06 Sep 2023
FSD: An Initial Chinese Dataset for Fake Song Detection
FSD: An Initial Chinese Dataset for Fake Song Detection
Yuankun Xie
Jingjing Zhou
Xiaolin Lu
Zhenghao Jiang
Yuxin Yang
Haonan Cheng
Long Ye
24
14
0
05 Sep 2023
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice
  Conversion by Multi-scale Style Modeling
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling
Zhichao Wang
Xinsheng Wang
Qicong Xie
Tao Li
Linfu Xie
Qiao Tian
Yuping Wang
13
4
0
03 Sep 2023
Vocoder drift compensation by x-vector alignment in speaker
  anonymisation
Vocoder drift compensation by x-vector alignment in speaker anonymisation
Michele Panariello
Massimiliano Todisco
Nicholas W. D. Evans
27
2
0
17 Jul 2023
Rhythm Modeling for Voice Conversion
Rhythm Modeling for Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Herman Kamper
29
5
0
12 Jul 2023
Disentanglement in a GAN for Unconditional Speech Synthesis
Disentanglement in a GAN for Unconditional Speech Synthesis
Matthew Baas
Herman Kamper
DiffM
24
2
0
04 Jul 2023
High-Quality Automatic Voice Over with Accurate Alignment: Supervision
  through Self-Supervised Discrete Speech Units
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units
Junchen Lu
Berrak Sisman
Mingyang Zhang
Haizhou Li
24
4
0
29 Jun 2023
The Singing Voice Conversion Challenge 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
T. Toda
16
46
0
26 Jun 2023
Zero-Shot Automatic Pronunciation Assessment
Zero-Shot Automatic Pronunciation Assessment
Hongfu Liu
Mingqiang Shi
Ye Wang
19
4
0
31 May 2023
Voice Conversion With Just Nearest Neighbors
Voice Conversion With Just Nearest Neighbors
Matthew Baas
Benjamin van Niekerk
Herman Kamper
SSL
32
48
0
30 May 2023
Speaker anonymization using orthogonal Householder neural network
Speaker anonymization using orthogonal Householder neural network
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
N. Tomashenko
BDL
19
18
0
30 May 2023
12
Next