Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data

The Speaker and Language Recognition Workshop (Odyssey), 2020
1 February 2020
Kun Zhou
Berrak Sisman
Haizhou Li
arXiv:2002.00198

Papers citing "Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data"

43 / 43 papers shown
Randomness from causally independent processes
Martin Sandfuchs
Carla Ferradini
R. Renner
CML
219
0
0
06 Oct 2025
Textless and Non-Parallel Speech-to-Speech Emotion Style Transfer
Soumya Dutta
Avni Jain
Sriram Ganapathy
317
0
0
23 May 2025
A Review of Human Emotion Synthesis Based on Generative Technology
Fei Ma
Yongqian Li
Yifan Xie
Y. He
Yujiao Shi
...
Z. Liu
Wei Yao
Fuji Ren
Fei Richard Yu
Shiguang Ni
318
16
0
10 Dec 2024
Improving speaker verification robustness with synthetic emotional utterances
Nikhil Kumar Koditala
C. Ju
Ruirui Li
Minho Jin
Vasu Sharma
A. Stolcke
327
0
0
30 Nov 2024
Re-ENACT: Reinforcement Learning for Emotional Speech Generation using Actor-Critic Strategy
Ravi Shankar
Archana Venkataraman
203
2
0
04 Aug 2024
Fine-Grained Quantitative Emotion Editing for Speech Generation
Sho Inoue
Kun Zhou
Shuai Wang
Haizhou Li
277
5
0
04 Mar 2024
Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion
Interspeech (Interspeech), 2023
Yun Chen
Lingxiao Yang
Qi Chen
Jianhuang Lai
Xiaohua Xie
179
7
0
29 Dec 2023
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Automatic Speech Recognition & Understanding (ASRU), 2023
Chun-Yi Kuan
Chen-An Li
Tsung-Yuan Hsu
Tzu-Quan Lin
Ho-Lam Chung
Kai-Wei Chang
Shuo-yiin Chang
Hung-yi Lee
378
13
0
25 Sep 2023
EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
N. Prabhu
Bunlong Lay
Simon Welker
N. Lehmann-Willenbrock
Timo Gerkmann
DiffM
383
10
0
14 Sep 2023
In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis
N. Prabhu
N. Lehmann-Willenbrock
Timo Gerkmann
231
4
0
02 Jun 2023
Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Nirmesh J. Shah
M. Singh
Naoya Takahashi
N. Onoe
238
20
0
21 Feb 2023
Emotion Selectable End-to-End Text-based Speech Editing
Artificial Intelligence (AI), 2022
Tao Wang
Jiangyan Yi
Ruibo Fu
Jianhua Tao
Zhengqi Wen
Chu Yuan Zhang
225
5
0
20 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
280
17
0
14 Dec 2022
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Leyuan Qu
Wei Wang
C. Weber
F. Ren
Taiha Li
S. Wermter
285
11
0
16 Nov 2022
EmoFake: An Initial Dataset for Emotion Fake Audio Detection
China National Conference on Chinese Computational Linguistics (CCL), 2022
Yan Zhao
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Xiaohui Zhang
Yongfeng Dong
217
24
0
10 Nov 2022
A Diffeomorphic Flow-based Variational Framework for Multi-speaker Emotion Conversion
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Ravi Shankar
Hsi-Wei Hsieh
N. Charon
A. Venkataraman
DRL
199
2
0
09 Nov 2022
Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion
The Speaker and Language Recognition Workshop (Odyssey), 2022
Kun Zhou
Berrak Sisman
John H. L. Hansen
Bin Ma
Haizhou Li
356
6
0
25 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Proceedings of the IEEE (Proc. IEEE), 2022
Andreas Triantafyllopoulos
Björn W. Schuller
Gökçe İymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
309
97
0
06 Oct 2022
Speech Synthesis with Mixed Emotions
IEEE Transactions on Affective Computing (IEEE TAC), 2022
Kun Zhou
Berrak Sisman
R. Rana
B. W. Schuller
Haizhou Li
369
67
0
11 Aug 2022
Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation
Interspeech (Interspeech), 2022
Giulia Comini
Goeric Huybrechts
M. Ribeiro
Adam Gabryś
Jaime Lorenzo-Trueba
207
7
0
29 Jul 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis
Interspeech (Interspeech), 2022
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Ming Jiang
Linfu Xie
264
18
0
04 Jul 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
Interspeech (Interspeech), 2022
Zijiang Yang
Xin Jing
Andreas Triantafyllopoulos
Meishu Song
Ilhan Aslan
Björn W. Schuller
263
18
0
29 Mar 2022
Emotion Intensity and its Control for Emotional Voice Conversion
IEEE Transactions on Affective Computing (IEEE TAC), 2022
Kun Zhou
Berrak Sisman
R. Rana
Björn W. Schuller
Haizhou Li
432
82
0
10 Jan 2022
CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer
Changzeng Fu
Chaoran Liu
C. Ishi
H. Ishiguro
ViT
167
13
0
30 Nov 2021
Textless Speech Emotion Conversion using Discrete and Decomposed Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Felix Kreuk
Adam Polyak
Jade Copet
Eugene Kharitonov
Tu Nguyen
M. Rivière
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
398
47
0
14 Nov 2021
Towards Identity Preserving Normal to Dysarthric Voice Conversion
Wen-Chin Huang
B. Halpern
Lester Phillip Violeta
O. Scharenborg
Tomoki Toda
326
29
0
15 Oct 2021
Decoupling Speaker-Independent Emotions for Voice Conversion Via Source-Filter Networks
Zhaojie Luo
Shoufeng Lin
Rui Liu
Jun Baba
Yuichiro Yoshikawa
H. Ishiguro
185
13
0
04 Oct 2021
Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer
Automatic Speech Recognition & Understanding (ASRU), 2021
Zongyang Du
Berrak Sisman
Kun Zhou
Haizhou Li
331
25
0
08 Jul 2021
Global Rhythm Style Transfer Without Text Transcriptions
Kaizhi Qian
Yang Zhang
Shiyu Chang
Jinjun Xiong
Chuang Gan
David D. Cox
M. Hasegawa-Johnson
279
21
0
16 Jun 2021
Emotional Voice Conversion: Theory, Databases and ESD
Speech Communication (Speech Commun.), 2021
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
534
264
0
31 May 2021
MASS: Multi-task Anthropomorphic Speech Synthesis Framework
Computer Speech and Language (CSL), 2021
Jinyin Chen
Linhui Ye
Zhaoyan Ming
154
7
0
10 May 2021
Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels
European Signal Processing Conference (EUSIPCO), 2021
Clément Le Moine Veillon
Nicolas Obin
Axel Roebel
160
8
0
15 Apr 2021
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training
Interspeech (Interspeech), 2021
Kun Zhou
Berrak Sisman
Haizhou Li
407
35
0
31 Mar 2021
EmoCat: Language-agnostic Emotional Voice Conversion
Speech Synthesis Workshop (SSW), 2021
Bastian Schnell
Goeric Huybrechts
Bartek Perz
Thomas Drugman
Jaime Lorenzo-Trueba
233
11
0
14 Jan 2021
VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech
Kun Zhou
Berrak Sisman
Haizhou Li
DRL
384
47
0
03 Nov 2020
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
376
257
0
28 Oct 2020
Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020
Zongyang Du
Kun Zhou
Berrak Sisman
Haizhou Li
318
8
0
11 Aug 2020
VAW-GAN for Singing Voice Conversion with Non-parallel Training Data
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020
Junchen Lu
Kun Zhou
Berrak Sisman
Haizhou Li
DRL
208
21
0
10 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
643
413
0
09 Aug 2020
Multi-speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network
Interspeech (Interspeech), 2020
Ravi Shankar
Hsi-Wei Hsieh
N. Charon
A. Venkataraman
249
11
0
25 Jul 2020
Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator
Interspeech (Interspeech), 2020
Ravi Shankar
Jacob Sager
A. Venkataraman
GAN
298
19
0
25 Jul 2020
Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation
Kenan E. Ak
N. Xu
Zhe Lin
Yilin Wang
250
14
0
20 Jul 2020
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion
Kun Zhou
Berrak Sisman
Mingyang Zhang
Haizhou Li
360
62
0
13 May 2020