Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1912.06813
Cited By
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining
14 December 2019
Wen-Chin Huang
Tomoki Hayashi
Yi-Chiao Wu
Hirokazu Kameoka
Tomoki Toda
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining"
50 / 59 papers shown
Title
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Sandipan Dhar
N. D. Jana
Swagatam Das
82
0
0
27 Apr 2025
A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker's Shadowings
Haopeng Geng
Daisuke Saito
Nobuaki Minematsu
71
3
0
03 Oct 2024
FastTalker: Jointly Generating Speech and Conversational Gestures from Text
Zixin Guo
Jian Zhang
116
1
0
24 Sep 2024
Simulating Native Speaker Shadowing for Nonnative Speech Assessment with Latent Speech Representations
Haopeng Geng
Daisuke Saito
Nobuaki Minematsu
83
0
0
18 Sep 2024
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
Sho Inoue
Shuai Wang
Wanxing Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
120
2
0
14 Sep 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
78
1
0
20 Aug 2024
Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning
Arnav Goel
Medha Hira
Anubha Gupta
66
1
0
23 May 2024
Open vocabulary keyword spotting through transfer learning from speech synthesis
Kesavaraj V
A. Vuppala
75
3
0
05 Apr 2024
Improving severity preservation of healthy-to-pathological voice conversion with global style tokens
B. Halpern
Wen-Chin Huang
Lester Phillip Violeta
R.J.J.H. van Son
Tomoki Toda
125
2
0
04 Oct 2023
AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion
Wen-Chin Huang
Kazuhiro Kobayashi
Tomoki Toda
84
4
0
14 Sep 2023
Parallel and Limited Data Voice Conversion Using Stochastic Variational Deep Kernel Learning
Mohamadreza Jafaryani
H. Sheikhzadeh
V. Pourahmadi
76
4
0
08 Sep 2023
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion
Wen-Chin Huang
Tomoki Toda
CVBM
99
5
0
05 Sep 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
Tomoki Toda
118
50
0
26 Jun 2023
The Age of Synthetic Realities: Challenges and Opportunities
J. P. Cardenuto
Jing Yang
Rafael Padilha
Renjie Wan
Daniel Moreira
Haoliang Li
Shiqi Wang
Fernanda A. Andaló
Sébastien Marcel
Anderson de Rezende Rocha
DeLMO
115
30
0
09 Jun 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
169
48
0
21 Mar 2023
A Text-guided Protein Design Framework
Shengchao Liu
Yanjing Li
Zhuoxinran Li
A. Gitter
Yutao Zhu
...
Arvind Ramanathan
Chaowei Xiao
Jian Tang
Hongyu Guo
Anima Anandkumar
136
70
0
09 Feb 2023
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition
Lester Phillip Violeta
D. Ma
Wen-Chin Huang
Tomoki Toda
75
8
0
02 Nov 2022
Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion
D. Ma
Lester Phillip Violeta
Kazuhiro Kobayashi
Tomoki Toda
62
8
0
19 Oct 2022
Speech Synthesis with Mixed Emotions
Kun Zhou
Berrak Sisman
R. Rana
B.W.Schuller
Haizhou Li
87
47
0
11 Aug 2022
GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion
Magdalena Proszewska
Grzegorz Beringer
Daniel Sáez-Trigueros
Thomas Merritt
Abdelhamid Ezzerg
Roberto Barra-Chicote
70
6
0
04 Jul 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
DongDong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
107
49
0
03 May 2022
Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE
Ziang Long
Yunling Zheng
Meng Yu
Jack Xin
DRL
63
5
0
30 Mar 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
Zijiang Yang
Xin Jing
Andreas Triantafyllopoulos
Meishu Song
Ilhan Aslan
Björn W. Schuller
73
14
0
29 Mar 2022
AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion
Damien Ronssin
Milos Cernak
78
11
0
12 Nov 2021
Towards Identity Preserving Normal to Dysarthric Voice Conversion
Wen-Chin Huang
B. Halpern
Lester Phillip Violeta
O. Scharenborg
Tomoki Toda
106
23
0
15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
85
63
0
15 Oct 2021
Multi-View Self-Attention Based Transformer for Speaker Recognition
Rui Wang
Junyi Ao
Long Zhou
Shujie Liu
Zhihua Wei
Tom Ko
Qing Li
Yu Zhang
ViT
71
32
0
11 Oct 2021
Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion
Yi-Syuan Liou
Wen-Chin Huang
Ming-Chi Yen
S. Tsai
Yu-Huai Peng
Tomoki Toda
Yu Tsao
Hsin-Min Wang
82
1
0
08 Sep 2021
Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language
Huiyan Li
Haohong Lin
You Wang
Hengyang Wang
Ming Zhang
Han Gao
Qing Ai
Zhiyuan Luo
Guang Li
63
14
0
31 Jul 2021
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Yinghao Aaron Li
A. Zare
N. Mesgarani
97
101
0
21 Jul 2021
On Prosody Modeling for ASR+TTS based Voice Conversion
Wen-Chin Huang
Tomoki Hayashi
Xinjian Li
Shinji Watanabe
Tomoki Toda
73
9
0
20 Jul 2021
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion
Wen-Chin Huang
Kazuhiro Kobayashi
Yu-Huai Peng
Ching-Feng Liu
Yu Tsao
Hsin-Min Wang
Tomoki Toda
72
11
0
02 Jun 2021
Emotional Voice Conversion: Theory, Databases and ESD
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
127
180
0
31 May 2021
MASS: Multi-task Anthropomorphic Speech Synthesis Framework
Jinyin Chen
Linhui Ye
Zhaoyan Ming
65
7
0
10 May 2021
SpeechNet: A Universal Modularized Model for Speech Processing Tasks
Yi-Chen Chen
Po-Han Chi
Shu-Wen Yang
Kai-Wei Chang
Jheng-hao Lin
Sung-Feng Huang
Da-Rong Liu
Chi-Liang Liu
Cheng-Kuang Lee
Hung-yi Lee
MoE
64
17
0
07 May 2021
FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion
Hirokazu Kameoka
Kou Tanaka
Takuhiro Kaneko
81
21
0
14 Apr 2021
Non-autoregressive sequence-to-sequence voice conversion
Tomoki Hayashi
Wen-Chin Huang
Kazuhiro Kobayashi
Tomoki Toda
41
24
0
14 Apr 2021
Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Kang-Wook Kim
Seung-won Park
Junhyeok Lee
Myun-chul Joe
76
28
0
02 Apr 2021
crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder
Kazuhiro Kobayashi
Wen-Chin Huang
Yi-Chiao Wu
Patrick Lumban Tobing
Tomoki Hayashi
Tomoki Toda
BDL
DRL
68
19
0
04 Mar 2021
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward
Momina Masood
M. Nawaz
K. Malik
A. Javed
Aun Irtaza
AAML
202
323
0
25 Feb 2021
Axial Residual Networks for CycleGAN-based Voice Conversion
J. You
Gyuhyeon Nam
Dalhyun Kim
Gyeongsu Chae
62
3
0
16 Feb 2021
EMA2S: An End-to-End Multimodal Articulatory-to-Speech System
Yu-Wen Chen
Kuo-Hsuan Hung
Shang-Yi Chuang
Jonathan Sherman
Wen-Chin Huang
Xugang Lu
Yu Tsao
84
16
0
07 Feb 2021
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Shinji Watanabe
Florian Boyer
Xuankai Chang
Pengcheng Guo
Tomoki Hayashi
...
Shigeki Karita
Chenda Li
Jing Shi
Aswin Shanmugam Subramanian
Wangyou Zhang
VLM
108
38
0
23 Dec 2020
Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information
Yen-Ju Lu
Chia-Yu Chang
Cheng Yu
Ching-Feng Liu
J. Hung
Shinji Watanabe
Yu Tsao
72
4
0
15 Nov 2020
FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention
Yist Y. Lin
C. Chien
Jheng-hao Lin
Hung-yi Lee
Lin-Shan Lee
60
79
0
27 Oct 2020
Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations
Wen-Chin Huang
Yi-Chiao Wu
Tomoki Hayashi
Tomoki Toda
BDL
111
38
0
23 Oct 2020
The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders
Wen-Chin Huang
Patrick Lumban Tobing
Yi-Chiao Wu
Kazuhiro Kobayashi
Tomoki Toda
86
8
0
09 Oct 2020
VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics
Hirokazu Kameoka
Takuhiro Kaneko
Kou Tanaka
Nobukatsu Hojo
Shogo Seki
DiffM
124
21
0
06 Oct 2020
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS
Wen-Chin Huang
Tomoki Hayashi
Shinji Watanabe
Tomoki Toda
DRL
81
40
0
06 Oct 2020
Transfer Learning from Monolingual ASR to Transcription-free Cross-lingual Voice Conversion
Che-Jui Chang
65
5
0
30 Sep 2020
1
2
Next