ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2211.09407
  4. Cited By
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis

NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis

17 November 2022
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
ArXivPDFHTML

Papers citing "NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis"

39 / 39 papers shown
Title
Voice Cloning: Comprehensive Survey
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
32
0
0
01 May 2025
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
H. Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
46
0
0
29 Mar 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
88
0
0
21 Feb 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
42
0
0
31 Dec 2024
Disentangling Textual and Acoustic Features of Neural Speech
  Representations
Disentangling Textual and Acoustic Features of Neural Speech Representations
Hosein Mohebbi
Grzegorz Chrupała
Willem H. Zuidema
A. Alishahi
Ivan Titov
CoGe
21
0
0
03 Oct 2024
Description-based Controllable Text-to-Speech with Cross-Lingual Voice
  Control
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
Ryuichi Yamamoto
Yuma Shirahata
Masaya Kawamura
Kentaro Tachibana
DiffM
24
2
0
26 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis
  with Distilled Time-Varying Style Diffusion
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
4
0
16 Sep 2024
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
Xuanru Zhou
Cheol Jun Cho
Ayati Sharma
Brittany Morin
D. Baquirin
...
Zachary Miller
B. Tee
M. G. Tempini
Jiachen Lian
Gopala Anumanchipalli
20
3
0
15 Sep 2024
Super Monotonic Alignment Search
Super Monotonic Alignment Search
Junhyeok Lee
Hyeongju Kim
16
0
0
12 Sep 2024
User-Driven Voice Generation and Editing through Latent Space Navigation
User-Driven Voice Generation and Editing through Latent Space Navigation
Yusheng Tian
Junbin Liu
Tan Lee
DiffM
27
2
0
30 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
30
1
0
29 Aug 2024
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection
Xuanru Zhou
Anshul Kashyap
Steve Li
Ayati Sharma
Brittany Morin
...
Z. Ezzes
Zachary Miller
M. G. Tempini
Jiachen Lian
Gopala Krishna Anumanchipalli
16
6
0
27 Aug 2024
Hear Your Face: Face-based voice conversion with F0 estimation
Hear Your Face: Face-based voice conversion with F0 estimation
Jaejun Lee
Yoori Oh
Injune Hwang
Kyogu Lee
CVBM
16
1
0
19 Aug 2024
Differentiable Modal Synthesis for Physical Modeling of Planar String
  Sound and Motion Simulation
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
J. Lee
Jaehyun Park
Min Jun Choi
Kyogu Lee
19
1
0
07 Jul 2024
Fine-Grained and Interpretable Neural Speech Editing
Fine-Grained and Interpretable Neural Speech Editing
Max Morrison
Cameron Churchwell
Nathan Pruyne
Bryan Pardo
39
3
0
07 Jul 2024
Song Data Cleansing for End-to-End Neural Singer Diarization Using
  Neural Analysis and Synthesis Framework
Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework
Hokuto Munakata
Ryo Terashima
Yusuke Fujita
23
0
0
24 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
24
1
0
18 Jun 2024
JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
Hyunjae Cho
Junhyeok Lee
Wonbin Jung
16
0
0
10 Jun 2024
Learning Expressive Disentangled Speech Representations with Soft Speech
  Units and Adversarial Style Augmentation
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Yimin Deng
Jianzong Wang
Xulong Zhang
Ning Cheng
Jing Xiao
16
0
0
01 May 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving
  Zero-Shot Voice Editing
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
34
4
0
10 Apr 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and
  Diffusion Models
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
22
139
0
05 Mar 2024
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic
  Token Prediction
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Semin Kim
Joun Yeop Lee
Nam Soo Kim
AI4TS
13
1
0
03 Jan 2024
HierSpeech++: Bridging the Gap between Semantic and Acoustic
  Representation of Speech by Hierarchical Variational Inference for Zero-shot
  Speech Synthesis
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
17
31
0
21 Nov 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech
  Synthesis
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong
Junmo Lee
Jeongmin Kim
Beomjeong Kim
Jihoon Park
Dohee Kong
Changheon Lee
Sangjin Kim
13
1
0
20 Nov 2023
Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice
  Conversion
Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion
A. R. Bargum
Stefania Serafin
Cumhur Erkut
13
3
0
14 Nov 2023
SelfVC: Voice Conversion With Iterative Refinement using Self
  Transformations
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
13
1
0
14 Oct 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech
  and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge
  2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023
Ryuichi Yamamoto
Reo Yoneyama
Lester Phillip Violeta
Wen-Chin Huang
T. Toda
8
7
0
08 Oct 2023
VaSAB: The variable size adaptive information bottleneck for
  disentanglement on speech and singing voice
VaSAB: The variable size adaptive information bottleneck for disentanglement on speech and singing voice
F. Bous
Axel Roebel
11
0
0
05 Oct 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise
  Filter and Inverse Short Time Fourier Transform
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
27
4
0
18 Sep 2023
A Review of Differentiable Digital Signal Processing for Music & Speech
  Synthesis
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
B. Hayes
Jordie Shier
Gyorgy Fazekas
Andrew Mcpherson
C. Saitis
8
21
0
29 Aug 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Sang-Hoon Lee
Haram Choi
H. Oh
Seong-Whan Lee
BDL
21
9
0
30 Jul 2023
DisCover: Disentangled Music Representation Learning for Cover Song
  Identification
DisCover: Disentangled Music Representation Learning for Cover Song Identification
Jiahao Xun
Shengyu Zhang
Yanting Yang
Jieming Zhu
Liqun Deng
Zhou Zhao
Zhenhua Dong
Ruiqi Li
Lichao Zhang
Fei Wu
AAML
DRL
15
5
0
19 Jul 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Rongjie Huang
Chunlei Zhang
Yongqiang Wang
Dongchao Yang
Lu Liu
Zhenhui Ye
Ziyue Jiang
Chao Weng
Zhou Zhao
Dong Yu
DiffM
21
22
0
30 May 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
Ruiqi Li
Rongjie Huang
Lichao Zhang
Jinglin Liu
Zhou Zhao
20
4
0
08 May 2023
Retriever: Learning Content-Style Representation as a Token-Level
  Bipartite Graph
Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph
Dacheng Yin
Xuanchi Ren
Chong Luo
Yuwang Wang
Zhiwei Xiong
Wenjun Zeng
39
13
0
24 Feb 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice
  Conversion for everyone
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
166
372
0
04 Dec 2021
Real-time Denoising and Dereverberation with Tiny Recurrent U-Net
Real-time Denoising and Dereverberation with Tiny Recurrent U-Net
Hyeong-Seok Choi
Sungjin Park
Jie Hwan Lee
Hoon Heo
Dongsuk Jeon
Kyogu Lee
31
57
0
05 Feb 2021
DDSP: Differentiable Digital Signal Processing
DDSP: Differentiable Digital Signal Processing
Jesse Engel
Lamtharn Hantrakul
Chenjie Gu
Adam Roberts
DiffM
83
366
0
14 Jan 2020
High Fidelity Speech Synthesis with Adversarial Networks
High Fidelity Speech Synthesis with Adversarial Networks
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan
210
239
0
25 Sep 2019
1