NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis

17 November 2022

Hyeongju Kim

Papers citing "NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis"

Title
Voice Cloning: Comprehensive Survey Hussam Azzuni Abdulmotaleb El Saddik VLM 32 0 0 01 May 2025
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System H. Kim Jinhyeok Yang Yechan Yu Seunghun Ji Jacob Morton Frederik Bous Joon Byun Juheon Lee 46 0 0 29 Mar 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis Yinghao Aaron Li Rithesh Kumar Zeyu Jin DiffM 88 0 0 21 Feb 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation Ji-Hoon Kim Hong-Sun Yang Yoon-Cheol Ju Il-Hwan Kim Byeong-Yeol Kim Joon Son Chung BDL 42 0 0 31 Dec 2024
Disentangling Textual and Acoustic Features of Neural Speech Representations Hosein Mohebbi Grzegorz Chrupała Willem H. Zuidema A. Alishahi Ivan Titov CoGe 21 0 0 03 Oct 2024
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control Ryuichi Yamamoto Yuma Shirahata Masaya Kawamura Kentaro Tachibana DiffM 24 2 0 26 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion Yinghao Aaron Li Xilin Jiang Cong Han N. Mesgarani DiffM 29 4 0 16 Sep 2024
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection Xuanru Zhou Cheol Jun Cho Ayati Sharma Brittany Morin D. Baquirin ... Zachary Miller B. Tee M. G. Tempini Jiachen Lian Gopala Anumanchipalli 20 3 0 15 Sep 2024
Super Monotonic Alignment Search Junhyeok Lee Hyeongju Kim 16 0 0 12 Sep 2024
User-Driven Voice Generation and Editing through Latent Space Navigation Yusheng Tian Junbin Liu Tan Lee DiffM 27 2 0 30 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling Jiachen Lian Xuanru Zhou Z. Ezzes Jet M J Vonk Brittany Morin D. Baquirin Zachary Miller M. G. Tempini Gopala Anumanchipalli AuLLM 30 1 0 29 Aug 2024
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection Xuanru Zhou Anshul Kashyap Steve Li Ayati Sharma Brittany Morin ... Z. Ezzes Zachary Miller M. G. Tempini Jiachen Lian Gopala Krishna Anumanchipalli 16 6 0 27 Aug 2024
Hear Your Face: Face-based voice conversion with F0 estimation Jaejun Lee Yoori Oh Injune Hwang Kyogu Lee CVBM 16 1 0 19 Aug 2024
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation J. Lee Jaehyun Park Min Jun Choi Kyogu Lee 19 1 0 07 Jul 2024
Fine-Grained and Interpretable Neural Speech Editing Max Morrison Cameron Churchwell Nathan Pruyne Bryan Pardo 39 3 0 07 Jul 2024
Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework Hokuto Munakata Ryo Terashima Yusuke Fujita 23 0 0 24 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics Cheol Jun Cho Peter Wu Tejas S. Prabhune Dhruv Agarwal Gopala K. Anumanchipalli 24 1 0 18 Jun 2024
JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis Hyunjae Cho Junhyeok Lee Wonbin Jung 16 0 0 10 Jun 2024
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation Yimin Deng Jianzong Wang Xulong Zhang Ning Cheng Jing Xiao 16 0 0 01 May 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing Philip Anastassiou Zhenyu Tang Kainan Peng Dongya Jia Jiaxin Li Ming Tu Yuping Wang Yuxuan Wang Mingbo Ma 34 4 0 10 Apr 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Zeqian Ju Yuancheng Wang Kai Shen Xu Tan Detai Xin ... Shikun Zhang Jiang Bian Lei He Jinyu Li Sheng Zhao DiffM 22 139 0 05 Mar 2024
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction Minchan Kim Myeonghun Jeong Byoung Jin Choi Semin Kim Joun Yeop Lee Nam Soo Kim AI4TS 13 1 0 03 Jan 2024
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis Sang-Hoon Lee Haram Choi Seung-Bin Kim Seong-Whan Lee BDL 17 31 0 21 Nov 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis Jungil Kong Junmo Lee Jeongmin Kim Beomjeong Kim Jihoon Park Dohee Kong Changheon Lee Sangjin Kim 13 1 0 20 Nov 2023
Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion A. R. Bargum Stefania Serafin Cumhur Erkut 13 3 0 14 Nov 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations Paarth Neekhara Shehzeen Samarah Hussain Rafael Valle Boris Ginsburg Rishabh Ranjan Shlomo Dubnov F. Koushanfar Julian McAuley 13 1 0 14 Oct 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023 Ryuichi Yamamoto Reo Yoneyama Lester Phillip Violeta Wen-Chin Huang T. Toda 8 7 0 08 Oct 2023
VaSAB: The variable size adaptive information bottleneck for disentanglement on speech and singing voice F. Bous Axel Roebel 11 0 0 05 Oct 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform Yinghao Aaron Li Cong Han Xilin Jiang N. Mesgarani 27 4 0 18 Sep 2023
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis B. Hayes Jordie Shier Gyorgy Fazekas Andrew McPherson C. Saitis 8 21 0 29 Aug 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer Sang-Hoon Lee Haram Choi H. Oh Seong-Whan Lee BDL 21 9 0 30 Jul 2023
DisCover: Disentangled Music Representation Learning for Cover Song Identification Jiahao Xun Shengyu Zhang Yanting Yang Jieming Zhu Liqun Deng Zhou Zhao Zhenhua Dong Ruiqi Li Lichao Zhang Fei Wu AAML DRL 15 5 0 19 Jul 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation Rongjie Huang Chunlei Zhang Yongqiang Wang Dongchao Yang Lu Liu Zhenhui Ye Ziyue Jiang Chao Weng Zhou Zhao Dong Yu DiffM 21 22 0 30 May 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment Ruiqi Li Rongjie Huang Lichao Zhang Jinglin Liu Zhou Zhao 20 4 0 08 May 2023
Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph Dacheng Yin Xuanchi Ren Chong Luo Yuwang Wang Zhiwei Xiong Wenjun Zeng 39 13 0 24 Feb 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone Edresson Casanova Julian Weber C. Shulby Arnaldo Cândido Júnior Eren Golge M. Ponti 166 372 0 04 Dec 2021
Real-time Denoising and Dereverberation with Tiny Recurrent U-Net Hyeong-Seok Choi Sungjin Park Jie Hwan Lee Hoon Heo Dongsuk Jeon Kyogu Lee 31 57 0 05 Feb 2021
DDSP: Differentiable Digital Signal Processing Jesse Engel Lamtharn Hantrakul Chenjie Gu Adam Roberts DiffM 83 366 0 14 Jan 2020
High Fidelity Speech Synthesis with Adversarial Networks Mikolaj Binkowski Jeff Donahue Sander Dieleman Aidan Clark Erich Elsen Norman Casagrande Luis C. Cobo Karen Simonyan 210 239 0 25 Sep 2019