Neural source-filter waveform models for statistical parametric speech synthesis

27 April 2019

Xin Wang

Papers citing "Neural source-filter waveform models for statistical parametric speech synthesis"

50 / 86 papers shown

The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis Bernardo Torres Geoffroy Peeters G. Richard 41 0 0 06 May 2025
DOSE: Drum One-Shot Extraction from Music Mixture Suntae Hwang Seonghyeon Kang Kyungsu Kim Semin Ahn K. Lee 36 0 0 25 Apr 2025
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation Reo Yoneyama Atsushi Miyashita Ryuichi Yamamoto T. Toda 27 1 0 11 Nov 2024
SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model Jianwei Cui Yu Gu Chao Weng Jie M. Zhang Liping Chen Lirong Dai 62 3 0 16 Oct 2024
HiFi-Glot: Neural Formant Synthesis with Differentiable Resonant Filters Lauri Juvela Pablo Pérez Zarazaga G. Henter Zofia Malisz 27 0 0 23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection Lam Pham Phat Lam Dat Tran Hieu Tang Tin Nguyen Alexander Schindler Alexander Polonsky Canh Vu 51 3 0 23 Sep 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild Jee-weon Jung Yihan Wu Xin Wang Ji-Hoon Kim Soumi Maiti ... Joon Son Chung Wangyou Zhang Seyun Um Shinnosuke Takamichi Shinji Watanabe 65 1 0 18 Sep 2024
LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation Shihao Chen Yu Gu Jianwei Cui Jie Zhang Rilin Chen Lirong Dai 34 2 0 22 Aug 2024
Diff-MST: Differentiable Mixing Style Transfer Soumya Sai Vanka Christian Steinmetz Jean-Baptiste Rolland Joshua Reiss George Fazekas 23 5 0 11 Jul 2024
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation J. Lee Jaehyun Park Min Jun Choi Kyogu Lee 32 2 0 07 Jul 2024
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning Masaya Kawamura Ryuichi Yamamoto Yuma Shirahata Takuya Hasumi Kentaro Tachibana VLM 27 5 0 12 Jun 2024
JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis Hyunjae Cho Junhyeok Lee Wonbin Jung 18 0 0 10 Jun 2024
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance Shihao Chen Yu Gu Jie Zhang Na Li Rilin Chen Liping Chen Lirong Dai DiffM 40 6 0 08 Jun 2024
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality Tiantian Feng Xuan Shi Rahul Gupta Shrikanth S. Narayanan 41 0 0 27 Apr 2024
Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport Bernardo Torres Geoffroy Peeters Gaël Richard 35 4 0 22 Dec 2023
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion Binzhu Sha Xu Li Zhiyong Wu Yin Shan Helen M. Meng 21 7 0 08 Dec 2023
Learning to Solve Inverse Problems for Perceptual Sound Matching Han Han Vincent Lostanlen Mathieu Lagrange 21 3 0 23 Nov 2023
VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023 Yi-Hua Zhou Meng Chen Yi Lei Jihua Zhu Weifeng Zhao 16 5 0 08 Oct 2023
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers Xintong Wang Chang Zeng Jun Chen Chunhui Wang 19 6 0 22 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform Yinghao Aaron Li Cong Han Xilin Jiang N. Mesgarani 30 4 0 18 Sep 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions Reo Shimizu Ryuichi Yamamoto Masaya Kawamura Yuma Shirahata Hironori Doi Tatsuya Komatsu Kentaro Tachibana DiffM 16 19 0 15 Sep 2023
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks Sizhou Chen Songyang Gao Sen Fang 19 0 0 14 Sep 2023
DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input Nicolas Jonason Xin Wang Erica Cooper Lauri Juvela Bob L. T. Sturm Junichi Yamagishi 36 1 0 14 Sep 2023
Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end? Xin Wang Junichi Yamagishi SyDa 50 23 0 12 Sep 2023
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis B. Hayes Jordie Shier Gyorgy Fazekas Andrew Mcpherson C. Saitis 27 21 0 29 Aug 2023
The Ethical Implications of Generative Audio Models: A Systematic Literature Review J. Barnett 16 25 0 07 Jul 2023
The Singing Voice Conversion Challenge 2023 Wen-Chin Huang Lester Phillip Violeta Songxiang Liu Jiatong Shi T. Toda 16 46 0 26 Jun 2023
Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices O. Watts Lovisa Wihlborg Cassia Valentini-Botinhao 14 3 0 25 Nov 2022
Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems? Xuan Shi Erica Cooper Xin Wang Junichi Yamagishi Shrikanth Narayanan 25 1 0 25 Nov 2022
Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System Takenori Yoshimura Shinji Takaki Kazuhiro Nakamura Keiichiro Oura Yukiya Hono Kei Hashimoto Yoshihiko Nankaku K. Tokuda 13 7 0 21 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis Hyeong-Seok Choi Jinhyeok Yang Juheon Lee Hyeongju Kim 18 46 0 17 Nov 2022
Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing J. Webber Cassia Valentini-Botinhao Evelyn Williams G. Henter Simon King 11 9 0 13 Nov 2022
NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit Ryuichi Yamamoto Reo Yoneyama T. Toda 134 11 0 28 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis Yuma Shirahata Ryuichi Yamamoto Eunwoo Song Ryo Terashima Jae-Min Kim Kentaro Tachibana 23 10 0 28 Oct 2022
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder Reo Yoneyama Yi-Chiao Wu T. Toda 41 26 0 27 Oct 2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation Chunhui Wang Chang Zeng Jun Chen Xingji He 44 7 0 23 Oct 2022
Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders Xin Wang Junichi Yamagishi 16 36 0 19 Oct 2022
Music Separation Enhancement with Generative Modeling N. Schaffer Boaz Cogan Ethan Manilow Max Morrison Prem Seetharaman Bryan Pardo 20 9 0 26 Aug 2022
Are disentangled representations all you need to build speaker anonymization systems? Pierre Champion D. Jouvet Anthony Larcher 22 20 0 22 Aug 2022
DDX7: Differentiable FM Synthesis of Musical Instrument Sounds Franco Caspe Andrew Mcpherson Mark Sandler 33 30 0 12 Aug 2022
DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation Da-Yi Wu Wen-Yi Hsiao Fu-Rong Yang Oscar D. Friedman Warren Jackson Scott Bruzenak Yi-Wen Liu Yi-Hsuan Yang DiffM 26 24 0 09 Aug 2022
Style Transfer of Audio Effects with Differentiable Signal Processing C. Steinmetz Nicholas J. Bryan Joshua D. Reiss 16 39 0 18 Jul 2022
Learning and controlling the source-filter representation of speech with a variational autoencoder Samir Sadok Simon Leglaive Laurent Girin Xavier Alameda-Pineda Renaud Séguier SSL DRL BDL 30 14 0 14 Apr 2022
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping Yuma Koizumi Heiga Zen Kohei Yatabe Nanxin Chen M. Bacchiani DiffM 27 45 0 31 Mar 2022
Differentially Private Speaker Anonymization Ali Shahin Shamsabadi B. M. L. Srivastava A. Bellet Nathalie Vauquier Emmanuel Vincent Mohamed Maouche Marc Tommasi Nicolas Papernot MIACV 38 32 0 23 Feb 2022
Audio representations for deep learning in sound synthesis: A review Anastasia Natsiou Seán O'Leary AI4TS 19 18 0 07 Jan 2022
MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling Yusong Wu Ethan Manilow Yi Deng Rigel Swavely Kyle Kastner Tim Cooijmans Aaron Courville Cheng-Zhi Anna Huang Jesse Engel 26 44 0 17 Dec 2021
RAVE: A variational autoencoder for fast and high-quality neural audio synthesis Antoine Caillon P. Esling DRL 17 109 0 09 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection Joel Frank Lea Schönherr DiffM 129 123 0 04 Nov 2021
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation Rongjie Huang Chenye Cui Feiyang Chen Yi Ren Jinglin Liu Zhou Zhao Baoxing Huai N. Yuan GAN 102 62 0 14 Oct 2021