Title
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data Haitong Zhang Yue Lin 10 0 0 14 Oct 2021
Towards Universal Neural Vocoding with a Multi-band Excited WaveNet Axel Roebel F. Bous 22 2 0 07 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks E. Hortal Rodrigo Brechard Alarcia GAN 16 2 0 06 Oct 2021
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis Cheng-I Jeff Lai Erica Cooper Yang Zhang Shiyu Chang Kaizhi Qian ... Yung-Sung Chuang Alexander H. Liu Junichi Yamagishi David D. Cox James R. Glass 26 6 0 04 Oct 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation Ye Jia Michelle Tadmor Ramanovich Tal Remez Roi Pomerantz 26 67 0 19 Jul 2021
SoundStream: An End-to-End Neural Audio Codec Neil Zeghidour Alejandro Luebs Ahmed Omran Jan Skoglund Marco Tagliasacchi AI4TS 23 721 0 07 Jul 2021
A Generative Model for Raw Audio Using Transformer Architectures Prateek Verma C. Chafe 10 28 0 30 Jun 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis Jinhyeok Yang Jaesung Bae Taejun Bak Young-Ik Kim Hoon-Young Cho 20 36 0 29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 21 3 0 21 Jun 2021
A deep generative model for probabilistic energy forecasting in power systems: normalizing flows Jonathan Dumas Antoine Wehenkel Bertrand Cornélusse Antonio Sutera AI4TS 24 81 0 17 Jun 2021
Learning to Compensate: A Deep Neural Network Framework for 5G Power Amplifier Compensation Po-Yu Chen Hao-Wei Chen Yi-Min Tsai Hsien-Kai Kuo Hantao Huang Hsin-Hung Chen Sheng-Hong Yan Wei-Lun Ou Chia-Ming Cheng 26 3 0 15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system Panagiota Karanasou S. Karlapati Alexis Moinet Arnaud Joly Ammar Abbas Simon Slangen Jaime Lorenzo-Trueba Thomas Drugman 12 7 0 14 Jun 2021
Catch-A-Waveform: Learning to Generate Audio from a Single Short Example Gal Greshler Tamar Rott Shaham T. Michaeli 16 25 0 11 Jun 2021
CaloFlow: Fast and Accurate Generation of Calorimeter Showers with Normalizing Flows Claudius Krause David Shih AI4CE 15 81 0 09 Jun 2021
Marginalizable Density Models D. Gilboa Ari Pakman Thibault Vatter BDL 32 5 0 08 Jun 2021
TrTr: Visual Tracking with Transformer Moju Zhao K. Okada Masayuki Inaba ViT 20 79 0 09 May 2021
Instances as Queries Yuxin Fang Shusheng Yang Xinggang Wang Yu Li Chen Fang Ying Shan Bin Feng Wenyu Liu ISeg 42 255 0 05 May 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 13 24 0 20 Apr 2021
Knowledge Distillation as Semiparametric Inference Tri Dao G. Kamath Vasilis Syrgkanis Lester W. Mackey 22 31 0 20 Apr 2021
Efficient and Generic 1D Dilated Convolution Layer for Deep Learning Narendra Chaudhary Sanchit Misra Dhiraj D. Kalamkar A. Heinecke E. Georganas Barukh Ziv Menachem Adelman Bharat Kaul 24 9 0 16 Apr 2021
Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN Cong Wang Yu Chen Bin Wang Yi Shi 21 1 0 26 Mar 2021
Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models Sam Bond-Taylor Adam Leach Yang Long Chris G. Willcocks VLM TPM 36 478 0 08 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention Peng Liu Yuewen Cao Songxiang Liu Na Hu Guangzhi Li Chao Weng Dan Su 28 22 0 12 Feb 2021
Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade Jiatao Gu X. Kong 17 135 0 31 Dec 2020
End-To-End Dilated Variational Autoencoder with Bottleneck Discriminative Loss for Sound Morphing -- A Preliminary Study Matteo Lionello Hendrik Purwins 20 0 0 19 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis Ron J. Weiss RJ Skerry-Ryan Eric Battenberg Soroosh Mariooryad Diederik P. Kingma 11 97 0 06 Nov 2020
NU-GAN: High resolution neural upsampling with GAN Rithesh Kumar Kundan Kumar Vicki Anand Yoshua Bengio Aaron Courville 11 25 0 22 Oct 2020
Neural Approximate Sufficient Statistics for Implicit Models Yanzhi Chen Dinghuai Zhang Michael U. Gutmann Aaron Courville Zhanxing Zhu 24 79 0 20 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis Zhifeng Kong Wei Ping Jiaji Huang Kexin Zhao Bryan Catanzaro DiffM BDL 20 1,387 0 21 Sep 2020
Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems Ravichander Vipperla Sangjun Park Kihyun Choo Samin S. Ishtiaq Kyoungbo Min S. Bhattacharya Abhinav Mehrotra Alberto Gil C. P. Ramos Nicholas D. Lane 8 26 0 11 Aug 2020
Expressive TTS Training with Frame and Style Reconstruction Loss Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 17 73 0 04 Aug 2020
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network Yi-Chiao Wu Tomoki Hayashi Patrick Lumban Tobing Kazuhiro Kobayashi T. Toda 11 18 0 11 Jul 2020
SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement Luka Chkhetiani Levan Bejanidze 9 1 0 13 Jun 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction Adrian Łańcucki 20 332 0 11 Jun 2020
Learning normalizing flows from Entropy-Kantorovich potentials Chris Finlay Augusto Gerolin Adam M. Oberman Aram-Alexandre Pooladian 20 23 0 10 Jun 2020
Deep generative models for musical audio synthesis M. Huzaifah L. Wyse 19 20 0 10 Jun 2020
Knowledge Distillation: A Survey Jianping Gou B. Yu Stephen J. Maybank Dacheng Tao VLM 19 2,835 0 09 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Yi Ren Chenxu Hu Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 27 1,354 0 08 Jun 2020
End-to-End Object Detection with Transformers Nicolas Carion Francisco Massa Gabriel Synnaeve Nicolas Usunier Alexander Kirillov Sergey Zagoruyko ViT 3DV PINN 62 12,660 0 26 May 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search Jaehyeon Kim Sungwon Kim Jungil Kong Sungroh Yoon 22 473 0 22 May 2020
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation Yi-Chiao Wu Tomoki Hayashi T. Okamoto Hisashi Kawai T. Toda 13 4 0 18 May 2020
Many-to-Many Voice Transformer Network Hirokazu Kameoka Wen-Chin Huang Kou Tanaka Takuhiro Kaneko Nobukatsu Hojo T. Toda ViT 17 30 0 18 May 2020
FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction Qiao Tian Zewang Zhang Heng Lu Linghui Chen Shan Liu 14 22 0 12 May 2020
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech Geng Yang Shan Yang Kai-Chun Liu Peng Fang Wei-Neng Chen Lei Xie 42 198 0 11 May 2020
FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis Aman Sinha Matthew O'Kelly Hongrui Zheng Rahul Mangharam John C. Duchi Russ Tedrake OffRL 66 26 0 09 Mar 2020
Deterministic Decoding for Discrete Data in Variational Autoencoders Daniil Polykovskiy Dmitry Vetrov OffRL 16 8 0 04 Mar 2020
Predictive Sampling with Forecasting Autoregressive Models Auke Wiggers Emiel Hoogeboom BDL 25 16 0 23 Feb 2020
Imputer: Sequence Modelling via Imputation and Dynamic Programming William Chan Chitwan Saharia Geoffrey E. Hinton Mohammad Norouzi Navdeep Jaitly BDL AI4TS 16 114 0 20 Feb 2020
Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models Chin-Wei Huang Laurent Dinh Aaron Courville DRL 24 87 0 17 Feb 2020