ResearchTrend.AI
Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

20 August 2018
Sercan O. Arik
Heewoo Jun
G. Diamos

Papers citing "Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks"

50 / 55 papers shown
Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem
Andres Fernandez
Juan Azcarreta
Cagdas Bilen
Jesus Monge Alvarez
41
0
0
30 May 2025
Representation of perceived prosodic similarity of conversational feedback
Livia Qian
Carol Figueroa
Gabriel Skantze
41
0
0
19 May 2025
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
443
1
0
07 May 2025
Frequency-Aware Masked Autoencoders for Human Activity Recognition using Accelerometers
Niels R. Lorenzen
P. Jennum
Emmanuel Mignot
A. Brink-Kjaer
80
0
0
17 Feb 2025
Synthesizer Sound Matching Using Audio Spectrogram Transformers
Fred Bruford
Frederik Blang
S. Nercessian
45
1
0
23 Jul 2024
SPEAR: Receiver-to-Receiver Acoustic Neural Warping Field
Yuhang He
Shitong Xu
Jia-Xing Zhong
Sangyun Shin
Niki Trigoni
Andrew Markham
78
0
0
16 Jun 2024
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention
Mingshuai Liu
Zhuangqi Chen
Xiaopeng Yan
Yuanjun Lv
Xianjun Xia
Chuanzeng Huang
Yijian Xiao
Lei Xie
78
4
0
11 Jun 2024
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Amandine Brunetto
Sascha Hornauer
Fabien Moutarde
145
2
0
28 May 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu
Dongyang Dai
Zhiyong Wu
144
3
0
08 Mar 2024
RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement
Mingshuai Liu
Zhuangqi Chen
Xiaopeng Yan
Yuanjun Lv
Xianjun Xia
Chuanzeng Huang
Yijian Xiao
Lei Xie
86
5
0
09 Jan 2024
Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport
Bernardo Torres
Geoffroy Peeters
Gaël Richard
91
6
0
22 Dec 2023
ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning
Xincheng Yu
Dongyue Guo
Jianwei Zhang
Yi Lin
56
3
0
11 Dec 2023
A High Fidelity and Low Complexity Neural Audio Coding
Wenzhe Liu
Wei Xiao
Meng Wang
Shan Yang
Yupeng Shi
Yuyong Kang
Dan Su
Shidong Shang
Dong Yu
48
2
0
17 Oct 2023
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
B. Hayes
Jordie Shier
Gyorgy Fazekas
Andrew Mcpherson
C. Saitis
83
25
0
29 Aug 2023
Multi-Loss Convolutional Network with Time-Frequency Attention for Speech Enhancement
Liang Wan
Hongqing Liu
Yi Zhou
Jie Ji
63
2
0
15 Jun 2023
Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction
Wenzhe Liu
Yupeng Shi
Jun Chen
Wei Rao
Shulin He
Andong Li
Yannan Wang
Zhiyong Wu
54
6
0
14 Jun 2023
HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders
Doyeon Kim
Soo-Whan Chung
Hyewon Han
Youna Ji
Hong-Goo Kang
71
7
0
02 Jun 2023
Msanii: High Fidelity Music Synthesis on a Shoestring Budget
Kinyugo Maina
85
7
0
16 Jan 2023
Audio Language Modeling using Perceptually-Guided Discrete Representations
Felix Kreuk
Yaniv Taigman
Adam Polyak
Jade Copet
Gabriel Synnaeve
Alexandre Défossez
Yossi Adi
85
4
0
02 Nov 2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation
Chunhui Wang
Chang Zeng
Jun Chen
Xingji He
90
7
0
23 Oct 2022
Parallel Synthesis for Autoregressive Speech Generation
Po-Chun Hsu
Da-Rong Liu
Andy T. Liu
Hung-yi Lee
80
5
0
25 Apr 2022
Learning the Proximity Operator in Unfolded ADMM for Phase Retrieval
Pierre-Hugo Vial
P. Magron
Thomas Oberlin
Cédric Févotte
28
3
0
04 Apr 2022
On loss functions and evaluation metrics for music source separation
Enric Gusó
Jordi Pons
Santiago Pascual
Joan Serrà
136
21
0
16 Feb 2022
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Lois Orosa
Skanda Koppula
Yaman Umuroglu
Konstantinos Kanellopoulos
Juan Gómez Luna
Michaela Blott
K. Vissers
O. Mutlu
82
4
0
04 Feb 2022
Audio representations for deep learning in sound synthesis: A review
Anastasia Natsiou
Seán O'Leary
AI4TS
72
18
0
07 Jan 2022
Adversarial Auto-Encoding for Packet Loss Concealment
Santiago Pascual
Joan Serrà
Jordi Pons
71
29
0
07 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
133
359
0
29 Jun 2021
Catch-A-Waveform: Learning to Generate Audio from a Single Short Example
Gal Greshler
Tamar Rott Shaham
T. Michaeli
102
25
0
11 Jun 2021
Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety
Sebastian Houben
Stephanie Abrecht
Maram Akila
Andreas Bär
Felix Brockherde
...
Serin Varghese
Michael Weber
Sebastian J. Wirkert
Tim Wirtz
Matthias Woehrle
AAML
130
58
0
29 Apr 2021
Universal Neural Vocoding with Parallel WaveNet
Yunlong Jiao
Adam Gabryś
Georgi Tinchev
Bartosz Putrycz
Daniel Korzekwa
V. Klimkov
81
42
0
01 Feb 2021
Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss
Eunwoo Song
Ryuichi Yamamoto
Min-Jae Hwang
Jin-Seob Kim
Ohsung Kwon
Jae-Min Kim
71
14
0
19 Jan 2021
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training
Haohan Guo
Heng Lu
Na Hu
Chunlei Zhang
Shan Yang
Lei Xie
Jane Polak Scowcroft
Dong Yu
AAML
68
12
0
03 Dec 2020
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
44
8
0
03 Dec 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
99
101
0
06 Nov 2020
Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators
Ryuichi Yamamoto
Eunwoo Song
Min-Jae Hwang
Jae-Min Kim
74
18
0
27 Oct 2020
Hierarchical Timbre-Painting and Articulation Generation
Michael Michelashvili
Lior Wolf
86
12
0
30 Aug 2020
Unsupervised Cross-Domain Singing Voice Conversion
Adam Polyak
Lior Wolf
Yossi Adi
Yaniv Taigman
58
44
0
06 Aug 2020
PPSpeech: Phrase based Parallel End-to-End TTS System
Yahuan Cong
Ran Zhang
Jian Luan
45
3
0
06 Aug 2020
End-to-End Adversarial Text-to-Speech
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan
85
187
0
05 Jun 2020
WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU
Po-Chun Hsu
Hung-yi Lee
44
16
0
15 May 2020
DiscreTalk: Text-to-Speech as a Machine Translation Problem
Tomoki Hayashi
Shinji Watanabe
70
32
0
12 May 2020
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
Geng Yang
Shan Yang
Kai-Chun Liu
Peng Fang
Wei Chen
Lei Xie
153
200
0
11 May 2020
Jukebox: A Generative Model for Music
Prafulla Dhariwal
Heewoo Jun
Christine Payne
Jong Wook Kim
Alec Radford
Ilya Sutskever
VLM
176
758
0
30 Apr 2020
Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks
Jingdong Li
Hui Zhang
Xueliang Zhang
Changliang Li
53
9
0
02 Feb 2020
DDSP: Differentiable Digital Signal Processing
Jesse Engel
Lamtharn Hantrakul
Chenjie Gu
Adam Roberts
DiffM
188
381
0
14 Jan 2020
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
195
821
0
25 Oct 2019
Neural Drum Machine : An Interactive System for Real-time Synthesis of Drum Sounds
Cyran Aouameur
P. Esling
Gaëtan Hadjeres
42
22
0
04 Jul 2019
MelNet: A Generative Model for Audio in the Frequency Domain
Sean Vasquez
M. Lewis
DiffM
85
132
0
04 Jun 2019
Non-Autoregressive Neural Text-to-Speech
Kainan Peng
Ming-Yu Liu
Z. Song
Kexin Zhao
101
40
0
21 May 2019
Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders
Adrien Bitton
P. Esling
Antoine Caillon
Martin Fouilleul
73
10
0
12 Apr 2019