Deep Network Perceptual Losses for Speech Denoising

21 November 2020
Mark R. Saddler
Andrew Francl
J. Feather
Kaizhi Qian
Yang Zhang
Josh H. McDermott
arXiv:2011.10706
Abstract

Contemporary speech enhancement predominantly relies on audio transforms that are trained to reconstruct a clean speech waveform. Here we investigate whether deep feature representations learned for audio classification tasks can be used to improve denoising. We first trained deep neural networks to classify either spoken words or environmental sounds from audio. We then trained an audio transform to map noisy speech to an audio waveform that minimized 'perceptual' losses derived from the recognition network. When the transform was trained to minimize the difference in the deep feature representations between the output audio and the corresponding clean audio, it removed noise substantially better than baseline methods trained to reconstruct clean waveforms. The learned deep features were essential for this improvement, as features from untrained networks with random weights did not provide the same benefit. The results suggest the use of deep features as perceptual metrics to guide speech enhancement.
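The core idea is to replace a waveform reconstruction loss with a distance measured in the feature space of a frozen recognition network. Below is a minimal PyTorch sketch of such a deep-feature loss, not the authors' implementation: the recognition network `recognizer`, the layer names, and the L1 feature distance are all illustrative assumptions standing in for the word/environmental-sound classifiers and loss details described in the paper.

```python
import torch
import torch.nn as nn


class DeepFeatureLoss(nn.Module):
    """Distance between a frozen recognition network's internal
    activations for enhanced vs. clean audio. The recognizer and
    layer names are hypothetical stand-ins for the paper's
    word-/environmental-sound classifiers."""

    def __init__(self, recognizer: nn.Module, layer_names):
        super().__init__()
        self.recognizer = recognizer.eval()
        for p in self.recognizer.parameters():
            p.requires_grad_(False)  # the loss network stays fixed
        self.layer_names = set(layer_names)
        self._feats = {}
        # Cache activations from the chosen layers via forward hooks.
        for name, module in self.recognizer.named_modules():
            if name in self.layer_names:
                module.register_forward_hook(self._make_hook(name))

    def _make_hook(self, name):
        def hook(module, inputs, output):
            self._feats[name] = output
        return hook

    def _features(self, waveform):
        self._feats = {}
        self.recognizer(waveform)
        return dict(self._feats)

    def forward(self, enhanced, clean):
        f_out = self._features(enhanced)      # gradients flow to the denoiser
        with torch.no_grad():
            f_ref = self._features(clean)     # clean-speech reference features
        # Mean L1 distance in feature space, averaged over layers.
        return sum(
            torch.mean(torch.abs(f_out[k] - f_ref[k]))
            for k in self.layer_names
        ) / len(self.layer_names)


# Hypothetical usage: `denoiser` maps noisy speech to a waveform,
# `word_classifier` is a trained recognition network.
#   loss_fn = DeepFeatureLoss(word_classifier, ["conv3", "conv5"])
#   loss_fn(denoiser(noisy), clean).backward()
```

Matching deep features rather than raw samples penalizes errors in the representations the recognizer uses for classification, which is the paper's explanation for why it outperforms waveform reconstruction baselines; the finding that random-weight features do not help indicates that the learned feature space, not the architecture alone, carries the benefit.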
