Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding

Abstract

Achieving high-fidelity audio compression while preserving perceptual quality across diverse content remains a key challenge in Neural Audio Coding (NAC). We introduce MUFFIN, a fully convolutional Neural Psychoacoustic Coding (NPC) framework that leverages psychoacoustically guided multi-band frequency reconstruction. At its core is a Multi-Band Spectral Residual Vector Quantization (MBS-RVQ) module that allocates bitrate across frequency bands based on perceptual salience. This design enables efficient compression while disentangling speaker identity from content using distinct codebooks. MUFFIN incorporates a transformer-inspired convolutional backbone and a modified snake activation to enhance resolution in fine-grained spectral regions. Experimental results on multiple benchmarks demonstrate that MUFFIN consistently outperforms existing approaches in reconstruction quality. A high-compression variant achieves a state-of-the-art 12.5 Hz rate with minimal loss. MUFFIN also proves effective in downstream generative tasks, highlighting its promise as a token representation for integration with language models. Audio samples and code are available.
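The abstract does not spell out how MBS-RVQ works internally, so the following is only a minimal sketch of the general idea it names: plain residual vector quantization applied separately to each frequency band, with more quantizer stages (and therefore more bits) given to some bands than to others. The band boundaries, codebook sizes, stage counts, random codebooks, and all function names here are hypothetical illustrations, not MUFFIN's actual design or trained parameters.

```python
import random

def nearest(codebook, v):
    """Index of the codebook entry closest to v in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - a) ** 2 for c, a in zip(codebook[i], v)),
    )

def rvq_encode(v, codebooks):
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = list(v)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices, residual

def rvq_decode(indices, codebooks, dim):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * dim
    for i, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out

def encode_multiband(x, bands, band_codebooks):
    """Encode each (start, stop) band of x with its own stack of codebooks."""
    return [rvq_encode(x[a:b], cbs)[0] for (a, b), cbs in zip(bands, band_codebooks)]

def random_codebook(num_codes, dim):
    # Stand-in for a learned codebook (assumption: real codebooks are trained).
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_codes)]

random.seed(0)
# Hypothetical allocation: 3 stages for the first band, 1 stage for the second.
first_band = [random_codebook(16, 4) for _ in range(3)]
second_band = [random_codebook(16, 4) for _ in range(1)]
frame = [random.gauss(0.0, 1.0) for _ in range(8)]
codes = encode_multiband(frame, [(0, 4), (4, 8)], [first_band, second_band])
print(codes)  # one list of stage indices per band
```

With 16 entries per codebook, each stage costs 4 bits, so in this toy allocation the first band spends 12 bits per frame and the second spends 4. A perceptually guided scheme would choose those stage counts from band salience rather than fixing them by hand.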

@article{ng2025_2505.07235,
  title={Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding},
  author={Dianwen Ng and Kun Zhou and Yi-Wen Chao and Zhiwei Xiong and Bin Ma and Eng Siong Chng},
  journal={arXiv preprint arXiv:2505.07235},
  year={2025}
}