Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate

22 May 2025
Hanglei Zhang
Yiwei Guo
Zhihan Li
Xiang Hao
Xie Chen
Kai Yu
Main: 4 Pages
2 Figures
Bibliography: 1 Page
1 Table
Abstract

Most neural speech codecs adjust bitrate through intra-frame mechanisms, such as codebook dropout, at a Constant Frame Rate (CFR). However, the information density of speech varies over time (e.g., silent intervals versus voiced regions). This makes CFR suboptimal in terms of bitrate and token sequence length, which hinders efficiency in real-time applications. In this work, we propose Temporally Flexible Coding (TFC), a technique that introduces variable frame rate (VFR) into neural speech codecs for the first time. TFC enables seamlessly tunable average frame rates and dynamically allocates frame rates based on temporal entropy. Experimental results show that a codec with TFC achieves optimal reconstruction quality with high flexibility and maintains competitive performance even at lower frame rates. Our approach is promising for integration with other efforts to develop low-frame-rate neural speech codecs for more efficient downstream tasks.
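
To make the CFR versus VFR trade-off concrete, here is a minimal Python sketch. It is not the paper's TFC method: the function names, the per-segment "density" scores, and the proportional allocation rule are all assumptions chosen for illustration. It shows only how a fixed average frame rate budget could be spread unevenly over segments, and how frame rate relates to bitrate when each frame carries a fixed number of codebook indices.

```python
import math


def bitrate_bps(frame_rate, n_codebooks, codebook_size):
    """Bits per second when every frame carries n_codebooks indices,
    each drawn from a codebook with codebook_size entries."""
    return frame_rate * n_codebooks * math.log2(codebook_size)


def allocate_frame_rates(densities, durations, avg_rate, min_rate=1.0):
    """Hypothetical allocation rule (an assumption, not the paper's):
    each segment gets min_rate plus a share proportional to its density
    score, scaled so the duration-weighted mean rate equals avg_rate."""
    if len(densities) != len(durations):
        raise ValueError("densities and durations must have equal length")
    if avg_rate < min_rate:
        raise ValueError("avg_rate must be at least min_rate")
    total_time = sum(durations)
    weighted_density = sum(d * t for d, t in zip(densities, durations))
    if weighted_density == 0:
        # No density information: fall back to a constant frame rate.
        return [float(avg_rate)] * len(densities)
    scale = (avg_rate - min_rate) * total_time / weighted_density
    return [min_rate + scale * d for d in densities]


# Toy utterance: silence, voiced speech, silence (durations in seconds).
durations = [1.0, 2.0, 1.0]
densities = [0.1, 1.0, 0.1]  # made-up information-density scores
rates = allocate_frame_rates(densities, durations, avg_rate=25.0, min_rate=5.0)

# Duration-weighted mean frame rate, which matches the requested budget.
mean_rate = sum(r * t for r, t in zip(rates, durations)) / sum(durations)
```

In this toy case the voiced segment receives a much higher frame rate than the silent ones while the average stays at the requested 25 frames per second. Because `bitrate_bps` is linear in the frame rate, lowering the average frame rate lowers both the bitrate and the token sequence length in the same proportion.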

@article{zhang2025_2505.16845,
  title={Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate},
  author={Hanglei Zhang and Yiwei Guo and Zhihan Li and Xiang Hao and Xie Chen and Kai Yu},
  journal={arXiv preprint arXiv:2505.16845},
  year={2025}
}