Prosodic Structure Beyond Lexical Content: A Study of Self-Supervised Learning

3 June 2025
Sarenne Wallbridge
Christoph Minixhofer
Catherine Lai
Peter Bell
arXiv (abs) · PDF · HTML
Main: 4 pages · 1 figure · 1 table · Bibliography: 1 page
Abstract

People exploit the predictability of lexical structures during text comprehension. Though predictable structure is also present in speech, the degree to which prosody, e.g. intonation, tempo, and loudness, contributes to such structure independently of the lexical content is unclear. This study leverages self-supervised learning (SSL) to examine the temporal granularity of structures in the acoustic correlates of prosody. Representations from our proposed Masked Prosody Model can predict perceptual labels dependent on local information, such as word boundaries, but provide the most value for labels involving longer-term structures, like emotion recognition. Probing experiments across various perceptual labels show strong relative gains over untransformed pitch, energy, and voice activity features. Our results reveal the importance of SSL training objective timescale and highlight the value of complex SSL-encoded structures compared to more constrained classical structures.
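The abstract highlights the timescale of the SSL training objective as a key factor. The core idea of a masked prosody objective is to hide contiguous spans of frame-level prosodic features (pitch, energy, voice activity) and train a model to reconstruct them; the span length sets the temporal granularity of the structure the model must learn. As a minimal sketch of the masking step only (the actual Masked Prosody Model architecture and hyperparameters are not given in the abstract, so `span_len` and `mask_ratio` here are illustrative assumptions):

```python
import numpy as np

def mask_spans(features, span_len, mask_ratio, rng):
    """Zero out contiguous spans of frames.

    features: (T, D) array of frame-level prosodic features,
              e.g. D = 3 for (pitch, energy, voice activity).
    span_len: frames per masked span -- controls the timescale
              of the reconstruction objective.
    Returns the masked copy and a boolean mask over frames.
    """
    T = features.shape[0]
    n_spans = max(1, int(T * mask_ratio / span_len))
    mask = np.zeros(T, dtype=bool)
    for _ in range(n_spans):
        start = rng.integers(0, max(1, T - span_len))
        mask[start:start + span_len] = True
    masked = features.copy()
    masked[mask] = 0.0  # a model would be trained to predict these frames
    return masked, mask

rng = np.random.default_rng(0)
T, D = 200, 3  # 200 frames of (pitch, energy, voice activity)
feats = rng.normal(size=(T, D))
masked, mask = mask_spans(feats, span_len=10, mask_ratio=0.3, rng=rng)
```

Short spans force the model to exploit local structure (useful for labels like word boundaries), while long spans push it toward longer-term prosodic structure (more relevant to labels like emotion), which is the trade-off the probing experiments examine.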

@article{wallbridge2025_2506.02584,
  title={Prosodic Structure Beyond Lexical Content: A Study of Self-Supervised Learning},
  author={Sarenne Wallbridge and Christoph Minixhofer and Catherine Lai and Peter Bell},
  journal={arXiv preprint arXiv:2506.02584},
  year={2025}
}