Symbolic Autoencoding for Self-Supervised Sequence Learning

16 February 2024
Mohammad Hossein Amani
Nicolas Mario Baldwin
Amin Mansouri
Martin Josifoski
Maxime Peyrard
Robert West
Abstract

Traditional language models, adept at next-token prediction in text sequences, often struggle with transduction tasks between distinct symbolic systems, particularly when parallel data is scarce. Addressing this issue, we introduce symbolic autoencoding (ΣAE), a self-supervised framework that harnesses abundant non-parallel data alongside limited parallel data. ΣAE connects two generative models via a discrete bottleneck layer and is optimized end-to-end by minimizing a reconstruction loss (together with a supervised loss on the parallel data), such that the sequence generated at the discrete bottleneck can be read out as the transduced input sequence. We also develop gradient-based methods that allow efficient self-supervised sequence learning despite the discreteness of the bottleneck. Our results demonstrate that ΣAE significantly enhances performance on transduction tasks, even with minimal parallel data, offering a promising solution for weakly supervised learning scenarios.
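The abstract describes two generative models joined by a discrete bottleneck and trained with a reconstruction loss plus a supervised loss on the scarce parallel pairs. The sketch below illustrates that idea only; it is not the authors' implementation. The module names, vocabulary sizes, the length-preserving (non-autoregressive) setup, and the choice of a straight-through Gumbel-softmax as the gradient estimator for the discrete bottleneck are all assumptions made for illustration.

```python
# Minimal sketch of the symbolic-autoencoding idea from the abstract.
# Assumptions (not from the paper): GRU-based toy models, length-preserving
# mapping, straight-through Gumbel-softmax as the bottleneck estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_X, VOCAB_Z, HIDDEN = 50, 30, 64  # hypothetical symbol-system sizes


class Encoder(nn.Module):
    """Toy generative model mapping X-sequences to logits over Z symbols."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_X, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB_Z)

    def forward(self, x):                      # x: (batch, seq) of X ids
        h, _ = self.rnn(self.embed(x))
        return self.out(h)                     # (batch, seq, VOCAB_Z) logits


class Decoder(nn.Module):
    """Toy generative model mapping (one-hot) Z-sequences back to X logits."""
    def __init__(self):
        super().__init__()
        # Linear layer over one-hot vectors acts as an embedding that keeps
        # the gradient path through the straight-through bottleneck intact.
        self.embed = nn.Linear(VOCAB_Z, HIDDEN, bias=False)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB_X)

    def forward(self, z_onehot):               # z_onehot: (batch, seq, VOCAB_Z)
        h, _ = self.rnn(self.embed(z_onehot))
        return self.out(h)                     # (batch, seq, VOCAB_X) logits


enc, dec = Encoder(), Decoder()
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)


def discrete_bottleneck(logits, tau=1.0):
    """Discrete symbols on the forward pass, soft gradients on the backward
    pass (one plausible gradient-based estimator, assumed here)."""
    return F.gumbel_softmax(logits, tau=tau, hard=True)


def training_step(x_unparallel, x_parallel=None, z_parallel=None):
    # Self-supervised reconstruction: X -> discrete Z -> X.
    z_logits = enc(x_unparallel)
    z_onehot = discrete_bottleneck(z_logits)   # readable as transduced symbols
    x_logits = dec(z_onehot)
    loss = F.cross_entropy(x_logits.transpose(1, 2), x_unparallel)

    # Supervised loss on the scarce parallel pairs, when a batch is available.
    if x_parallel is not None:
        loss = loss + F.cross_entropy(enc(x_parallel).transpose(1, 2), z_parallel)

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In this reading, the argmax of the bottleneck one-hots is the transduced output sequence, and the same pattern could be mirrored in the Z-to-X direction for a second reconstruction loss; the paper's actual architecture, estimators, and losses may differ.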
