Scaled Signed Averaging Improves In-Context and Early Learning Benchmark Performance in Small Transformers
Main: 8 pages · 12 figures · 5 tables · Bibliography: 2 pages · Appendix: 4 pages
Abstract
While large language models' abilities for in-context learning (ICL) have seen much success, these models show limitations on simple semantic tasks involving quantifiers such as {\em every} and {\em some}, as well as on tasks with linear functions. We analyze these limitations and identify Softmax, the scoring function in the attention mechanism, as a contributing factor. We propose \textbf{scaled signed averaging (SSA)}, a novel scoring function that mitigates these limitations. SSA significantly improves performance on our ICL tasks. In addition, SSA outperforms Softmax-based transformer models on several early learning NLP benchmarks and linguistic probing tasks in zero- and few-shot settings.
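The abstract does not give SSA's formula, so the following is only a minimal sketch of one plausible reading of "scaled signed averaging" as a drop-in replacement for Softmax in scaled dot-product attention. The specific normalization (signed scores averaged over the key dimension and multiplied by a learnable scale) and the class name `SSAAttention` are assumptions for illustration, not the authors' method.

```python
# Hedged sketch: illustrative guess at an SSA-style attention layer.
# Everything below the "SSA-style guess" comment is an assumption.
import torch
import torch.nn as nn


class SSAAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Hypothetical learnable scale (the "scaled" part of SSA); an assumption.
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / self.d_model ** 0.5
        # Softmax baseline would be: weights = scores.softmax(dim=-1)
        # SSA-style guess: keep each score's sign and average over the keys,
        # so weights may be negative and need not sum to one.
        weights = self.scale * scores / scores.size(-1)
        return weights @ v


if __name__ == "__main__":
    attn = SSAAttention(d_model=16)
    out = attn(torch.randn(2, 5, 16))
    print(out.shape)  # torch.Size([2, 5, 16])
```

Swapping the normalization in this way keeps the rest of the attention computation unchanged, which is what would make such a scoring function a direct substitute for Softmax in existing transformer blocks.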
