softmax is not enough (for sharp out-of-distribution)

1 October 2024

Petar Veličković

Christos Perivolaropoulos

Federico Barbero

Papers citing "softmax is not enough (for sharp out-of-distribution)"

Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance. Diep Luong, Mikko Heikkinen, K. Drossos, Tuomas Virtanen. 06 May 2025.
Bayesian Principles Improve Prompt Learning In Vision-Language Models. Mingyu Kim, Jongwoo Ko, Mijung Park. 19 Apr 2025.
Long Context In-Context Compression by Getting to the Gist of Gisting. Aleksandar Petrov, Mark Sandler, A. Zhmoginov, Nolan Miller, Max Vladymyrov. 11 Apr 2025.
On Vanishing Variance in Transformer Length Generalization. Ruining Li, Gabrijel Boduljak, Jensen Zhou. 03 Apr 2025.
Multi-Token Attention. O. Yu. Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar. 01 Apr 2025.
TRA: Better Length Generalisation with Threshold Relative Attention. Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao. 29 Mar 2025.
Attend or Perish: Benchmarking Attention in Algorithmic Reasoning. Michal Spiegel, Michal Štefánik, Marek Kadlčík, Josef Kuchař. 28 Feb 2025.
Hallucination Detection in LLMs Using Spectral Features of Attention Maps. Jakub Binkowski, Denis Janiak, Albert Sawczyn, Bogdan Gabrys, Tomasz Kajdanowicz. 24 Feb 2025.
What makes a good feedforward computational graph? Alex Vitvitskyi, J. G. Araújo, Marc Lackenby, Petar Veličković. 10 Feb 2025.
Round and Round We Go! What makes Rotary Positional Encodings useful? Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu, Petar Veličković. 08 Oct 2024.
MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts. Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, Bo An. 29 May 2024.