softmax is not enough (for sharp out-of-distribution)

1 October 2024
Petar Veličković
Christos Perivolaropoulos
Federico Barbero
Razvan Pascanu
arXiv: 2410.01104

Papers citing "softmax is not enough (for sharp out-of-distribution)"

11 / 11 papers shown
Title
Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance
Diep Luong
Mikko Heikkinen
K. Drossos
Tuomas Virtanen
34
0
0
06 May 2025
Bayesian Principles Improve Prompt Learning In Vision-Language Models
Mingyu Kim
Jongwoo Ko
Mijung Park
VLM
28
0
0
19 Apr 2025
Long Context In-Context Compression by Getting to the Gist of Gisting
Aleksandar Petrov
Mark Sandler
A. Zhmoginov
Nolan Miller
Max Vladymyrov
17
0
0
11 Apr 2025
On Vanishing Variance in Transformer Length Generalization
Ruining Li
Gabrijel Boduljak
Jensen Zhou
26
0
0
03 Apr 2025
Multi-Token Attention
O. Yu. Golovneva
Tianlu Wang
Jason Weston
Sainbayar Sukhbaatar
40
1
0
01 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
37
0
0
29 Mar 2025
Attend or Perish: Benchmarking Attention in Algorithmic Reasoning
Michal Spiegel
Michal Štefánik
Marek Kadlcík
Josef Kuchař
29
0
0
28 Feb 2025
Hallucination Detection in LLMs Using Spectral Features of Attention Maps
Jakub Binkowski
Denis Janiak
Albert Sawczyn
Bogdan Gabrys
Tomasz Kajdanowicz
50
0
0
24 Feb 2025
What makes a good feedforward computational graph?
Alex Vitvitskyi
J. G. Araújo
Marc Lackenby
Petar Velickovic
71
1
0
10 Feb 2025
Round and Round We Go! What makes Rotary Positional Encodings useful?
Federico Barbero
Alex Vitvitskyi
Christos Perivolaropoulos
Razvan Pascanu
Petar Velickovic
47
16
0
08 Oct 2024
MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts
Renchunzi Xie
Ambroise Odonnat
Vasilii Feofanov
Weijian Deng
Jianfeng Zhang
Bo An
28
0
0
29 May 2024