Money on the Table: Statistical information ignored by Softmax can improve classifier accuracy

26 January 2019
Charles B. Delahunt
C. Mehanian
J. Nathan Kutz
Abstract

Softmax is a standard final layer used in Neural Nets (NNs) to summarize information encoded in the trained NN and return a prediction. However, Softmax leverages only a subset of the class-specific structure encoded in the trained model and ignores potentially valuable information: during training, models encode an array $D$ of class response distributions, where $D_{ij}$ is the distribution of the $j^{th}$ pre-Softmax readout neuron's responses to the $i^{th}$ class. Given a test sample, Softmax implicitly uses only the row of this array $D$ that corresponds to the readout neurons' responses to the sample's true class. Leveraging more of this array $D$ can improve classifier accuracy, because the likelihoods of two competing classes can be encoded in other rows of $D$. To explore this potential resource, we develop a hybrid classifier (Softmax-Pooling Hybrid, SPH) that uses Softmax on high-scoring samples, but on low-scoring samples uses a log-likelihood method that pools the information from the full array $D$. We apply SPH to models trained on a vectorized MNIST dataset to varying levels of accuracy. SPH replaces only the final Softmax layer in the trained NN, at test time only. All training is the same as for Softmax. Because the pooling classifier performs better than Softmax on low-scoring samples, SPH reduces test set error by 6% to 23%, using the exact same trained model, whatever the baseline Softmax accuracy. This reduction in error reflects hidden capacity of the trained NN that is left unused by Softmax.
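To make the hybrid scheme concrete, the sketch below illustrates one way the idea could work, assuming each class response distribution $D_{ij}$ is modeled as a Gaussian fitted to training-set readout responses, and that a threshold on the top Softmax probability decides when to fall back to likelihood pooling. The function names, the Gaussian model, and the threshold value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fit_response_distributions(readouts, labels, n_classes):
    """Estimate the array D: mean/std of readout neuron j's response to class i.
    readouts: (n_samples, n_readouts) pre-Softmax activations on training data.
    (Hypothetical helper; the paper's exact estimator may differ.)"""
    n_readouts = readouts.shape[1]
    mu = np.zeros((n_classes, n_readouts))
    sigma = np.zeros((n_classes, n_readouts))
    for i in range(n_classes):
        resp = readouts[labels == i]
        mu[i] = resp.mean(axis=0)
        sigma[i] = resp.std(axis=0) + 1e-8   # avoid zero variance
    return mu, sigma

def sph_predict(readouts, mu, sigma, threshold=0.9):
    """Softmax on confident samples; pooled Gaussian log-likelihood otherwise."""
    z = readouts - readouts.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # softmax probabilities
    preds = p.argmax(axis=1)
    low = p.max(axis=1) < threshold                        # low-scoring samples
    if low.any():
        x = readouts[low][:, None, :]                      # (n_low, 1, n_readouts)
        # log-likelihood of the observed readouts under each class's row of D,
        # pooled (summed) over all readout neurons; constants dropped
        ll = -0.5 * ((((x - mu) / sigma) ** 2) + 2.0 * np.log(sigma)).sum(axis=2)
        preds[low] = ll.argmax(axis=1)
    return preds
```

In this sketch, the only change relative to a plain Softmax classifier happens at test time: high-confidence samples are classified exactly as before, while low-confidence samples are re-scored by pooling information across every row of the fitted array $D$.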
