
Revisiting Gradient Descent: A Dual-Weight Method for Improved Learning

Abstract

We introduce a novel framework for learning in neural networks by decomposing each neuron's weight vector into two distinct parts, $W_1$ and $W_2$, thereby modeling contrastive information directly at the neuron level. Traditional gradient descent stores both positive (target) and negative (non-target) feature information in a single weight vector, often obscuring fine-grained distinctions. Our approach, by contrast, maintains separate updates for target and non-target features, ultimately forming a single effective weight $W = W_1 - W_2$ that is more robust to noise and class imbalance. Experimental results on both regression (California Housing, Wine Quality) and classification (MNIST, Fashion-MNIST, CIFAR-10) tasks suggest that this decomposition enhances generalization and resists overfitting, especially when training data are sparse or noisy. Crucially, the inference complexity remains the same as in the standard $WX + \text{bias}$ setup, offering a practical solution for improved learning without additional inference-time overhead.
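To make the decomposition concrete, below is a minimal sketch of a linear layer holding two weight matrices and forming the effective weight $W = W_1 - W_2$ in the forward pass, so inference costs the same as a standard $WX + \text{bias}$ layer. The class name `DualWeightLinear`, the initialization scale, and all training details are assumptions for illustration; the paper's actual update rules for the target and non-target parts are not reproduced here.

```python
import torch
import torch.nn as nn


class DualWeightLinear(nn.Module):
    """Illustrative linear layer with the weight split into W1 (target) and W2 (non-target).

    Only the forward pass is shown: the effective weight is W = W1 - W2, so
    inference uses a single matrix multiply, matching a standard nn.Linear.
    The paper's dual-weight update scheme is not implemented in this sketch.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Hypothetical initialization; the abstract does not specify a scheme.
        self.W1 = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.W2 = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight W = W1 - W2, applied as W x + bias.
        return x @ (self.W1 - self.W2).t() + self.bias


if __name__ == "__main__":
    layer = DualWeightLinear(in_features=8, out_features=3)
    x = torch.randn(4, 8)
    print(layer(x).shape)  # torch.Size([4, 3])
```

Because the two matrices collapse into one effective weight, $W_1 - W_2$ can also be precomputed and stored after training, leaving deployment identical to an ordinary linear layer.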

@article{wang2025_2503.11965,
  title={Revisiting Gradient Descent: A Dual-Weight Method for Improved Learning},
  author={Xi Wang},
  journal={arXiv preprint arXiv:2503.11965},
  year={2025}
}