AttentionDrop: A Novel Regularization Method for Transformer Models

16 April 2025
Mirza Samad Ahmed Baig
Syeda Anshrah Gillani
Abdul Akbar Khan
Shahid Munir Shah
Abstract

Transformer-based architectures achieve state-of-the-art performance across a wide range of tasks in natural language processing, computer vision, and speech. However, their immense capacity often leads to overfitting, especially when training data is limited or noisy. We propose AttentionDrop, a unified family of stochastic regularization techniques that operate directly on the self-attention distributions. We introduce three variants: 1. Hard Attention Masking: randomly zeroes out top-k attention logits per query to encourage diverse context utilization. 2. Blurred Attention Smoothing: applies a dynamic Gaussian convolution over attention logits to diffuse overly peaked distributions. 3. Consistency-Regularized AttentionDrop: enforces output stability under multiple independent AttentionDrop perturbations via a KL-based consistency loss.
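
The hard-masking and consistency variants lend themselves to a short illustration. Below is a minimal PyTorch sketch, assuming the attention logits arrive as a (batch, heads, queries, keys) tensor; the function names, the drop probability p_drop, the default value of k, and the reading of "zeroing out" a logit as masking it to -inf before the softmax are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn.functional as F


def hard_attention_masking(logits, k=2, p_drop=0.1, training=True):
    """Variant 1 sketch: randomly suppress some of the top-k attention
    logits of every query before the softmax.

    logits: raw attention scores of shape (batch, heads, queries, keys).
    "Zeroing out" a logit is read here as setting it to -inf so that the
    corresponding key receives zero attention weight after the softmax.
    """
    if not training or k <= 0:
        return F.softmax(logits, dim=-1)

    # Indices and values of the k largest logits per query: (B, H, Q, k).
    topk_idx = logits.topk(k, dim=-1).indices
    topk_vals = logits.gather(-1, topk_idx)

    # Each selected logit is dropped independently with probability p_drop.
    keep = torch.rand(topk_idx.shape, device=logits.device) >= p_drop
    new_vals = torch.where(keep, topk_vals,
                           torch.full_like(topk_vals, float("-inf")))

    # Write the (possibly masked) values back and normalize.
    masked = logits.scatter(-1, topk_idx, new_vals)
    return F.softmax(masked, dim=-1)


def consistency_kl(dist_a, dist_b, eps=1e-8):
    """Variant 3 sketch: symmetric KL divergence between model output
    distributions produced under two independent AttentionDrop passes."""
    kl_ab = F.kl_div(dist_b.clamp_min(eps).log(), dist_a, reduction="batchmean")
    kl_ba = F.kl_div(dist_a.clamp_min(eps).log(), dist_b, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

In a training loop one would forward the same batch twice with the stochastic masking enabled and add the consistency term, scaled by a weight, to the task loss; the Blurred Attention Smoothing variant would instead convolve each query's row of logits with a Gaussian kernel before the softmax.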

View on arXiv
@article{baig2025_2504.12088,
  title={AttentionDrop: A Novel Regularization Method for Transformer Models},
  author={Mirza Samad Ahmed Baig and Syeda Anshrah Gillani and Abdul Akbar Khan and Shahid Munir Shah},
  journal={arXiv preprint arXiv:2504.12088},
  year={2025}
}