Noiser: Bounded Input Perturbations for Attributing Large Language Models

3 April 2025
Mohammad Reza Ghasemi Madani
Aryo Pradipta Gema
Gabriele Sarti
Yu Zhao
Pasquale Minervini
Andrea Passerini
Abstract

Feature attribution (FA) methods are common post-hoc approaches that explain how Large Language Models (LLMs) make predictions. Accordingly, generating faithful attributions that reflect the actual inner behavior of the model is crucial. In this paper, we introduce Noiser, a perturbation-based FA method that imposes bounded noise on each input embedding and measures the robustness of the model against partially noised input to obtain the input attributions. Additionally, we propose an answerability metric that employs an instructed judge model to assess the extent to which highly scored tokens suffice to recover the predicted output. Through a comprehensive evaluation across six LLMs and three tasks, we demonstrate that Noiser consistently outperforms existing gradient-based, attention-based, and perturbation-based FA methods in terms of both faithfulness and answerability, making it a robust and effective approach for explaining language model predictions.
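To make the idea concrete, below is a minimal illustrative sketch of a bounded-noise perturbation attribution for a HuggingFace-style causal LM. It is not the exact Noiser algorithm from the paper: the function name, the choice of a random direction with a norm-relative radius, and the KL-divergence robustness proxy are assumptions made for illustration; the paper defines its own noise bound and scoring.

# Sketch only: bounded embedding noise as a token attribution signal.
# The noise distribution, bound, and robustness measure here are assumptions,
# not the published Noiser method.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

def bounded_noise_attribution(model, tokenizer, text, noise_scale=0.1, n_samples=20):
    """Score each input token by how much bounded noise on its embedding
    shifts the model's next-token distribution (larger shift = higher attribution)."""
    model.eval()
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]

    with torch.no_grad():
        base_embeds = model.get_input_embeddings()(input_ids)          # (1, seq_len, hidden)
        base_logits = model(inputs_embeds=base_embeds).logits[0, -1]   # next-token logits
    base_logprobs = F.log_softmax(base_logits, dim=-1)

    seq_len = input_ids.shape[1]
    scores = torch.zeros(seq_len)
    for i in range(seq_len):
        for _ in range(n_samples):
            noised = base_embeds.clone()
            # Bounded perturbation: random direction, radius tied to the embedding norm.
            direction = torch.randn_like(noised[0, i])
            direction = direction / direction.norm()
            noised[0, i] += noise_scale * noised[0, i].norm() * direction
            with torch.no_grad():
                logits = model(inputs_embeds=noised).logits[0, -1]
            # Robustness proxy: divergence of the perturbed prediction from the original.
            scores[i] += F.kl_div(F.log_softmax(logits, dim=-1), base_logprobs,
                                  reduction="sum", log_target=True).item()
    scores /= n_samples
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
    return list(zip(tokens, scores.tolist()))

# Example usage (model name is only a placeholder):
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2")
# print(bounded_noise_attribution(lm, tok, "The capital of France is"))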

@article{madani2025_2504.02911,
  title={Noiser: Bounded Input Perturbations for Attributing Large Language Models},
  author={Mohammad Reza Ghasemi Madani and Aryo Pradipta Gema and Gabriele Sarti and Yu Zhao and Pasquale Minervini and Andrea Passerini},
  journal={arXiv preprint arXiv:2504.02911},
  year={2025}
}