ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.19200
  4. Cited By
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs

27 July 2024
Nitay Calderon
Roi Reichart
ArXivPDFHTML

Papers citing "On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs"

21 / 21 papers shown
Title
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Anton Razzhigaev
Matvey Mikhalchuk
Temurbek Rahmatullaev
Elizaveta Goncharova
Polina Druzhinina
Ivan V. Oseledets
Andrey Kuznetsov
55
1
0
20 Feb 2025
LLMs for Generating and Evaluating Counterfactuals: A Comprehensive
  Study
LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study
Van Bach Nguyen
Paul Youssef
Jorg Schlotterer
Christin Seifert
37
14
0
26 Apr 2024
AtP*: An efficient and scalable method for localizing LLM behaviour to
  components
AtP*: An efficient and scalable method for localizing LLM behaviour to components
János Kramár
Tom Lieberum
Rohin Shah
Neel Nanda
KELM
41
42
0
01 Mar 2024
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan
Sharath Chandra Raparthy
Andrei Lupu
Eric Hambro
Aram H. Markosyan
...
Minqi Jiang
Jack Parker-Holder
Jakob Foerster
Tim Rocktaschel
Roberta Raileanu
SyDa
68
61
0
26 Feb 2024
Attention Lens: A Tool for Mechanistically Interpreting the Attention
  Head Information Retrieval Mechanism
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
Mansi Sakarvadia
Arham Khan
Aswathy Ajith
Daniel Grzenda
Nathaniel Hudson
André Bauer
Kyle Chard
Ian T. Foster
172
9
0
25 Oct 2023
Attribution Patching Outperforms Automated Circuit Discovery
Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed
Can Rager
Arthur Conmy
55
53
0
16 Oct 2023
Can LLMs facilitate interpretation of pre-trained language models?
Can LLMs facilitate interpretation of pre-trained language models?
Basel Mousi
Nadir Durrani
Fahim Dalvi
36
12
0
22 May 2023
Interpretability in the Wild: a Circuit for Indirect Object
  Identification in GPT-2 small
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
210
486
0
01 Nov 2022
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation
Tanay Dixit
Bhargavi Paranjape
Hannaneh Hajishirzi
Luke Zettlemoyer
SyDa
130
23
0
10 Oct 2022
Causal Proxy Models for Concept-Based Model Explanations
Causal Proxy Models for Concept-Based Model Explanations
Zhengxuan Wu
Karel DÓosterlinck
Atticus Geiger
Amir Zur
Christopher Potts
MILM
68
35
0
28 Sep 2022
Towards Faithful Model Explanation in NLP: A Survey
Towards Faithful Model Explanation in NLP: A Survey
Qing Lyu
Marianna Apidianaki
Chris Callison-Burch
XAI
104
105
0
22 Sep 2022
Naturalistic Causal Probing for Morpho-Syntax
Naturalistic Causal Probing for Morpho-Syntax
Afra Amini
Tiago Pimentel
Clara Meister
Ryan Cotterell
MILM
98
18
0
14 May 2022
Tailor: Generating and Perturbing Text with Semantic Controls
Tailor: Generating and Perturbing Text with Semantic Controls
Alexis Ross
Tongshuang Wu
Hao Peng
Matthew E. Peters
Matt Gardner
124
77
0
15 Jul 2021
Gradient-based Adversarial Attacks against Text Transformers
Gradient-based Adversarial Attacks against Text Transformers
Chuan Guo
Alexandre Sablayrolles
Hervé Jégou
Douwe Kiela
SILM
98
225
0
15 Apr 2021
Probing Classifiers: Promises, Shortcomings, and Advances
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
221
402
0
24 Feb 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Mohit Bansal
Christopher Ré
AAML
OffRL
OOD
138
136
0
13 Jan 2021
Topic Modeling with Contextualized Word Representation Clusters
Topic Modeling with Contextualized Word Representation Clusters
Laure Thompson
David M. Mimno
94
82
0
23 Oct 2020
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Chih-Kuan Yeh
Been Kim
Sercan Ö. Arik
Chun-Liang Li
Tomas Pfister
Pradeep Ravikumar
FAtt
120
293
0
17 Oct 2019
What you can cram into a single vector: Probing sentence embeddings for
  linguistic properties
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau
Germán Kruszewski
Guillaume Lample
Loïc Barrault
Marco Baroni
199
876
0
03 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
Towards A Rigorous Science of Interpretable Machine Learning
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
225
3,658
0
28 Feb 2017
1