arXiv: 2505.11772
LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models
17 May 2025
Ryan Chen, Youngmin Ko, Zeyu Zhang, Catherine Cho, Sunny Chung, Mauro Giuffré, Dennis L. Shung, Bradly C. Stadie
Papers citing "LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models" (21 papers)

1. HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
   Xinyue Shen, Yixin Wu, Y. Qu, Michael Backes, Savvas Zannettou, Yang Zhang (28 Jan 2025)

2. Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
   Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang (01 Sep 2024)

3. On the Origins of Linear Representations in Large Language Models
   Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam, Victor Veitch (06 Mar 2024)

4. On Measuring Faithfulness or Self-consistency of Natural Language Explanations
   Letitia Parcalabescu, Anette Frank (13 Nov 2023)

5. The Linear Representation Hypothesis and the Geometry of Large Language Models
   Kiho Park, Yo Joong Choe, Victor Veitch (07 Nov 2023)

6. Quantifying Uncertainty in Natural Language Explanations of Large Language Models
   Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju (06 Nov 2023)

7. Linear Representations of Sentiment in Large Language Models
   Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda (23 Oct 2023)

8. The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
   Samuel Marks, Max Tegmark (10 Oct 2023)

9. Emergent Linear Representations in World Models of Self-Supervised Sequence Models
   Neel Nanda, Andrew Lee, Martin Wattenberg (02 Sep 2023)

10. Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness
    Jiuhai Chen, Jonas W. Mueller (30 Aug 2023)

11. Graph of Thoughts: Solving Elaborate Problems with Large Language Models
    Maciej Besta, Nils Blach, Aleš Kubíček, Robert Gerstenberger, Michal Podstawski, ..., Joanna Gajda, Tomasz Lehmann, H. Niewiadomski, Piotr Nyczyk, Torsten Hoefler (18 Aug 2023)

12. Tree of Thoughts: Deliberate Problem Solving with Large Language Models
    Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas Griffiths, Yuan Cao, Karthik Narasimhan (17 May 2023)

13. Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
    Miles Turpin, Julian Michael, Ethan Perez, Sam Bowman (07 May 2023)

14. Self-Consistency Improves Chain of Thought Reasoning in Language Models
    Xuezhi Wang, Jason W. Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou (21 Mar 2022)

15. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
    Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, M. Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer (25 Feb 2022)

16. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
    Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou (28 Jan 2022)

17. Do Prompt-Based Models Really Understand the Meaning of their Prompts?
    Albert Webson, Ellie Pavlick (02 Sep 2021)

18. Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?
    Alon Jacovi, Yoav Goldberg (07 Apr 2020)

19. A Unified Approach to Interpreting Model Predictions
    Scott M. Lundberg, Su-In Lee (22 May 2017)

20. Rationalizing Neural Predictions
    Tao Lei, Regina Barzilay, Tommi Jaakkola (13 Jun 2016)

21. "Why Should I Trust You?": Explaining the Predictions of Any Classifier
    Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (16 Feb 2016)