arXiv: 2411.11296
Steering Language Model Refusal with Sparse Autoencoders
18 November 2024
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangdeh
LLMSV
Papers citing "Steering Language Model Refusal with Sparse Autoencoders"
1 / 1 papers shown
Title
Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks
Madeline Brumley
Joe Kwon
David M. Krueger
Dmitrii Krasheninnikov
Usman Anwar
LLMSV
11 Nov 2024