ResearchTrend.AI

Steering Language Model Refusal with Sparse Autoencoders

18 November 2024
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangdeh
    LLMSV

Papers citing "Steering Language Model Refusal with Sparse Autoencoders"

Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks
Madeline Brumley
Joe Kwon
David M. Krueger
Dmitrii Krasheninnikov
Usman Anwar
LLMSV
11 Nov 2024