Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.05767
Cited By
Extending Activation Steering to Broad Skills and Multiple Behaviours
9 March 2024
Teun van der Weij
Massimo Poesio
Nandi Schoots
LLMSV
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Extending Activation Steering to Broad Skills and Multiple Behaviours"
9 / 9 papers shown
Title
Towards Understanding Distilled Reasoning Models: A Representational Approach
David D. Baek
Max Tegmark
LRM
75
2
0
05 Mar 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
100
0
0
24 Feb 2025
Activation Steering in Neural Theorem Provers
Shashank Kirtania
LLMSV
135
0
0
21 Feb 2025
Multi-Attribute Steering of Language Models via Targeted Intervention
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
LLMSV
110
0
0
18 Feb 2025
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
52
14
0
15 Oct 2024
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
Joris Postmus
Steven Abreu
LLMSV
88
1
0
09 Oct 2024
Analyzing the Generalization and Reliability of Steering Vectors
Daniel Tan
David Chanin
Aengus Lynch
Dimitrios Kanoulas
Brooks Paige
Adrià Garriga-Alonso
Robert Kirk
LLMSV
84
16
0
17 Jul 2024
Tradeoffs Between Alignment and Helpfulness in Language Models with Representation Engineering
Yotam Wolf
Noam Wies
Dorin Shteyman
Binyamin Rothberg
Yoav Levine
Amnon Shashua
LLMSV
21
13
0
29 Jan 2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
248
1,986
0
31 Dec 2020
1