Extending Activation Steering to Broad Skills and Multiple Behaviours

9 March 2024

Teun van der Weij

Papers citing "Extending Activation Steering to Broad Skills and Multiple Behaviours"

Towards Understanding Distilled Reasoning Models: A Representational Approach. David D. Baek, Max Tegmark. [LRM] 05 Mar 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges. Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple. 24 Feb 2025
Activation Steering in Neural Theorem Provers. Shashank Kirtania. [LLMSV] 21 Feb 2025
Multi-Attribute Steering of Language Models via Targeted Intervention. Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal. [LLMSV] 18 Feb 2025
Improving Instruction-Following in Language Models through Activation Steering. Alessandro Stolfo, Vidhisha Balachandran, Safoora Yousefi, Eric Horvitz, Besmira Nushi. [LLMSV] 15 Oct 2024
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering. Joris Postmus, Steven Abreu. [LLMSV] 09 Oct 2024
Analyzing the Generalization and Reliability of Steering Vectors. Daniel Tan, David Chanin, Aengus Lynch, Dimitrios Kanoulas, Brooks Paige, Adrià Garriga-Alonso, Robert Kirk. [LLMSV] 17 Jul 2024
Tradeoffs Between Alignment and Helpfulness in Language Models with Representation Engineering. Yotam Wolf, Noam Wies, Dorin Shteyman, Binyamin Rothberg, Yoav Levine, Amnon Shashua. [LLMSV] 29 Jan 2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling. Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy. [AIMat] 31 Dec 2020