arXiv: 2403.05767
Extending Activation Steering to Broad Skills and Multiple Behaviours

9 March 2024
Teun van der Weij
Massimo Poesio
Nandi Schoots
    LLMSV

Papers citing "Extending Activation Steering to Broad Skills and Multiple Behaviours"

Towards Understanding Distilled Reasoning Models: A Representational Approach
David D. Baek
Max Tegmark
LRM
75
2
0
05 Mar 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
100
0
0
24 Feb 2025
Activation Steering in Neural Theorem Provers
Shashank Kirtania
LLMSV
135
0
0
21 Feb 2025
Multi-Attribute Steering of Language Models via Targeted Intervention
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
LLMSV
110
0
0
18 Feb 2025
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
52
14
0
15 Oct 2024
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
Joris Postmus
Steven Abreu
LLMSV
88
1
0
09 Oct 2024
Analyzing the Generalization and Reliability of Steering Vectors
Daniel Tan
David Chanin
Aengus Lynch
Dimitrios Kanoulas
Brooks Paige
Adrià Garriga-Alonso
Robert Kirk
LLMSV
84
16
0
17 Jul 2024
Tradeoffs Between Alignment and Helpfulness in Language Models with Representation Engineering
Yotam Wolf
Noam Wies
Dorin Shteyman
Binyamin Rothberg
Yoav Levine
Amnon Shashua
LLMSV
21
13
0
29 Jan 2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
248
1,986
0
31 Dec 2020