ResearchTrend.AI
Steering Without Side Effects: Improving Post-Deployment Control of Language Models

21 June 2024
Asa Cooper Stickland
Alexander Lyzhov
Jacob Pfau
Salsabila Mahdi
Samuel R. Bowman
    LLMSV
    AAML

Papers citing "Steering Without Side Effects: Improving Post-Deployment Control of Language Models"

Evaluating the Prompt Steerability of Large Language Models
Erik Miehling
Michael Desmond
K. Ramamurthy
Elizabeth M. Daly
Pierre L. Dognin
Jesus Rios
Djallel Bouneffouf
Miao Liu
LLMSV
85
3
0
19 Nov 2024
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
40
13
0
15 Oct 2024
Robust LLM safeguarding via refusal feature adversarial training
L. Yu
Virginie Do
Karen Hambardzumyan
Nicola Cancedda
AAML
42
9
0
30 Sep 2024
Representation Tuning
Christopher M. Ackerman
LLMSV
19
0
0
11 Sep 2024
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
K. Ramamurthy
Erik Miehling
Pierre L. Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
87
13
0
06 Sep 2024
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
James Chua
Edward Rees
Hunar Batra
Samuel R. Bowman
Julian Michael
Ethan Perez
Miles Turpin
LRM
30
13
0
08 Mar 2024