But what is your honest answer? Aiding LLM-judges with honest alternatives using steering vectors

23 May 2025
Leon Eshuijs
Archie Chaudhury
Alan McBeth
Ethan Nguyen
    LLMSV

Papers citing "But what is your honest answer? Aiding LLM-judges with honest alternatives using steering vectors"

8 / 8 papers shown
Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks
Madeline Brumley
Joe Kwon
David M. Krueger
Dmitrii Krasheninnikov
Usman Anwar
LLMSV
212
14
0
11 Nov 2024
Improving Steering Vectors by Targeting Sparse Autoencoder Features
Sviatoslav Chalnev
Matthew Siu
Arthur Conmy
LLMSV
373
48
0
04 Nov 2024
STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Robert D Morabito
Sangmitra Madhusudan
Tyler McDonald
Ali Emami
286
3
0
20 Sep 2024
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
...
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM, MoE, OSLM
623
1,583
0
31 Jul 2024
Managing extreme AI risks amid rapid progress
Yoshua Bengio
Geoffrey Hinton
Andrew Yao
Dawn Song
Pieter Abbeel
...
Juil Sock
Stuart J. Russell
Daniel Kahneman
J. Brauner
Sören Mindermann
351
30
0
26 Oct 2023
Discovering Language Model Behaviors with Model-Written Evaluations
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
359
601
0
19 Dec 2022
Discovering Latent Knowledge in Language Models Without Supervision
International Conference on Learning Representations (ICLR), 2022
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
417
542
0
07 Dec 2022
Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization
Yizhe Zhang
Michel Galley
Jianfeng Gao
Zhe Gan
Xiujun Li
Chris Brockett
W. Dolan
394
310
0
16 Sep 2018