ResearchTrend.AI

Evaluating the Prompt Steerability of Large Language Models

19 November 2024
Erik Miehling
Michael Desmond
Karthikeyan Natesan Ramamurthy
Elizabeth M. Daly
Pierre L. Dognin
Jesus Rios
Djallel Bouneffouf
Miao Liu
Abstract

Building pluralistic AI requires designing models that can be shaped to represent a wide range of value systems and cultures. Achieving this requires first being able to evaluate the degree to which a given model is capable of reflecting various personas. To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting. Our design is based on a formal definition of prompt steerability, which analyzes the degree to which a model's joint behavioral distribution can be shifted from its baseline. By defining steerability indices and inspecting how these indices change as a function of steering effort, we can estimate the steerability of a model across various persona dimensions and directions. Our benchmark reveals that the steerability of many current models is limited -- due to both a skew in their baseline behavior and an asymmetry in their steerability across many persona dimensions. We release an implementation of our benchmark at this https URL.
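The abstract only sketches the formal definition. As a rough, hypothetical illustration of the core idea -- measuring how far steering shifts behavior relative to the headroom left by a skewed baseline -- one could compute a simplified one-dimensional index like the following (the function name, score scale, and normalization are assumptions for illustration, not the paper's actual indices):

```python
from statistics import mean

def steerability_index(baseline_scores, steered_scores):
    """Toy steerability index (illustrative, not the paper's formal metric).

    Scores are assumed to lie in [0, 1] along one persona dimension.
    The index measures how far steering moved the mean behavior from
    its baseline, normalized by the headroom remaining in that
    direction, so a model whose baseline is already skewed toward a
    pole shows limited steerability toward that pole.
    """
    base = mean(baseline_scores)
    steered = mean(steered_scores)
    headroom = 1.0 - base  # maximum possible positive shift
    if headroom == 0.0:
        return 0.0  # already at the pole; no room to steer further
    return (steered - base) / headroom
```

Under this toy metric, a baseline of 0.75 steered to 0.9 yields an index of 0.6, while the same absolute shift from a baseline of 0.5 yields only 0.3 -- echoing the paper's observation that baseline skew induces asymmetric steerability across directions.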

@article{miehling2025_2411.12405,
  title={Evaluating the Prompt Steerability of Large Language Models},
  author={Erik Miehling and Michael Desmond and Karthikeyan Natesan Ramamurthy and Elizabeth M. Daly and Pierre Dognin and Jesus Rios and Djallel Bouneffouf and Miao Liu},
  journal={arXiv preprint arXiv:2411.12405},
  year={2025}
}