
Representation Engineering for Large-Language Models: Survey and Research Challenges

Abstract

Large-language models are capable of completing a variety of tasks, but remain unpredictable and intractable. Representation engineering seeks to resolve this problem through a new approach utilizing samples of contrasting inputs to detect and edit high-level representations of concepts such as honesty, harmfulness, or power-seeking. We formalize the goals and methods of representation engineering to present a cohesive picture of work in this emerging discipline. We compare it with alternative approaches, such as mechanistic interpretability, prompt engineering, and fine-tuning. We outline risks such as performance degradation, increased compute time, and steerability issues. We present a clear agenda for future research to build predictable, dynamic, safe, and personalizable LLMs.
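The contrastive approach described above can be sketched in a few lines: collect hidden activations for pairs of contrasting prompts, take the difference of the mean activations as a concept direction, and add that direction to a hidden state at inference time. The data below is synthetic and illustrative only; this is a minimal sketch of the difference-of-means idea, not the paper's specific method.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden-state size (illustrative)

# Simulated activations: the "positive" set (e.g. honest completions) is
# shifted along a latent concept direction; the "negative" set is not.
concept = rng.normal(size=d)
concept /= np.linalg.norm(concept)
pos_acts = rng.normal(size=(32, d)) + 2.0 * concept
neg_acts = rng.normal(size=(32, d))

# Reading: the steering vector is the difference of the mean activations
# of the two contrasting sets.
steer = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
steer /= np.linalg.norm(steer)

# Control: nudge a hidden state along the concept direction at inference.
alpha = 4.0  # steering strength (a hyperparameter in practice)
hidden = rng.normal(size=d)
steered = hidden + alpha * steer

# The recovered direction should align closely with the true concept.
print(float(steer @ concept))
```

In practice the activations would come from a chosen transformer layer rather than synthetic draws, and the strength `alpha` trades off steering effect against output quality.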

@article{bartoszcze2025_2502.17601,
  title={Representation Engineering for Large-Language Models: Survey and Research Challenges},
  author={Lukasz Bartoszcze and Sarthak Munshi and Bryan Sukidi and Jennifer Yen and Zejia Yang and David Williams-King and Linh Le and Kosi Asuzu and Carsten Maple},
  journal={arXiv preprint arXiv:2502.17601},
  year={2025}
}