
arXiv:2406.05954

Aligning Large Language Models with Representation Editing: A Control Perspective

Neural Information Processing Systems (NeurIPS), 2024
10 June 2024
Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

Papers citing "Aligning Large Language Models with Representation Editing: A Control Perspective"

23 papers
Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space
Sekitoshi Kanai, Tsukasa Yoshida, Hiroshi Takahashi, Haru Kuroki, Kazumune Hashimoto
30 Oct 2025
From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails
Ravi Pandya, Madison Bland, D. Nguyen, Changliu Liu, J. F. Fisac, Andrea V. Bajcsy
15 Oct 2025
Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing
Rongzhi Zhang, Meghaj Tarte, Yuzhao Heng, Xiang Chen, Tong Yu, Lingkai Kong, Sudheer Chava, Chao Zhang
14 Oct 2025
The Idola Tribus of AI: Large Language Models tend to perceive order where none exists
Shin-nosuke Ishikawa, Masato Todo, Taiki Ogihara, Hirotsugu Ohba
10 Oct 2025
Activation Steering with a Feedback Controller
Dung V. Nguyen, Hieu M. Vu, Nhi Y. Pham, Lei Zhang, T. Nguyen
05 Oct 2025
Preemptive Detection and Steering of LLM Misalignment via Latent Reachability
Sathwik Karnik, Somil Bansal
25 Sep 2025
Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
Yisong Xiao, Aishan Liu, Siyuan Liang, Zonghao Ying, Xianglong Liu, Dacheng Tao
24 Sep 2025
The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations
Yubo Zhu, Dongrui Liu, Zecheng Lin, Wei Tong, Sheng Zhong, Jing Shao
16 Sep 2025
Better Language Model-Based Judging Reward Modeling through Scaling Comprehension Boundaries
Meiling Ning, Zhongbao Zhang, Junda Ye, Jiabao Guo, Qingyuan Guan
25 Aug 2025
MAVIS: Multi-Objective Alignment via Value-Guided Inference-Time Search
Jeremy Carleton, Debajoy Mukherjee, Srinivas Shakkottai, D. Kalathil
19 Aug 2025
LLM-ML Teaming: Integrated Symbolic Decoding and Gradient Search for Valid and Stable Generative Feature Transformation
Xinyuan Wang, Haoyue Bai, Nanxu Gong, Wangyang Ying, Sixun Dong, Xiquan Cui, Yanjie Fu
10 Jun 2025
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Feifan Song, Shaohang Wei, Wen Luo, Yuxuan Fan, Tianyu Liu, Guoyin Wang, Houfeng Wang
09 Jun 2025
Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures
Heng-Sheng Chang, P. Mehta
01 May 2025
Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qiyuan Deng, X. Bai, Kehai Chen, Yaowei Wang, Liqiang Nie, Min Zhang
13 Mar 2025
Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models
Andy Zhou
13 Mar 2025
Personalize Your LLM: Fake it then Align it
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala
02 Mar 2025
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
Hanjiang Hu, Alexander Robey, Changliu Liu
28 Feb 2025
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
International Conference on Learning Representations (ICLR), 2025
Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Wenbo Ding
26 Feb 2025
Is Free Self-Alignment Possible?
Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala
24 Feb 2025
Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
Chenghua Huang, Lu Wang, Fangkai Yang, Pu Zhao, Hao Sun, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang
24 Feb 2025
Mixture of Attentions For Speculative Decoding
International Conference on Learning Representations (ICLR), 2024
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang
04 Oct 2024
Programming Refusal with Conditional Activation Steering
International Conference on Learning Representations (ICLR), 2024
Bruce W. Lee, Inkit Padhi, Karthikeyan N. Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar
06 Sep 2024
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
Jianhui Chen, Xiaozhi Wang, Zijun Yao, Yushi Bai, Lei Hou, Juanzi Li
20 Jun 2024