
arXiv:2406.05954

Aligning Large Language Models with Representation Editing: A Control Perspective

Neural Information Processing Systems (NeurIPS), 2024
10 June 2024
Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

Papers citing "Aligning Large Language Models with Representation Editing: A Control Perspective"

23 papers
Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space
Sekitoshi Kanai, Tsukasa Yoshida, Hiroshi Takahashi, Haru Kuroki, Kazumune Hashimoto
30 Oct 2025
From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails
Ravi Pandya, Madison Bland, D. Nguyen, Changliu Liu, J. F. Fisac, Andrea V. Bajcsy
15 Oct 2025
Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing
Rongzhi Zhang, Meghaj Tarte, Yuzhao Heng, Xiang Chen, Tong Yu, Lingkai Kong, Sudheer Chava, Chao Zhang
14 Oct 2025
The Idola Tribus of AI: Large Language Models tend to perceive order where none exists
Shin-nosuke Ishikawa, Masato Todo, Taiki Ogihara, Hirotsugu Ohba
10 Oct 2025
Activation Steering with a Feedback Controller
Dung V. Nguyen, Hieu M. Vu, Nhi Y. Pham, Lei Zhang, T. Nguyen
05 Oct 2025
Preemptive Detection and Steering of LLM Misalignment via Latent Reachability
Sathwik Karnik, Somil Bansal
25 Sep 2025
Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
Yisong Xiao, Aishan Liu, Siyuan Liang, Zonghao Ying, Xianglong Liu, Dacheng Tao
24 Sep 2025
The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations
Yubo Zhu, Dongrui Liu, Zecheng Lin, Wei Tong, Sheng Zhong, Jing Shao
16 Sep 2025
Better Language Model-Based Judging Reward Modeling through Scaling Comprehension Boundaries
Meiling Ning, Zhongbao Zhang, Junda Ye, Jiabao Guo, Qingyuan Guan
25 Aug 2025
MAVIS: Multi-Objective Alignment via Value-Guided Inference-Time Search
Jeremy Carleton, Debajoy Mukherjee, Srinivas Shakkottai, D. Kalathil
19 Aug 2025
LLM-ML Teaming: Integrated Symbolic Decoding and Gradient Search for Valid and Stable Generative Feature Transformation
Xinyuan Wang, Haoyue Bai, Nanxu Gong, Wangyang Ying, Sixun Dong, Xiquan Cui, Yanjie Fu
10 Jun 2025
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Feifan Song, Shaohang Wei, Wen Luo, Yuxuan Fan, Tianyu Liu, Guoyin Wang, Houfeng Wang
09 Jun 2025
Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures
Heng-Sheng Chang, P. Mehta
01 May 2025
Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qiyuan Deng, X. Bai, Kehai Chen, Yaowei Wang, Liqiang Nie, Min Zhang
13 Mar 2025
Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models
Andy Zhou
13 Mar 2025
Personalize Your LLM: Fake it then Align it
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala
02 Mar 2025
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
Hanjiang Hu, Alexander Robey, Changliu Liu
28 Feb 2025
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
International Conference on Learning Representations (ICLR), 2025
Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Wenbo Ding
26 Feb 2025
Is Free Self-Alignment Possible?
Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala
24 Feb 2025
Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
Chenghua Huang, Lu Wang, Fangkai Yang, Pu Zhao, Hao Sun, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang
24 Feb 2025
Mixture of Attentions For Speculative Decoding
International Conference on Learning Representations (ICLR), 2024
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang
04 Oct 2024
Programming Refusal with Conditional Activation Steering
International Conference on Learning Representations (ICLR), 2024
Bruce W. Lee, Inkit Padhi, Karthikeyan N. Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar
06 Sep 2024
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
Jianhui Chen, Xiaozhi Wang, Zijun Yao, Yushi Bai, Lei Hou, Juanzi Li
20 Jun 2024