Cited By (arXiv:2402.19085)
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
29 February 2024
Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen, Bowen Sun, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun
Papers citing "Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment" (50 of 59 papers shown)
- Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts. Xing Wang, Huiyuan Xie, Y. Wang, Chaojun Xiao, Huimin Chen, Holli Sargeant, Felix Steffek, Jie Shao, Zhiyuan Liu, Maosong Sun. 25 Nov 2025.
- Enhancing Binary Encoded Crime Linkage Analysis Using Siamese Network. Yicheng Zhan, Fahim Ahmed, Amy Burrell, Matthew J. Tonkin, Sarah Galambos, Jessica Woodhams, Dalal Alrajeh. 10 Nov 2025.
- Read the Scene, Not the Script: Outcome-Aware Safety for LLMs. Rui Wu, Yihao Quan, Zeru Shi, Zhenting Wang, Yanshu Li, Ruixiang Tang. 05 Oct 2025.
- Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards. Yiran Shen, Yu Xia, Jonathan D. Chang, Prithviraj Ammanabrolu. 01 Oct 2025.
- OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment. Guanbin Li, Zhihao Xu, Junhao Dong, Jian Zhao, Yuchen Yuan, ..., Zhengtao Yao, Huahui Yi, Dongrui Liu, Xinfeng Li, Kun Wang. 29 Sep 2025.
- Preemptive Detection and Steering of LLM Misalignment via Latent Reachability. Sathwik Karnik, Somil Bansal. 25 Sep 2025.
- Towards Universal Debiasing for Language Models-based Tabular Data Generation. Tianchun Li, Tianci Liu, Xingchen Wang, Rongzhe Wei, P. Li, Lu Su, Jing Gao. 20 Sep 2025.
- The Alignment Bottleneck. Wenjun Cao. 19 Sep 2025.
- Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting. Yining Lu, Zilong Wang, Shiyang Li, Xin Liu, Changlong Yu, Qingyu Yin, Zhan Shi, Zixuan Zhang, Meng Jiang. 14 Sep 2025.
- Murphy's Laws of AI Alignment: Why the Gap Always Wins. Madhava Gaikwad. 04 Sep 2025.
- PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization. Han Jiang, Dongyao Zhu, Zhihua Wei, Xiaoyuan Yi, Ziang Xiao, Xing Xie. 22 Jul 2025.
- ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning. Zhengyue Zhao, Yingzi Ma, S. Jha, Marco Pavone, P. McDaniel, Chaowei Xiao. 14 Jul 2025.
- Large Language Models Often Know When They Are Being Evaluated. Joe Needham, Giles Edkins, Govind Pimpale, Henning Bartsch, Marius Hobbhahn. 28 May 2025.
- MOSLIM: Align with diverse preferences in prompts through reward classification. Yu Zhang, Wanli Jiang, Zhengyu Yang. 24 May 2025.
- Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models. Min Cheng, Fatemeh Doudi, D. Kalathil, Mohammad Ghavamzadeh, P. R. Kumar. 24 May 2025.
- Is Active Persona Inference Necessary for Aligning Small Models to Personal Preferences? Zilu Tang, Afra Feyza Akyürek, Ekin Akyürek, Derry Wijaya. 19 May 2025.
- Mining Intrinsic Rewards from LLM Hidden States for Efficient Best-of-N Sampling. Jizhou Guo, Zhaomin Wu, Hanchen Yang, Philip S. Yu. 18 May 2025.
- References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation. Doyoung Kim, Youngjun Lee, Joeun Kim, Jihwan Bang, Hwanjun Song, Susik Yoon, Jae-Gil Lee. 10 May 2025.
- PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model. Xiaoyuan Zhang, Weisen Jiang, Yuancheng Xu, Hao Chen, Ying-Cong Chen. 06 May 2025.
- Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors. Ren-Wei Liang, Chin-Ting Hsu, Chan-Hung Yu, Saransh Agrawal, Shih-Cheng Huang, Shang-Tse Chen, Kuan-Hao Huang, Shao-Hua Sun. 27 Apr 2025.
- ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models using Pareto High-quality Data. Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin. 23 Apr 2025.
- Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment (ACL 2025). Xiaotian Zhang, Ruizhe Chen, Yang Feng, Zuozhu Liu. 17 Apr 2025.
- REWARD CONSISTENCY: Improving Multi-Objective Alignment from a Data-Centric Perspective. Zhihao Xu, Yongqi Tong, Xin Zhang, Jun Zhou, Xiting Wang. 15 Apr 2025.
- A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models. Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera. 07 Apr 2025.
- ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback. Taewon Yun, Jihwan Oh, Hyangsuk Min, Yuho Lee, Jihwan Bang, Jason (Jinglun) Cai, Hwanjun Song. 27 Mar 2025.
- Controlling Large Language Model with Latent Actions. Chengxing Jia, Ziniu Li, Pengyuan Wang, Yi-Chen Li, Zhenyu Hou, Yuxiao Dong, Y. Yu. 27 Mar 2025.
- A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications (ACL 2025). Jian Guan, Jian Wu, Jia-Nan Li, Chuanqi Cheng, Wei Wu. 21 Mar 2025.
- Language Model Personalization via Reward Factorization. Idan Shenfeld, Felix Faltings, Pulkit Agrawal, Aldo Pacchiano. 08 Mar 2025.
- Uncovering Gaps in How Humans and LLMs Interpret Subjective Language (ICLR 2025). Erik Jones, Arjun Patrawala, Jacob Steinhardt. 06 Mar 2025.
- Robust Multi-Objective Preference Alignment with Online DPO (AAAI 2025). Raghav Gupta, Ryan Sullivan, Yunxuan Li, Samrat Phatale, Abhinav Rastogi. 01 Mar 2025.
- STAIR: Improving Safety Alignment with Introspective Reasoning. Yuanhang Zhang, Siyuan Zhang, Yao Huang, Zeyu Xia, Zhengwei Fang, Xiao Yang, Ranjie Duan, Dong Yan, Yinpeng Dong, Jun Zhu. 04 Feb 2025.
- Learning to Summarize from LLM-generated Feedback (NAACL 2024). Hwanjun Song, Taewon Yun, Yuho Lee, Jihwan Oh, Gihun Lee, Jason (Jinglun) Cai, Hang Su. 28 Jan 2025.
- Gradient-Based Multi-Objective Deep Learning: Algorithms, Theories, Applications, and Beyond. Weiyu Chen, Xiaoyuan Zhang, Xi Lin, Han Zhao, Gang Qu, James T. Kwok. 19 Jan 2025.
- REFA: Reference Free Alignment for multi-preference optimization. Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan. 20 Dec 2024.
- Reinforcement Learning Enhanced LLMs: A Survey. Shuhe Wang, Shengyu Zhang, Jing Zhang, Runyi Hu, Xiaoya Li, Minlie Huang, Jiwei Li, Leilei Gan, G. Wang, Eduard H. Hovy. 05 Dec 2024.
- Comparison-based Active Preference Learning for Multi-dimensional Personalization (ACL 2024). Minhyeon Oh, Seungjoon Lee, Jungseul Ok. 01 Nov 2024.
- L3Ms -- Lagrange Large Language Models (ICLR 2024). Guneet S. Dhillon, Xingjian Shi, Yee Whye Teh, Alex Smola. 28 Oct 2024.
- 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision. Shilong Li, Yancheng He, Hui Huang, Xingyuan Bu, Qingbin Liu, Hangyu Guo, Weixun Wang, Jihao Gu, Yuchi Xu, Bo Zheng. 25 Oct 2024.
- Inference time LLM alignment in single and multidomain preference spectrum. Siyang Song, Zheng Qi, Nikolaos Pappas, Srikanth Doss Kadarundalagi Raghuram Doss, Monica Sunkara, Kishaloy Halder, Manuel Mager, Yassine Benajiba. 24 Oct 2024.
- SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment (ACL 2024). Qin Liu, Haiwei Yang, Chaowei Xiao, Muhao Chen. 18 Oct 2024.
- Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements (ICLR 2024). Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme. 11 Oct 2024.
- COS-DPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework (UAI 2024). Yinuo Ren, Tesi Xiao, Michael Shavlovsky, Lexing Ying, Holakou Rahmanian. 10 Oct 2024.
- GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment (ICLR 2024). Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh. 10 Oct 2024.
- Towards a Unified View of Preference Learning for Large Language Models: A Survey. Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhiyong Yang, ..., Houfeng Wang, Zhifang Sui, Peiyi Wang, Baobao Chang. 04 Sep 2024.
- Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts (NAACL 2024). Tingchen Fu, Yupeng Hou, Julian McAuley, Rui Yan. 09 Aug 2024.
- Know Your Limits: A Survey of Abstention in Large Language Models. Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang. 25 Jul 2024.
- BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models. Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun. 30 Jun 2024.
- Decoding-Time Language Model Alignment with Multiple Objectives. Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon Du. 27 Jun 2024.
- On the Transformations across Reward Model, Parameter Update, and In-Context Prompt. Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, ..., Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi. 24 Jun 2024.
- Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization. Wenkai Yang, Shiqi Shen, Guangyao Shen, Zhi Gong, Yankai Lin, Ji-Rong Wen. 17 Jun 2024.