Cited By (arXiv:2402.19085)
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
29 February 2024
Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen, Bowen Sun, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun
Papers citing "Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment" (50 of 59 papers shown)
- Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts. Xing Wang, Huiyuan Xie, Y. Wang, Chaojun Xiao, Huimin Chen, Holli Sargeant, Felix Steffek, Jie Shao, Zhiyuan Liu, Maosong Sun. 25 Nov 2025.
- Enhancing Binary Encoded Crime Linkage Analysis Using Siamese Network. Yicheng Zhan, Fahim Ahmed, Amy Burrell, Matthew J. Tonkin, Sarah Galambos, Jessica Woodhams, Dalal Alrajeh. 10 Nov 2025.
- Read the Scene, Not the Script: Outcome-Aware Safety for LLMs. Rui Wu, Yihao Quan, Zeru Shi, Zhenting Wang, Yanshu Li, Ruixiang Tang. 05 Oct 2025.
- Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards. Yiran Shen, Yu Xia, Jonathan D. Chang, Prithviraj Ammanabrolu. 01 Oct 2025.
- OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment. Guanbin Li, Zhihao Xu, Junhao Dong, Jian Zhao, Yuchen Yuan, ..., Zhengtao Yao, Huahui Yi, Dongrui Liu, Xinfeng Li, Kun Wang. 29 Sep 2025.
- Preemptive Detection and Steering of LLM Misalignment via Latent Reachability. Sathwik Karnik, Somil Bansal. 25 Sep 2025.
- Towards Universal Debiasing for Language Models-based Tabular Data Generation. Tianchun Li, Tianci Liu, Xingchen Wang, Rongzhe Wei, P. Li, Lu Su, Jing Gao. 20 Sep 2025.
- The Alignment Bottleneck. Wenjun Cao. 19 Sep 2025.
- Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting. Yining Lu, Zilong Wang, Shiyang Li, Xin Liu, Changlong Yu, Qingyu Yin, Zhan Shi, Zixuan Zhang, Meng Jiang. 14 Sep 2025.
- Murphy's Laws of AI Alignment: Why the Gap Always Wins. Madhava Gaikwad. 04 Sep 2025.
- PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization. Han Jiang, Dongyao Zhu, Zhihua Wei, Xiaoyuan Yi, Ziang Xiao, Xing Xie. 22 Jul 2025.
- ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning. Zhengyue Zhao, Yingzi Ma, S. Jha, Marco Pavone, P. McDaniel, Chaowei Xiao. 14 Jul 2025.
- Large Language Models Often Know When They Are Being Evaluated. Joe Needham, Giles Edkins, Govind Pimpale, Henning Bartsch, Marius Hobbhahn. 28 May 2025.
- MOSLIM: Align with diverse preferences in prompts through reward classification. Yu Zhang, Wanli Jiang, Zhengyu Yang. 24 May 2025.
- Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models. Min Cheng, Fatemeh Doudi, D. Kalathil, Mohammad Ghavamzadeh, P. R. Kumar. 24 May 2025.
- Is Active Persona Inference Necessary for Aligning Small Models to Personal Preferences? Zilu Tang, Afra Feyza Akyürek, Ekin Akyürek, Derry Wijaya. 19 May 2025.
- Mining Intrinsic Rewards from LLM Hidden States for Efficient Best-of-N Sampling. Jizhou Guo, Zhaomin Wu, Hanchen Yang, Philip S. Yu. 18 May 2025.
- References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation. Doyoung Kim, Youngjun Lee, Joeun Kim, Jihwan Bang, Hwanjun Song, Susik Yoon, Jae-Gil Lee. 10 May 2025.
- PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model. Xiaoyuan Zhang, Weisen Jiang, Yuancheng Xu, Hao Chen, Ying-Cong Chen. 06 May 2025.
- Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors. Ren-Wei Liang, Chin-Ting Hsu, Chan-Hung Yu, Saransh Agrawal, Shih-Cheng Huang, Shang-Tse Chen, Kuan-Hao Huang, Shao-Hua Sun. 27 Apr 2025.
- ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models using Pareto High-quality Data. Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin. 23 Apr 2025.
- Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment (ACL 2025). Xiaotian Zhang, Ruizhe Chen, Yang Feng, Zuozhu Liu. 17 Apr 2025.
- REWARD CONSISTENCY: Improving Multi-Objective Alignment from a Data-Centric Perspective. Zhihao Xu, Yongqi Tong, Xin Zhang, Jun Zhou, Xiting Wang. 15 Apr 2025.
- A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models. Carlos Peláez-González, Andrés Herrera-Poyatos, Cristina Zuheros, David Herrera-Poyatos, Virilo Tejedor, F. Herrera. 07 Apr 2025.
- ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback. Taewon Yun, Jihwan Oh, Hyangsuk Min, Yuho Lee, Jihwan Bang, Jason (Jinglun) Cai, Hwanjun Song. 27 Mar 2025.
- Controlling Large Language Model with Latent Actions. Chengxing Jia, Ziniu Li, Pengyuan Wang, Yi-Chen Li, Zhenyu Hou, Yuxiao Dong, Y. Yu. 27 Mar 2025.
- A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications (ACL 2025). Jian Guan, Jian Wu, Jia-Nan Li, Chuanqi Cheng, Wei Wu. 21 Mar 2025.
- Language Model Personalization via Reward Factorization. Idan Shenfeld, Felix Faltings, Pulkit Agrawal, Aldo Pacchiano. 08 Mar 2025.
- Uncovering Gaps in How Humans and LLMs Interpret Subjective Language (ICLR 2025). Erik Jones, Arjun Patrawala, Jacob Steinhardt. 06 Mar 2025.
- Robust Multi-Objective Preference Alignment with Online DPO (AAAI 2025). Raghav Gupta, Ryan Sullivan, Yunxuan Li, Samrat Phatale, Abhinav Rastogi. 01 Mar 2025.
- STAIR: Improving Safety Alignment with Introspective Reasoning. Yuanhang Zhang, Siyuan Zhang, Yao Huang, Zeyu Xia, Zhengwei Fang, Xiao Yang, Ranjie Duan, Dong Yan, Yinpeng Dong, Jun Zhu. 04 Feb 2025.
- Learning to Summarize from LLM-generated Feedback (NAACL 2024). Hwanjun Song, Taewon Yun, Yuho Lee, Jihwan Oh, Gihun Lee, Jason (Jinglun) Cai, Hang Su. 28 Jan 2025.
- Gradient-Based Multi-Objective Deep Learning: Algorithms, Theories, Applications, and Beyond. Weiyu Chen, Xiaoyuan Zhang, Xi Lin, Han Zhao, Gang Qu, James T. Kwok. 19 Jan 2025.
- REFA: Reference Free Alignment for multi-preference optimization. Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan. 20 Dec 2024.
- Reinforcement Learning Enhanced LLMs: A Survey. Shuhe Wang, Shengyu Zhang, Jing Zhang, Runyi Hu, Xiaoya Li, Minlie Huang, Jiwei Li, Leilei Gan, G. Wang, Eduard H. Hovy. 05 Dec 2024.
- Comparison-based Active Preference Learning for Multi-dimensional Personalization (ACL 2024). Minhyeon Oh, Seungjoon Lee, Jungseul Ok. 01 Nov 2024.
- L3Ms -- Lagrange Large Language Models (ICLR 2024). Guneet S. Dhillon, Xingjian Shi, Yee Whye Teh, Alex Smola. 28 Oct 2024.
- 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision. Shilong Li, Yancheng He, Hui Huang, Xingyuan Bu, Qingbin Liu, Hangyu Guo, Weixun Wang, Jihao Gu, Yuchi Xu, Bo Zheng. 25 Oct 2024.
- Inference time LLM alignment in single and multidomain preference spectrum. Siyang Song, Zheng Qi, Nikolaos Pappas, Srikanth Doss Kadarundalagi Raghuram Doss, Monica Sunkara, Kishaloy Halder, Manuel Mager, Yassine Benajiba. 24 Oct 2024.
- SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment (ACL 2024). Qin Liu, Haiwei Yang, Chaowei Xiao, Muhao Chen. 18 Oct 2024.
- Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements (ICLR 2024). Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme. 11 Oct 2024.
- COS-DPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework (UAI 2024). Yinuo Ren, Tesi Xiao, Michael Shavlovsky, Lexing Ying, Holakou Rahmanian. 10 Oct 2024.
- GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment (ICLR 2024). Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh. 10 Oct 2024.
- Towards a Unified View of Preference Learning for Large Language Models: A Survey. Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhiyong Yang, ..., Houfeng Wang, Zhifang Sui, Peiyi Wang, Baobao Chang. 04 Sep 2024.
- Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts (NAACL 2024). Tingchen Fu, Yupeng Hou, Julian McAuley, Rui Yan. 09 Aug 2024.
- Know Your Limits: A Survey of Abstention in Large Language Models. Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang. 25 Jul 2024.
- BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models. Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun. 30 Jun 2024.
- Decoding-Time Language Model Alignment with Multiple Objectives. Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon Du. 27 Jun 2024.
- On the Transformations across Reward Model, Parameter Update, and In-Context Prompt. Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, ..., Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi. 24 Jun 2024.
- Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization. Wenkai Yang, Shiqi Shen, Guangyao Shen, Zhi Gong, Yankai Lin, Ji-Rong Wen. 17 Jun 2024.