Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO

29 May 2025

Papers citing "Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO"

4 / 4 papers shown

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

OffRL AI4TS LRM ReLM VLM

1.2K

5,342

22 Jan 2025

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

International Conference on Learning Representations (ICLR), 2024

626

11 Oct 2024

A Closer Look at Machine Unlearning for Large Language Models

International Conference on Learning Representations (ICLR), 2024

792

10 Oct 2024

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

450

601

06 Apr 2024