Learn Your Reference Model for Real Good Alignment
arXiv:2404.09656 (v4, latest)

15 April 2024
Alexey Gorbatovski
Boris Shaposhnikov
Alexey Malakhov
Nikita Surnachev
Gleb Gerasimov
Ian Maksimov
Nikita Balagansky
Daniil Gavrilov

Papers citing "Learn Your Reference Model for Real Good Alignment"

50 / 89 papers shown
Humanline: Online Alignment as Perceptual Loss
Sijia Liu, Niklas Muennighoff, Kawin Ethayarajh · 30 Mar 2026

Merging without Forgetting: Continual Fusion of Task-Specific Models via Optimal Transport
Z. Pan, Zhikang Chen, Ding Li, Min Zhang, Sen Cui, ..., Yi Yang, Deheng Ye, Yu Zhang, T. Zhu, Tianling Ren · 24 Nov 2025

AMaPO: Adaptive Margin-attached Preference Optimization for Language Model Alignment
Ruibo Deng, Duanyu Feng, Wenqiang Lei · 12 Nov 2025

Learning to Reason Efficiently with Discounted Reinforcement Learning
Alex Ayoub, Kavosh Asadi, Dale Schuurmans, Csaba Szepesvári, Karim Bouyarmane · 27 Oct 2025

Reinforced Preference Optimization for Recommendation
Junfei Tan, Yuxin Chen, An Zhang, Junguang Jiang, Yinan Han, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Xiang-Bin Wang · 14 Oct 2025

Hybrid Explanation-Guided Learning for Transformer-Based Chest X-Ray Diagnosis
Shelley Zixin Shu, Haozhe Luo, Alexander Poellinger, Mauricio Reyes · 14 Oct 2025

Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory
Jiancong Xiao, Zhekun Shi, Kaizhao Liu, Q. Long, Weijie J. Su · 14 Jun 2025
Reinforcement Learning Teachers of Test Time Scaling
Edoardo Cetin, Tianyu Zhao, Yujin Tang · 10 Jun 2025

Explicit Preference Optimization: No Need for an Implicit Reward Model
Xiangkun Hu, Lemin Kong, Tong He, David Wipf · 09 Jun 2025

Doubly Robust Alignment for Large Language Models
Erhan Xu, Kai Ye, Hongyi Zhou, Luhan Zhu, Francesco Quinzan, Chengchun Shi · 01 Jun 2025

Rethinking Direct Preference Optimization in Diffusion Models
Junyong Kang, Seohyun Lim, Kyungjune Baek, Hyunjung Shim · 24 May 2025

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
Wei Liu, Siya Qi, Xinyu Wang, Chen Qian, Yali Du, Petr Slovak · 21 May 2025

On the Interplay of Human-AI Alignment, Fairness, and Performance Trade-offs in Medical Imaging (MICCAI, 2025)
Haozhe Luo, Ziyu Zhou, Zixin Shu, Aurélie Pahud de Mortanges, Robert Berke, Mauricio Reyes · 15 May 2025

Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Q. Long, Weijie Su, Li Shen · 04 May 2025

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
Junshu Pan, Wei Shen, Shulin Huang, Qiji Zhou, Yue Zhang · 22 Apr 2025

Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi · 03 Apr 2025

Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving (ACL, 2025)
Sara Rajaee, Kumar Pratik, Gabriele Cesa, Arash Behboodi · 12 Mar 2025
LiPO: Listwise Preference Optimization through Learning-to-Rank (NAACL, 2024)
Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, ..., Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J. Liu, Xuanhui Wang · 28 Jan 2025

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng · 22 Jan 2025

How to Merge Your Multimodal Models Over Time? (CVPR, 2024)
Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth, Christian Schroeder de Witt, Zeynep Akata, Samuel Albanie, Matthias Bethge · 09 Dec 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang, Zhe Chen, Wenhai Wang, Yue Cao, Yangzhou Liu, ..., Jinguo Zhu, X. Zhu, Lewei Lu, Yu Qiao, Jifeng Dai · 15 Nov 2024

Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization
Zhuotong Chen, Fang Liu, Jennifer Zhu, Wanyu Du, Yanjun Qi · 07 Nov 2024

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization (ICLR, 2024)
Hanyang Zhao, Genta Indra Winata, Anirban Das, Shi-Xiong Zhang, D. Yao, Wenpin Tang, Sambit Sahu · 05 Oct 2024

Evaluation of Large Language Models for Summarization Tasks in the Medical Domain: A Narrative Review
Emma Croxford, Yanjun Gao, Nicholas Pellegrino, Karen K. Wong, Graham Wills, Elliot First, Frank J. Liao, Cherodeep Goswami, Brian Patterson, Majid Afshar · 26 Sep 2024
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
Cheolhun Jang · 26 Sep 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhiyong Yang, ..., Houfeng Wang, Zhifang Sui, Peiyi Wang, Baobao Chang · 04 Sep 2024

Understanding Reference Policies in Direct Preference Optimization
Yixin Liu, Pengfei Liu, Arman Cohan · 18 Jul 2024

New Desiderata for Direct Preference Optimization
Xiangkun Hu, Tong He, David Wipf · 12 Jul 2024

LIONs: An Empirically Optimized Approach to Align Language Models
Xiao Yu, Qingyang Wu, Yu Li, Zhou Yu · 09 Jul 2024

Aligning Diffusion Models with Noise-Conditioned Perception
Alexander Gambashidze, Anton Kulikov, Yuriy Sosnin, Ilya Makarov · 25 Jun 2024

WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé, Johan Ferret, Nino Vieillard, Robert Dadashi, Léonard Hussenot, Pierre-Louis Cedoz, Pier Giuseppe Sessa, Sertan Girgin, Arthur Douillard, Olivier Bachem · 24 Jun 2024

Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
Junru Lu, Jiazheng Li, Siyu An, Meng Zhao, Yulan He, Di Yin, Xing Sun · 16 Jun 2024

Online Joint Fine-tuning of Multi-Agent Flows
Paul Mineiro · 06 Jun 2024

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit S. Sikchi, Joey Hejna, Bradley Knox, Chelsea Finn, S. Niekum · 05 Jun 2024
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You · 03 Jun 2024

Robust Preference Optimization through Reward Model Distillation
Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Peter Shaw, Jonathan Berant · 29 May 2024

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
Jiancong Xiao, Ziniu Li, Xingyu Xie, E. Getzen, Cong Fang, Qi Long, Weijie J. Su · 26 May 2024

SimPO: Simple Preference Optimization with a Reference-Free Reward (NeurIPS, 2024)
Yu Meng, Mengzhou Xia, Danqi Chen · 23 May 2024

LIRE: listwise reward enhancement for preference alignment
Mingye Zhu, Yi Liu, Lei Zhang, Junbo Guo, Zhendong Mao · 22 May 2024

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois, Balázs Galambosi, Abigail Z. Jacobs, Tatsunori Hashimoto · 06 Apr 2024

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Hassan Awadallah, Tengyang Xie · 04 Apr 2024

Disentangling Length from Quality in Direct Preference Optimization
Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn · 28 Mar 2024

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, ..., Lijun Li, Jing Shao, Tao Gui, Xuanjing Huang · 18 Mar 2024
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Arka Pal, Deep Karkhanis, Samuel Dooley, Manley Roberts, Siddartha Naidu, Colin White · 20 Feb 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment
Yunhao Tang, Z. Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Avila-Pires, Bilal Piot · 08 Feb 2024

KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela · 02 Feb 2024

Secrets of RLHF in Large Language Models Part II: Reward Modeling
Bing Wang, Rui Zheng, Luyao Chen, Yan Liu, Jiajun Sun, ..., Tao Gui, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yuanyuan Jiang · 11 Jan 2024

Nash Learning from Human Feedback (ICML, 2023)
Rémi Munos, Michal Valko, Daniele Calandriello, M. G. Azar, Mark Rowland, ..., Nikola Momchev, Olivier Bachem, D. Mankowitz, Doina Precup, Bilal Piot · 01 Dec 2023

GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman · 20 Nov 2023

A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily (NAACL, 2023)
Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, Shujian Huang · 14 Nov 2023