Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment

3 October 2024
Yifan Zhang
Ge Zhang
Yue Wu
Kangping Xu
Quanquan Gu

Papers citing "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment"

13 / 13 papers shown
Language Ranker: A Lightweight Ranking framework for LLM Decoding
Chenheng Zhang
Tianqi Du
Jizhe Zhang
Mingqing Xiao
Yifei Wang
Yisen Wang
Zhouchen Lin
ALM
65
0
0
23 Oct 2025
Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis
Leitian Tao
Xuefeng Du
Shouqing Yang
SyDa
124
0
0
30 Sep 2025
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
Shijie Zhou
Ruiyi Zhang
Huaisheng Zhu
Branislav Kveton
Jiuxiang Gu
J. Gu
Jian Chen
Changyou Chen
MLLM, VLM, LRM
219
3
0
28 Jul 2025
Doubly Robust Alignment for Large Language Models
Erhan Xu
Kai Ye
Hongyi Zhou
Luhan Zhu
Francesco Quinzan
Chengchun Shi
164
1
0
01 Jun 2025
R3: Robust Rubric-Agnostic Reward Models
David Anugraha
Zilu Tang
Lester James V. Miranda
Hanyang Zhao
Mohammad Rifqi Farhansyah
Garry Kuwanto
Derry Wijaya
Genta Indra Winata
406
11
0
19 May 2025
A Systematic Analysis of Base Model Choice for Reward Modeling
Kian Ahrabian
Pegah Jandaghi
Negar Mokhberian
Sai Praneeth Karimireddy
Jay Pujara
202
0
0
16 May 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
228
25
0
12 Apr 2025
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye
Hongyi Zhou
Jin Zhu
Francesco Quinzan
C. Shi
241
4
0
03 Apr 2025
Improving LLM-as-a-Judge Inference with the Judgment Distribution
Victor Wang
Michael J.Q. Zhang
Eunsol Choi
241
16
0
04 Mar 2025
Rethinking Diverse Human Preference Learning through Principal Component Analysis
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Feng Luo
Rui Yang
Hao Sun
Chunyuan Deng
Jiarui Yao
Jingyan Shen
Huan Zhang
Hanjie Chen
231
4
0
18 Feb 2025
LiPO: Listwise Preference Optimization through Learning-to-Rank
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Tianqi Liu
Zhen Qin
Junru Wu
Jiaming Shen
Misha Khalman
...
Mohammad Saleh
Simon Baumgartner
Jialu Liu
Peter J. Liu
Xuanhui Wang
485
75
0
28 Jan 2025
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Minlie Huang
Jiwei Li
Leilei Gan
G. Wang
Eduard H. Hovy
OffRL
448
40
0
05 Dec 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Abigail Z. Jacobs
Tatsunori Hashimoto
ALM
302
551
0
06 Apr 2024