ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.12187
  4. Cited By
WARM: On the Benefits of Weight Averaged Reward Models

WARM: On the Benefits of Weight Averaged Reward Models

22 January 2024
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
ArXivPDFHTML

Papers citing "WARM: On the Benefits of Weight Averaged Reward Models"

27 / 27 papers shown
Title
A Model Zoo on Phase Transitions in Neural Networks
A Model Zoo on Phase Transitions in Neural Networks
Konstantin Schurholt
Léo Meynent
Yefan Zhou
Haiquan Lu
Yaoqing Yang
Damian Borth
56
0
0
25 Apr 2025
Energy-Based Reward Models for Robust Language Model Alignment
Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab
Ruqi Zhang
26
0
0
17 Apr 2025
Adversarial Training of Reward Models
Adversarial Training of Reward Models
Alexander Bukharin
Haifeng Qian
Shengyang Sun
Adithya Renduchintala
Soumye Singhal
Z. Wang
Oleksii Kuchaiev
Olivier Delalleau
T. Zhao
AAML
22
0
0
08 Apr 2025
Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
Roberto-Rafael Maura-Rivero
Chirag Nagpal
Roma Patel
Francesco Visin
29
1
0
08 Jan 2025
CREAM: Consistency Regularized Self-Rewarding Language Models
CREAM: Consistency Regularized Self-Rewarding Language Models
Z. Wang
Weilei He
Zhiyuan Liang
Xuchao Zhang
Chetan Bansal
Ying Wei
Weitong Zhang
Huaxiu Yao
ALM
87
7
0
16 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
41
1
0
09 Oct 2024
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua-Hong Wu
16
1
0
03 Oct 2024
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Xingzhou Lou
Dong Yan
Wei Shen
Yuzi Yan
Jian Xie
Junge Zhang
30
21
0
01 Oct 2024
RRM: Robust Reward Model Training Mitigates Reward Hacking
RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu
Wei Xiong
Jie Jessie Ren
Lichang Chen
Junru Wu
...
Yuan Liu
Bilal Piot
Abe Ittycheriah
Aviral Kumar
Mohammad Saleh
AAML
37
11
0
20 Sep 2024
Pareto-Optimal Learning from Preferences with Hidden Context
Pareto-Optimal Learning from Preferences with Hidden Context
Ryan Boldi
Li Ding
Lee Spector
S. Niekum
20
6
0
21 Jun 2024
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
Lu Li
T. Zhang
Zhiqi Bu
Suyuchen Wang
Huan He
Jie Fu
Yonghui Wu
Jiang Bian
Yong Chen
Yoshua Bengio
FedML
MoMe
87
3
0
11 Jun 2024
Robust Preference Optimization through Reward Model Distillation
Robust Preference Optimization through Reward Model Distillation
Adam Fisch
Jacob Eisenstein
Vicky Zayats
Alekh Agarwal
Ahmad Beirami
Chirag Nagpal
Peter Shaw
Jonathan Berant
64
20
0
29 May 2024
Language Model Alignment with Elastic Reset
Language Model Alignment with Elastic Reset
Michael Noukhovitch
Samuel Lavoie
Florian Strub
Aaron Courville
KELM
75
12
0
06 Dec 2023
Specific versus General Principles for Constitutional AI
Specific versus General Principles for Constitutional AI
Sandipan Kundu
Yuntao Bai
Saurav Kadavath
Amanda Askell
Andrew Callahan
...
Zac Hatfield-Dodds
Sören Mindermann
Nicholas Joseph
Sam McCandlish
Jared Kaplan
AILaw
46
24
0
20 Oct 2023
Towards Understanding Sycophancy in Language Models
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
205
178
0
20 Oct 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
197
2,232
0
22 Mar 2023
Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic
  Forgetting in Automatic Speech Recognition
Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition
Steven Vander Eeckt
Hugo Van hamme
CLL
MoMe
46
11
0
27 Oct 2022
Git Re-Basin: Merging Models modulo Permutation Symmetries
Git Re-Basin: Merging Models modulo Permutation Symmetries
Samuel K. Ainsworth
J. Hayase
S. Srinivasa
MoMe
239
208
0
11 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors,
  and Lessons Learned
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
209
327
0
23 Aug 2022
Linear Connectivity Reveals Generalization Strategies
Linear Connectivity Reveals Generalization Strategies
Jeevesh Juneja
Rachit Bansal
Kyunghyun Cho
João Sedoc
Naomi Saphra
224
45
0
24 May 2022
Diverse Weight Averaging for Out-of-Distribution Generalization
Diverse Weight Averaging for Out-of-Distribution Generalization
Alexandre Ramé
Matthieu Kirchmeyer
Thibaud Rahier
A. Rakotomamonjy
Patrick Gallinari
Matthieu Cord
OOD
186
91
0
19 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
8,441
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
309
5,177
0
28 Jan 2022
SWAD: Domain Generalization by Seeking Flat Minima
SWAD: Domain Generalization by Seeking Flat Minima
Junbum Cha
Sanghyuk Chun
Kyungjae Lee
Han-Cheol Cho
Seunghyun Park
Yunsung Lee
Sungrae Park
MoMe
207
338
0
17 Feb 2021
A Closer Look at Accuracy vs. Robustness
A Closer Look at Accuracy vs. Robustness
Yao-Yuan Yang
Cyrus Rashtchian
Hongyang R. Zhang
Ruslan Salakhutdinov
Kamalika Chaudhuri
OOD
50
26
0
05 Mar 2020
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,151
0
18 Sep 2019
Simple and Scalable Predictive Uncertainty Estimation using Deep
  Ensembles
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Balaji Lakshminarayanan
Alexander Pritzel
Charles Blundell
UQCV
BDL
268
4,940
0
05 Dec 2016
1