ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.01470
  4. Cited By
DogeRM: Equipping Reward Models with Domain Knowledge through Model
  Merging

DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging

1 July 2024
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Yun-Nung Chen
    VLM
    ALM
ArXivPDFHTML

Papers citing "DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging"

7 / 7 papers shown
Title
Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging
Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging
Zhixiang Wang
Zhenyu Mao
Yixuan Qiao
Yunfang Wu
Biye Li
MoMe
70
0
0
17 Feb 2025
RewardBench: Evaluating Reward Models for Language Modeling
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert
Valentina Pyatkin
Jacob Morrison
Lester James Validad Miranda
Bill Yuchen Lin
...
Sachin Kumar
Tom Zick
Yejin Choi
Noah A. Smith
Hanna Hajishirzi
ALM
59
44
0
20 Mar 2024
WARM: On the Benefits of Weight Averaged Reward Models
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
92
92
0
22 Jan 2024
Magicoder: Empowering Code Generation with OSS-Instruct
Magicoder: Empowering Code Generation with OSS-Instruct
Yuxiang Wei
Zhe Wang
Jiawei Liu
Yifeng Ding
Lingming Zhang
SyDa
26
17
0
04 Dec 2023
Teaching language models to support answers with verified quotes
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
213
204
0
21 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,561
0
18 Sep 2019
1