Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.18451
Cited By
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
24 October 2024
Chris Liu
Liang Zeng
J. Liu
Rui Yan
Jujie He
Chaojie Wang
Shuicheng Yan
Yang Liu
Yahui Zhou
AI4TS
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs"
15 / 15 papers shown
Title
RM-R1: Reward Modeling as Reasoning
X. Chen
Gaotang Li
Z. Wang
Bowen Jin
Cheng Qian
...
Y. Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
42
0
0
05 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
60
0
0
05 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
J. Chen
Fan Yang
Z. Zhang
Tingting Gao
Liang Wang
OffRL
LRM
30
0
0
05 May 2025
Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab
Ruqi Zhang
36
0
0
17 Apr 2025
Adversarial Training of Reward Models
Alexander Bukharin
Haifeng Qian
Shengyang Sun
Adithya Renduchintala
Soumye Singhal
Z. Wang
Oleksii Kuchaiev
Olivier Delalleau
T. Zhao
AAML
27
0
0
08 Apr 2025
Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking
Chris Samarinas
Hamed Zamani
ALM
LRM
61
0
0
04 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
37
0
0
01 Apr 2025
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
45
1
0
22 Feb 2025
Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards
Xinyi Yang
Liang Zeng
Heng Dong
C. Yu
X. Wu
H. Yang
Yu Wang
Milind Tambe
Tonghan Wang
66
2
0
18 Feb 2025
ARIES: Stimulating Self-Refinement of Large Language Models by Iterative Preference Optimization
Yongcheng Zeng
Xinyu Cui
Xuanfa Jin
Guoqing Liu
Zexu Sun
...
Dong Li
Ning Yang
Jianye Hao
H. Zhang
J. Wang
LRM
LLMAG
74
1
0
08 Feb 2025
Learning to Generate Unit Tests for Automated Debugging
Archiki Prasad
Elias Stengel-Eskin
Justin Chih-Yao Chen
Zaid Khan
Mohit Bansal
ELM
74
1
0
03 Feb 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
50
0
0
31 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
J. Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
108
6
0
05 Dec 2024
Interpreting Language Reward Models via Contrastive Explanations
Junqi Jiang
Tom Bewley
Saumitra Mishra
Freddy Lecue
Manuela Veloso
74
0
0
25 Nov 2024
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis
Timo Kaufmann
Eyke Hüllermeier
Samuel Albanie
Robert Mullins
SyDa
41
8
0
02 Jun 2024
1