2310.01045
Tool-Augmented Reward Modeling
2 October 2023
Lei Li
Yekun Chai
Shuohuan Wang
Yu Sun
Hao Tian
Ningyu Zhang
Hua-Hong Wu
OffRL
Papers citing
"Tool-Augmented Reward Modeling"
18 / 18 papers shown
A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji
Yanqiu Wu
Zhibin Wu
Shoujin Wang
Jian Yang
Mark Dras
Usman Naseem
31
0
0
05 May 2025
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Pedro Ferreira
Wilker Aziz
Ivan Titov
LRM
26
0
0
07 Apr 2025
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Minki Kang
Jongwon Jeong
Jaewoong Cho
ALM
LRM
41
2
0
07 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
R. Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Janet Liu
Y. Wu
OffRL
LRM
44
9
0
03 Apr 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
55
0
0
31 Dec 2024
Building Multi-Agent Copilot towards Autonomous Agricultural Data Management and Analysis
Yu Pan
Jianxin Sun
Hongfeng Yu
Joe Luck
Geng Bai
Nipuna Chamara
Yufeng Ge
Tala Awada
30
0
0
31 Oct 2024
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua-Hong Wu
34
1
0
03 Oct 2024
Collaborative Evolving Strategy for Automatic Data-Centric Development
Xu Yang
Haotian Chen
Wenjun Feng
Haoxue Wang
Zeqi Ye
Xinjie Shen
Xiao Yang
Shizhao Sun
Weiqing Liu
Jiang Bian
25
1
0
26 Jul 2024
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang
Kehai Chen
Xuefeng Bai
Zhixuan He
Juntao Li
Muyun Yang
Tiejun Zhao
Liqiang Nie
Min Zhang
39
8
0
17 Jun 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
LLMAG
31
77
0
28 May 2024
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
92
92
0
22 Jan 2024
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
79
46
0
18 Dec 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
208
2,413
0
06 Oct 2022
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le
Yue Wang
Akhilesh Deepak Gotmare
Silvio Savarese
S. Hoi
SyDa
ALM
118
232
0
05 Jul 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Internet-Augmented Dialogue Generation
M. Komeili
Kurt Shuster
Jason Weston
RALM
228
278
0
15 Jul 2021
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
239
489
0
16 Oct 2019