Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2112.09332
Cited By
v1
v2
v3 (latest)
WebGPT: Browser-assisted question-answering with human feedback
17 December 2021
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Ouyang Long
Christina Kim
Christopher Hesse
Shantanu Jain
V. Kosaraju
William Saunders
Xu Jiang
K. Cobbe
Tyna Eloundou
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
ALM
RALM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (2 upvotes)
Papers citing
"WebGPT: Browser-assisted question-answering with human feedback"
50 / 1,123 papers shown
Rethinking Stateful Tool Use in Multi-Turn Dialogues: Benchmarks and Challenges
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hongru Wang
Wenyu Huang
Yufei Wang
Yuanhao Xi
Jianqiao Lu
Huan Zhang
Nan Hu
Zeming Liu
Jeff Z. Pan
Kam-Fai Wong
LLMAG
315
7
0
19 May 2025
Pairwise Calibrated Rewards for Pluralistic Alignment
Daniel Halpern
Evi Micha
Ariel D. Procaccia
Itai Shapira
220
0
0
17 May 2025
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Reward Design
Siliang Zeng
Quan Wei
William Brown
Oana Frunza
Oana Frunza
...
Anderson Schneider
Yuriy Nevmyvaka
Yang Katie Zhao
Alfredo García
Mingyi Hong
LRM
349
22
0
17 May 2025
PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
IEEE International Conference on Information Reuse and Integration (IRI), 2025
Falong Fan
Xi Li
LLMAG
AAML
336
6
0
16 May 2025
LLM Agents Are Hypersensitive to Nudges
Manuel Cherep
Pattie Maes
Nikhil Singh
292
2
0
16 May 2025
Demystifying AI Agents: The Final Generation of Intelligence
Kevin J McNamara
Rhea Pritham Marpu
188
2
0
15 May 2025
Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging
Hongjin Qian
Zhengyang Liang
RALM
LRM
514
6
0
14 May 2025
Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach
Ethics: An International Journal of Social, Political, and Legal Philosophy (Ethics), 2025
Shannon Lodoen
Alexi Orchard
243
0
0
14 May 2025
HealthBench: Evaluating Large Language Models Towards Improved Human Health
Rahul Arora
Jason W. Wei
Rebecca Soskin Hicks
Preston Bowman
Joaquin Quiñonero Candela
...
Meghan Shah
Andrea Vallone
Alex Beutel
Johannes Heidecke
K. Singhal
LM&MA
AI4MH
ELM
296
125
0
13 May 2025
Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang
Bach Le
Naveed Akhtar
Siew-Kei Lam
Tuan Ngo
3DV
AI4CE
391
9
0
13 May 2025
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
Xiaolin Huang
Weiwen Liu
Xingshan Zeng
Yanhua Huang
Xinlong Hao
...
Yirong Zeng
Chuhan Wu
Yun Wang
Ruiming Tang
Defu Lian
KELM
377
4
0
12 May 2025
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai
Haotian Xu
Zhong-Zhi Li
X. Wu
Weinong Wang
J. Hu
Yingying Zhang
Wenqiang Zhang
ReLM
LRM
528
37
0
12 May 2025
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao
Huy Q. Le
Avi Deb Raha
Phuong-Nam Tran
Apurba Adhikary
Mengchun Zhang
Loc X. Nguyen
Eui-nam Huh
Zhu Han
Choong Seon Hong
AI4CE
401
5
0
11 May 2025
Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
Zhuocheng Gong
Jian Guan
Wei Wu
Huishuai Zhang
Dongyan Zhao
342
4
0
08 May 2025
Advancing and Benchmarking Personalized Tool Invocation for LLMs
Xiaolin Huang
Yuefeng Huang
Wen Liu
Xingshan Zeng
Yijiao Wang
Ruiming Tang
Hong Xie
Defu Lian
248
2
0
07 May 2025
Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering
Joshua Owotogbe
LLMAG
278
6
0
06 May 2025
Soft Best-of-n Sampling for Model Alignment
C. M. Verdun
Alex Oesterling
Himabindu Lakkaraju
Flavio du Pin Calmon
BDL
861
7
0
06 May 2025
RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation
Tiantian Gan
Qiyao Sun
78
15
0
06 May 2025
A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji
Yanqiu Wu
Zhibin Wu
Shoujin Wang
Jian Yang
Mark Dras
Usman Naseem
373
9
0
05 May 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
LRM
581
8
0
05 May 2025
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents
Christian Schroeder de Witt
AAML
AI4CE
1.1K
35
0
04 May 2025
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo
Lajanugen Logeswaran
Justin Johnson
Honglak Lee
375
10
0
01 May 2025
Coral Protocol: Open Infrastructure Connecting The Internet of Agents
Roman J. Georgio
Caelum Forder
Suman Deb
Andri Rahimov
Peter Carroll
Önder Gürcan
402
0
0
30 Apr 2025
A Domain-Agnostic Scalable AI Safety Ensuring Framework
Beomjun Kim
Kangyeon Kim
Sunwoo Kim
Yeonsang Shin
Heejin Ahn
630
0
0
29 Apr 2025
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese
Peilin Zhou
Bruce Leon
Xiang Ying
Chen Zhang
Yifan Shao
...
Sixin Hong
J. Ren
Jian Chen
Chao-Hong Liu
Yining Hua
RALM
ELM
LRM
356
54
0
27 Apr 2025
AndroidGen: Building an Android Language Agent under Data Scarcity
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hanyu Lai
Junjie Gao
Xiao-Yang Liu
Zifei Shan
Shanghang Zhang
Yuxiao Dong
Jie Tang
LLMAG
320
5
0
27 Apr 2025
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning
Shaokun Zhang
Yi Dong
Jieyu Zhang
Jan Kautz
Bryan Catanzaro
Andrew Tao
Qingyun Wu
Zhiding Yu
Guilin Liu
LLMAG
OffRL
KELM
LRM
511
0
0
25 Apr 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Li Sheng
Xuekai Zhu
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
1.3K
117
0
22 Apr 2025
Establishing Reliability Metrics for Reward Models in Large Language Models
Yizhou Chen
Yawen Liu
Xuesi Wang
Qingtao Yu
Guangda Huzhang
Anxiang Zeng
Han Yu
Zhiming Zhou
287
1
0
21 Apr 2025
a1: Steep Test-time Scaling Law via Environment Augmented Generation
Shansong Liu
Shenghua Liu
Yiwei Wang
Baolong Bi
Yuyao Ge
Jun Wan
Yurong Wu
Xueqi Cheng
LRM
295
11
0
20 Apr 2025
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
International Conference on Learning Representations (ICLR), 2025
João Loula
Benjamin LeBrun
Li Du
Ben Lipkin
Clemente Pasti
...
Ryan Cotterel
Vikash K. Mansinghka
Alexander K. Lew
Tim Vieira
Timothy J. O'Donnell
544
22
0
17 Apr 2025
Memorization vs. Reasoning: Updating LLMs with New Knowledge
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Aochong Oliver Li
Tanya Goyal
KELM
351
9
0
16 Apr 2025
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
Jason W. Wei
Zhiqing Sun
Spencer Papay
S. McKinney
Jeffrey Han
Isa Fulford
Hyung Won Chung
Alex Tachard Passos
W. Fedus
Amelia Glaese
274
206
0
16 Apr 2025
Offline Learning and Forgetting for Reasoning with Large Language Models
Tianwei Ni
Allen Nie
Sapana Chaudhary
Yao Liu
Huzefa Rangwala
Rasool Fakoor
ReLM
CLL
LRM
1.2K
1
0
15 Apr 2025
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
Shuai Zhao
Linchao Zhu
Yi Yang
Yi Yang
456
4
0
14 Apr 2025
DeepTrans: Deep Reasoning Translation via Reinforcement Learning
Jiaan Wang
Fandong Meng
Jie Zhou
OffRL
LRM
459
1
0
14 Apr 2025
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Xing Han Lù
Amirhossein Kazemnejad
Nicholas Meade
Arkil Patel
Dongchan Shin
Alejandra Zambrano
Karolina Stañczak
Peter Shaw
Christopher Pal
Siva Reddy
LLMAG
360
17
0
11 Apr 2025
TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models
Sher Badshah
Ali Emami
Hassan Sajjad
LLMAG
ELM
373
1
0
10 Apr 2025
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Pedro Ferreira
Wilker Aziz
Ivan Titov
LRM
359
4
0
07 Apr 2025
Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling
Benjamin Lipkin
Benjamin LeBrun
Jacob Hoover Vigly
João Loula
David R. MacIver
...
Robert Bamler
Vikash K. Mansinghka
Timothy J. O'Donnell
Alexander K. Lew
Tim Vieira
373
6
0
07 Apr 2025
Building LLM Agents by Incorporating Insights from Computer Systems
Yapeng Mi
Zhi Gao
Xiaojian Ma
Qing Li
LLMAG
363
1
0
06 Apr 2025
The Use of Gaze-Derived Confidence of Inferred Operator Intent in Adjusting Safety-Conscious Haptic Assistance
Jeremy D. Webb
Michael Bowman
Songpo Li
Xiaoli Zhang
322
0
0
04 Apr 2025
On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows
Souradip Chakraborty
Mohammadreza Pourreza
Ruoxi Sun
Yiwen Song
Nino Scherrer
...
Furong Huang
Amrit Singh Bedi
Ahmad Beirami
Hamid Palangi
Tomas Pfister
546
2
0
02 Apr 2025
HERA: Hybrid Edge-cloud Resource Allocation for Cost-Efficient AI Agents
Shiyi Liu
Haiying Shen
Shuai Che
Mahdi Ghandi
Mingqin Li
LLMAG
361
4
0
01 Apr 2025
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Tong Nie
Jian Sun
Wei Ma
564
23
0
27 Mar 2025
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment
International Conference on Learning Representations (ICLR), 2025
Souradip Chakraborty
Sujay Bhatt
Udari Madhushani Sehwag
Soumya Suvra Ghosal
Jiahao Qiu
Mengdi Wang
Dinesh Manocha
Furong Huang
Alec Koppel
Sumitra Ganesh
368
16
0
27 Mar 2025
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models
Yujiao Shi
Mengchen Zhang
Tong Wu
Tengfei Wang
Gordon Wetzstein
Dahua Lin
Yu Qiao
ELM
600
6
0
27 Mar 2025
debug-gym: A Text-Based Environment for Interactive Debugging
Xingdi Yuan
Morgane M Moss
Charbel El Feghali
Chinmay Singh
Darya Moldavskaya
...
Lucas Caccia
Matheus Pereira
Minseon Kim
Alessandro Sordoni
Marc-Alexandre Côté
LLMAG
283
4
0
27 Mar 2025
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search
Yunhai Hu
Yilun Zhao
Chen Zhao
Arman Cohan
ReLM
LRM
352
3
0
26 Mar 2025
OmniNova:A General Multimodal Agent Framework
Pengfei Du
LLMAG
213
0
0
25 Mar 2025
Previous
1
2
3
4
5
6
...
21
22
23
Next