ResearchTrend.AI

© 2026 ResearchTrend.AI, All rights reserved.

WebGPT: Browser-assisted question-answering with human feedback

17 December 2021
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Ouyang Long
Christina Kim
Christopher Hesse
Shantanu Jain
V. Kosaraju
William Saunders
Xu Jiang
K. Cobbe
Tyna Eloundou
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
    ALMRALM
arXiv:2112.09332

Papers citing "WebGPT: Browser-assisted question-answering with human feedback"

50 / 1,123 papers shown
Fast LLM Post-training via Decoupled and Fastest-of-N Speculation
Rongxin Cheng
Kai Zhou
Xingda Wei
Siyuan Liu
Mingcong Han
...
Yeju Zhou
Baoquan Zhong
W. L. Xiao
Rong Chen
Haibo Chen
OffRLLRM
438
0
0
24 Dec 2025
Learning to Orchestrate Agents in Natural Language with the Conductor
Stefan Nielsen
Edoardo Cetin
Peter Schwendeman
Qi Sun
Jinglue Xu
Yujin Tang
LLMAG
107
1
0
04 Dec 2025
On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference
Yue Yu
Qiwei Di
Quanquan Gu
Dongruo Zhou
BDL
183
0
0
04 Dec 2025
Process-Centric Analysis of Agentic Software Systems
Shuyang Liu
Yang Chen
Rahul Krishna
Saurabh Sinha
Jatin Ganhotra
Reyhan Jabbarvand
61
0
0
02 Dec 2025
Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking
Lingling Fu
MoMe
128
0
0
30 Nov 2025
Evolving Paradigms in Task-Based Search and Learning: A Comparative Analysis of Traditional Search Engine with LLM-Enhanced Conversational Search System
Zhitong Guan
Yi Wang
29
0
0
29 Nov 2025
An Empirical Study on the Security Vulnerabilities of GPTs
Tong Wu
Weibin Wu
Zibin Zheng
LLMAGELM
153
0
0
28 Nov 2025
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Hongjin Su
Shizhe Diao
Ximing Lu
Mingjie Liu
Jiacheng Xu
...
Evelina Bakhturina
Tao Yu
Yejin Choi
Jan Kautz
Pavlo Molchanov
262
5
0
26 Nov 2025
VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering
Yuyi Li
Daoyuan Chen
Zhen Wang
Yutong Lu
Yaliang Li
145
0
0
25 Nov 2025
ST-PPO: Stabilized Off-Policy Proximal Policy Optimization for Multi-Turn Agents Training
Chenliang Li
Adel Elmahdy
Alex Boyd
Zhongruo Wang
Alfredo García
Parminder Bhatia
Taha A. Kass-Hout
Cao Xiao
Mingyi Hong
OffRL
184
0
0
25 Nov 2025
CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization
X. Hou
Shaoyuan Xu
Manan Biyani
Mayan Li
Jia-Wei Liu
Todd C. Hollon
Bryan Wang
140
0
0
24 Nov 2025
SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization
Jianghao Wu
Yasmeen George
Jin Ye
Y. Wu
Daniel F. Schmidt
Jianfei Cai
LRM
106
0
0
22 Nov 2025
Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks
Yicong Zheng
Kevin L. McKee
Thomas Miconi
Zacharie Bugaud
Mick van Gelderen
Jed McCaleb
RALM
69
1
0
20 Nov 2025
Finetuning LLMs for Automatic Form Interaction on Web-Browser in Selenium Testing Framework
Nguyen-Khang Le
Nguyen Hiep
Minh Nguyen
Son T. Luu
Trung Vo
Quan Minh Bui
Nomura Shoshin
L. Nguyen
201
0
0
19 Nov 2025
It's LIT! Reliability-Optimized LLMs with Inspectable Tools
Ruixin Zhang
J. Donnelly
Zhicheng Guo
Ghazal Khalighinejad
Haiyang Huang
A. Barnett
Cynthia Rudin
105
0
0
18 Nov 2025
WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance
Genglin Liu
Shijie Geng
Sha Li
Hejie Cui
Sarah Zhang
Xin Liu
Tianyi Liu
CLL
621
1
0
17 Nov 2025
From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Siyu Xia
Zekun Xu
Jiajun Chai
Wentian Fan
Yan Song
Xiaohan Wang
G. Yin
Wei Lin
Haifeng Zhang
Jun Wang
LLMAG
477
1
0
11 Nov 2025
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
Guoxin Chen
Zile Qiao
Xuanzhong Chen
Donglei Yu
Haotian Xu
...
Minpeng Liao
Yong Jiang
Pengjun Xie
Fei Huang
Jingren Zhou
337
3
0
10 Nov 2025
Inference-Time Personalized Alignment with a Few User Preference Queries
Victor-Alexandru Pădurean
Parameswaran Kamalaruban
Nachiket Kotalwar
Alkis Gotovos
Adish Singla
171
0
0
04 Nov 2025
An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks
Xu Liu
Yan Chen
Kan Ling
Yichi Zhu
Hengrun Zhang
Guisheng Fan
Huiqun Yu
AAML
116
1
0
04 Nov 2025
Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Yirong Zeng
Xiao Ding
Yutai Hou
Yuxian Wang
Li Du
...
Duyu Tang
Dandan Tu
Weiwen Liu
Bing Qin
Ting Liu
OffRLSyDa
294
0
0
02 Nov 2025
A CPU-Centric Perspective on Agentic AI
Ritik Raj
Hong Wang
Tushar Krishna
296
0
0
01 Nov 2025
DRIP: Defending Prompt Injection via Token-wise Representation Editing and Residual Instruction Fusion
Ruofan Liu
Yun Lin
Zhiyong Huang
Jin Song Dong
AAMLSILM
377
0
0
01 Nov 2025
ToolRM: Towards Agentic Tool-Use Reward Modeling
Renhao Li
Jianhong Tu
Yang Su
Hamid Alinejad-Rokny
Derek F. Wong
Junyang Lin
Min Yang
LRM
161
1
0
30 Oct 2025
Decomposition-Enhanced Training for Post-Hoc Attributions In Language Models
Sriram Balasubramaniam
S. Basu
Koustava Goswami
Ryan Rossi
Varun Manjunatha
Roshan Santhosh
Ruiyi Zhang
Soheil Feizi
Nedim Lipka
LRMReLM
366
0
0
29 Oct 2025
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Yueqi Song
Ketan Ramaneti
Zaid A. W. Sheikh
Z. Chen
Boyu Gou
...
Xiang Yue
Tao Yu
Huan Sun
Yu-Chuan Su
Graham Neubig
173
2
0
28 Oct 2025
A Survey of Data Agents: Emerging Paradigm or Overstated Hype?
Yizhang Zhu
Liangwei Wang
Chenyu Yang
Xiaotian Lin
Boyan Li
...
Shaolei Zhang
Y. Zhang
Xuanhe Zhou
Guoliang Li
Yuyu Luo
AI4TS
188
2
0
27 Oct 2025
The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation
Farid Bagirov
Mikhail Arkhipov
Ksenia Sycheva
Evgeniy Glukhov
Egor Bogomolov
115
0
0
27 Oct 2025
Adaptive Blockwise Search: Inference-Time Alignment for Large Language Models
Mohammad Atif Quamar
Mohammad Areeb
Nishant Sharma
Ananth Shreekumar
Jonathan Rosenthal
Muslum Ozgur Ozmen
Mikhail Kuznetsov
Z. Berkay Celik
88
0
0
27 Oct 2025
Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
S. Zhao
Aidan Li
Rob Brekelmans
Roger C. Grosse
86
0
0
24 Oct 2025
PanicToCalm: A Proactive Counseling Agent for Panic Attacks
Jihyun Lee
Yejin Min
San Kim
Yejin Jeon
SungJun Yang
Hyounghun Kim
Gary Lee
167
0
0
24 Oct 2025
Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models
Hoang Phan
Xianjun Yang
Kevin Yao
Jingyu Zhang
Shengjie Bi
Xiaocheng Tang
Madian Khabsa
Lijuan Liu
Deren Lei
OffRLCLLKELMVLMLRM
135
0
0
24 Oct 2025
Surfer 2: The Next Generation of Cross-Platform Computer Use Agents
M. Andreux
Märt Bakler
Yanael Barbier
Hamza Ben Chekroun
Emilien Biré
...
Ivan Valentini
Tony Wu
Laura Yie
Kai Yuan
Jevgenij Zubovskij
LLMAGLRM
155
0
0
22 Oct 2025
Crucible: Quantifying the Potential of Control Algorithms through LLM Agents
Lianchen Jia
Chaoyang Li
Qian Houde
Tianchi Huang
Jiangchuan Liu
Lifeng Sun
112
0
0
21 Oct 2025
CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent
Haojia Lin
Xiaoyu Tan
Yulei Qin
Zihan Xu
Yuchen Shi
...
Shaofei Cai
Siqi Cai
Chaoyou Fu
Ke Li
Xing Sun
ALM
168
1
0
21 Oct 2025
Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models
Dayan Pan
Zhaoyang Fu
Jingyuan Wang
Xiao Han
Yue Zhu
Xiangyu Zhao
KELMCLL
127
0
0
20 Oct 2025
Empowering Real-World: A Survey on the Technology, Practice, and Evaluation of LLM-driven Industry Agents
Yihong Tang
Kehai Chen
Liang Yue
Jinxin Fan
Caishen Zhou
...
Kaiyang Guo
Xingshan Zeng
Wenjing Cun
L. Shang
Min Zhang
LLMAG
158
0
0
20 Oct 2025
WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale
Yuxuan Lu
Jing Huang
Hui Liu
Jiri Gesi
Yan Han
Shihan Fu
Tianqi Zheng
Dakuo Wang
OffRL
91
1
0
17 Oct 2025
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
Baode Wang
Biao Wu
Weizhen Li
Meng Fang
Zuming Huang
...
Haozhe Wang
Zuming Huang
Ling Chen
Wei Chu
Yuan Qi
208
6
0
17 Oct 2025
Natural Language Tools: A Natural Language Approach to Tool Calling In Large Language Agents
Reid T. Johnson
Michelle D. Pain
Jordan D. West
LLMAGLM&MAELM
207
0
0
16 Oct 2025
Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking
Yuchun Miao
Liang Ding
Sen Zhang
Rong Bao
L. Zhang
Dacheng Tao
187
0
0
15 Oct 2025
Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism
Xiaoshu Chen
Sihang Zhou
Ke Liang
Duanyang Yuan
Haoyuan Chen
Xiaoyu Sun
Linyuan Meng
Xinwang Liu
ReLMLRM
226
0
0
15 Oct 2025
Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation
Jiamin Chen
Yuchen Li
Xinyu Ma
X. Chen
Xiaokun Zhang
Shuaiqiang Wang
Chen Ma
D. Yin
RALMLRM
196
0
0
15 Oct 2025
On the Role of Preference Variance in Preference Optimization
Jiacheng Guo
Zihao Li
Jiahao Qiu
Yue Wu
Mengdi Wang
159
2
0
14 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&RoAIFinAI4TSLRMAI4CE
250
5
0
13 Oct 2025
Attacks by Content: Automated Fact-checking is an AI Security Issue
Michael Schlichtkrull
AAML
116
0
0
13 Oct 2025
Safety Game: Balancing Safe and Informative Conversations with Blackbox Agentic AI using LP Solvers
Tuan Nguyen
Long Tran-Thanh
LLMAG
123
0
0
10 Oct 2025
Fundamentals of Building Autonomous LLM Agents
Victor de Lamo Castrillo
Habtom Kahsay Gidey
Alexander Lenz
Alois Knoll
LLMAGLM&Ro
207
3
0
10 Oct 2025
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
Tajamul Ashraf
Umair Nawaz
Abdelrahman M. Shaker
Rao Muhammad Anwer
Philip Torr
Fahad Shahbaz Khan
Salman Khan
227
0
0
09 Oct 2025
Memory Retrieval and Consolidation in Large Language Models through Function Tokens
Shaohua Zhang
Yuan Lin
Hang Li
LLMAG
96
0
0
09 Oct 2025