WebGPT: Browser-assisted question-answering with human feedback

17 December 2021
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Long Ouyang
Christina Kim
Christopher Hesse
Shantanu Jain
V. Kosaraju
William Saunders
Xu Jiang
K. Cobbe
Tyna Eloundou
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
ALM, RALM

Papers citing "WebGPT: Browser-assisted question-answering with human feedback"

50 / 1,123 papers shown
Prepared mind, fast response: A temporal decoupling framework for adaptive knowledge orchestration in open-domain dialogue
Jinling Gan
Churong Liang
Runnan Li
84
0
0
09 Oct 2025
FlowSearch: Advancing deep research with dynamic structured knowledge flow
Yusong Hu
Runmin Ma
Yue Fan
Jinxin Shi
Zongsheng Cao
...
Lei Bai
Bo Zhang
Wenlong Zhang
AI4CE
154
1
0
09 Oct 2025
CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization
Debeshee Das
Luca Beurer-Kellner
Marc Fischer
Maximilian Baader
AAML
154
0
0
09 Oct 2025
CREST-Search: Comprehensive Red-teaming for Evaluating Safety Threats in Large Language Models Powered by Web Search
Haoran Ou
Kangjie Chen
Xingshuo Han
Gelei Deng
Jie M. Zhang
Han Qiu
Tianwei Zhang
97
0
0
09 Oct 2025
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
Tajamul Ashraf
Umair Nawaz
Abdelrahman M. Shaker
Rao Muhammad Anwer
Philip Torr
Fahad Shahbaz Khan
Salman Khan
227
0
0
09 Oct 2025
Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning
Wenxun Wu
Yuanyang Li
Guhan Chen
Linyue Wang
Hongyang Chen
OffRL, LRM
54
1
0
08 Oct 2025
Exposing Citation Vulnerabilities in Generative Engines
Riku Mochizuki
Shusuke Komatsu
Souta Noguchi
Kazuto Ataka
ELM
156
0
0
08 Oct 2025
MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning
Guoxin Chen
Zile Qiao
Wenqing Wang
Donglei Yu
Xuanzhong Chen
...
Yong Jiang
Penguin Xie
Wayne Xin Zhao
Ruihua Song
Fei Huang
LLMAG, LRM
144
0
0
06 Oct 2025
Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Jihoon Lee
Hoyeon Moon
Kevin Zhai
Arun Kumar Chithanar
Anit Kumar Sahu
S. Kar
Chul Lee
Souradip Chakraborty
Amrit Singh Bedi
DiffM
205
0
0
06 Oct 2025
AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning
Zhanke Zhou
Chentao Cao
Xiao Feng
Xuan Li
Zongze Li
...
Brando Miranda
Tongliang Liu
Sanmi Koyejo
Masashi Sugiyama
Bo Han
ReLM, LRM
117
0
0
05 Oct 2025
Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation
Hadi Nekoei
Aman Jaiswal
Patrice Béchard
Oleh Shliazhko
Orlando Marquez Ayala
Mathieu Reymond
Massimo Caccia
Alexandre Drouin
Sarath Chandar
Alexandre Lacoste
KELM
129
1
0
05 Oct 2025
Best of mini-N in-loop Sampling: A Contextual Quality Reward Model for Reliable and Efficient Best-of-N Sampling
Hyung Gyu Rho
Sian Lee
151
0
0
05 Oct 2025
AgenticRAG: Tool-Augmented Foundation Models for Zero-Shot Explainable Recommender Systems
Bo Ma
Hang Li
ZeHua Hu
XiaoFan Gui
LuYao Liu
Simon Liu
LRM
127
0
0
03 Oct 2025
Truth-Aware Decoding: A Program-Logic Approach to Factual Language Generation
Faruk Alpay
Hamdi Alakkad
64
0
0
03 Oct 2025
Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling
Qiwei Di
Kaixuan Ji
Xuheng Li
Heyang Zhao
Quanquan Gu
113
1
0
03 Oct 2025
InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents
Yaxin Du
Y. Zhang
Xiyuan Yang
Yifan Zhou
Cheng-Yu Wang
...
Menglan Chen
Shuo Tang
Z. Li
Feiyu Xiong
Siheng Chen
173
0
0
02 Oct 2025
MIRA: Towards Mitigating Reward Hacking in Inference-Time Alignment of T2I Diffusion Models
Kevin Zhai
Utsav Singh
Anirudh Thatipelli
Souradip Chakraborty
Anit Kumar Sahu
Furong Huang
Amrit Singh Bedi
Mubarak Shah
EGVM
180
1
0
02 Oct 2025
FlashResearch: Real-time Agent Orchestration for Efficient Deep Research
Lunyiu Nie
Nedim Lipka
Ryan Rossi
S. Chaudhuri
121
0
0
02 Oct 2025
How Well Can Preference Optimization Generalize Under Noisy Feedback?
Shawn Im
Yixuan Li
227
1
0
01 Oct 2025
Rationale-Augmented Retrieval with Constrained LLM Re-Ranking for Task Discovery
Bowen Wei
151
1
0
01 Oct 2025
PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents
Zikang Liu
Junyi Li
Wayne Xin Zhao
Dawei Gao
Yaliang Li
Ji-Rong Wen
LLMAG
159
2
0
01 Oct 2025
Optimal Stopping vs Best-of-$N$ for Inference Time Optimization
Y. Kalayci
Vinod Raman
S. Dughmi
124
0
0
01 Oct 2025
Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs
Sirou Zhu
Yanbin Jiang
Hejian Sang
Shao Tang
Qingquan Song
Biao He
Rohit Jain
Zhipeng Wang
Alborz Geramifard
144
0
0
30 Sep 2025
A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
Manuel Cherep
Chengtian Ma
Abigail Xu
Maya Shaked
Pattie Maes
Nikhil Singh
105
0
0
30 Sep 2025
Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis
Leitian Tao
Xuefeng Du
Shouqing Yang
SyDa
208
1
0
30 Sep 2025
Humanline: Online Alignment as Perceptual Loss
Sijia Liu
Niklas Muennighoff
Kawin Ethayarajh
88
0
0
29 Sep 2025
Structural Reward Model: Enhancing Interpretability, Efficiency, and Scalability in Reward Modeling
Xiaoyu Liu
Di Liang
Hongyu Shan
Peiyang Liu
Yonghao Liu
...
Yuntao Li
Xianjie Wu
LI Miao
Jiangrong Shen
Minlong Peng
LRM
174
2
0
29 Sep 2025
Not Wrong, But Untrue: LLM Overconfidence in Document-Based Queries
Nick Hagar
Wilma Agustianto
Nicholas Diakopoulos
HILM
80
0
0
29 Sep 2025
Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
Chenyu Zhou
Xiaoming Shi
Hui Qiu
Xiawu Zheng
Haitao Leng
Yankai Jiang
Shaoguo Liu
Tingting Gao
Rongrong Ji
137
1
0
28 Sep 2025
Large-Scale Constraint Generation - Can LLMs Parse Hundreds of Constraints?
Matteo Boffa
Jiaxuan You
179
0
0
28 Sep 2025
Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment
Min-Hsuan Yeh
Yixuan Li
191
1
0
28 Sep 2025
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
Jianshuo Dong
Sheng Guo
Hao Wang
Zhuotao Liu
Tianwei Zhang
Ke Xu
Shiyu Huang
Han Qiu
LLMAG
333
1
0
28 Sep 2025
PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness
Huacan Chai
Zijie Cao
M. R
Y. Yang
Jianghao Lin
...
Muning Wen
Weiwen Liu
Weinan Zhang
Fei Huang
Y. Wen
OffRL, AIFin, LRM
274
0
0
27 Sep 2025
Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs
Yehonatan Peisakhovsky
Zorik Gekhman
Y. Mass
Liat Ein-Dor
Roi Reichart
HILM
160
1
0
26 Sep 2025
Hallucination-Resistant, Domain-Specific Research Assistant with Self-Evaluation and Vector-Grounded Retrieval
Vivek Bhavsar
Joseph Ereifej
Aravanan Gurusami
RALM
104
0
0
25 Sep 2025
It's Not You, It's Clipping: A Soft Trust-Region via Probability Smoothing for LLM RL
Madeleine Dwyer
Adam Sobey
Adriane Chapman
73
0
0
25 Sep 2025
ToolBrain: A Flexible Reinforcement Learning Framework for Agentic Tools
Quy Minh Le
Minh Sao Khue Luu
Khanh-Tung Tran
Duc-Hai Nguyen
Hoang-Quoc-Viet Pham
Quan Le
Hoang Thanh Lam
Hoang D. Nguyen
82
1
0
24 Sep 2025
Reflect before Act: Proactive Error Correction in Language Models
Qiuhai Zeng
Sarvesh Rajkumar
Di Wang
Narendra Gyanchandani
Wenbo Yan
KELM, LLMAG
105
0
0
23 Sep 2025
Asking a Language Model for Diverse Responses
S. Troshin
Irina Saparina
Antske Fokkens
Vlad Niculae
LRM
101
0
0
22 Sep 2025
Towards General Computer Control with Hierarchical Agents and Multi-Level Action Spaces
Zihan Dong
Xinyu Fan
Zixiang Tang
Yunqing Li
LM&Ro
128
0
0
22 Sep 2025
UIPro: Unleashing Superior Interaction Capability For GUI Agents
Hongxin Li
Jingran Su
Jingfan Chen
Zheng Ju
Yuntao Chen
Qing Li
Zhaoxiang Zhang
LLMAG
236
0
0
22 Sep 2025
Governing Automated Strategic Intelligence
Nicholas Kruus
Madhavendra Thakur
Adam Khoja
Leonhard Nagel
Maximilian Nicholson
...
Janghee Lee
Nina Sefton
Raghavendra Thakur
Shiv Munagala
Yeeun Kim
104
0
0
21 Sep 2025
SignalLLM: A General-Purpose LLM Agent Framework for Automated Signal Processing
Junlong Ke
Qiying Hu
Shenghai Yuan
Yuecong Xu
Jianfei Yang
LLMAG
197
0
0
21 Sep 2025
RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation
Chao Yu
Y. Wang
Zhen Guo
Hao Lin
Si Xu
...
Z. Yang
Guohao Dai
Yu Wang
AI4CE
125
3
0
19 Sep 2025
A Framework for Generating Artificial Datasets to Validate Absolute and Relative Position Concepts
George Correa de Araujo
H. Maia
Hélio Pedrini
144
0
0
17 Sep 2025
SIRAG: Towards Stable and Interpretable RAG with A Process-Supervised Multi-Agent Framework
Junlin Wang
Zehao Wu
Shaowei Lu
Yanlan Li
Xinghao Huang
121
1
0
17 Sep 2025
Realistic Environmental Injection Attacks on GUI Agents
Yitong Zhang
Ximo Li
L. Cai
Jia Li
LLMAG, AAML
119
2
0
14 Sep 2025
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
Rui Lu
Zhenyu Hou
Zihan Wang
Hanchen Zhang
Xiao-Yang Liu
Yujiang Li
Shi Feng
Jie Tang
Yuxiao Dong
RALM, KELM
280
15
0
12 Sep 2025
K2-Think: A Parameter-Efficient Reasoning System
Zhoujun Cheng
Richard Fan
Shibo Hao
Taylor W. Killian
Haonan Li
...
Xuezhe Ma
Guowei He
Zhiting Hu
Zhengzhong Liu
Eric P. Xing
ReLM, OffRL, ALM, LRM
303
4
0
09 Sep 2025
VehicleWorld: A Highly Integrated Multi-Device Environment for Intelligent Vehicle Interaction
Jie Yang
Jiajun Chen
Zhangyue Yin
Shuo Chen
Y. Wang
Yiran Guo
Yuan Li
Y. Zheng
Xuanjing Huang
Xipeng Qiu
146
0
0
08 Sep 2025