ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.09332
  4. Cited By
WebGPT: Browser-assisted question-answering with human feedback
v1v2v3 (latest)

WebGPT: Browser-assisted question-answering with human feedback

17 December 2021
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Ouyang Long
Christina Kim
Christopher Hesse
Shantanu Jain
V. Kosaraju
William Saunders
Xu Jiang
K. Cobbe
Tyna Eloundou
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
    ALMRALM
ArXiv (abs)PDFHTMLHuggingFace (2 upvotes)

Papers citing "WebGPT: Browser-assisted question-answering with human feedback"

50 / 1,123 papers shown
OmniNova:A General Multimodal Agent Framework
OmniNova:A General Multimodal Agent Framework
Pengfei Du
LLMAG
213
0
0
25 Mar 2025
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training
Brian Bartoldson
S. Venkatraman
James Diffenderfer
Moksh Jain
Tal Ben-Nun
Seanie Lee
Minsu Kim
J. Obando-Ceron
Yoshua Bengio
B. Kailkhura
OffRL
334
12
0
24 Mar 2025
Video-T1: Test-Time Scaling for Video Generation
Video-T1: Test-Time Scaling for Video Generation
Fan Liu
Hanyang Wang
Yimo Cai
Kaiyan Zhang
Xiaohang Zhan
Yueqi Duan
DiffMVGen
450
15
0
24 Mar 2025
ExpertRAG: Efficient RAG with Mixture of Experts -- Optimizing Context Retrieval for Adaptive LLM Responses
ExpertRAG: Efficient RAG with Mixture of Experts -- Optimizing Context Retrieval for Adaptive LLM Responses
Esmail Gumaan
MoE
314
2
0
23 Mar 2025
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World ApplicationsAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Jian Guan
Jian Wu
Jia-Nan Li
Chuanqi Cheng
Wei Wu
LM&MA
769
14
0
21 Mar 2025
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Baolong Bi
Shenghua Liu
Longji Xu
Yilong Xu
Cunchun Li
Shansong Liu
Xueqi Cheng
KELM
296
25
0
20 Mar 2025
Are AI Agents interacting with Online Ads?
Are AI Agents interacting with Online Ads?
Andreas Stöckl
Joel Nitu
472
2
0
20 Mar 2025
Survey on Evaluation of LLM-based Agents
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAGELM
506
77
0
20 Mar 2025
MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration
MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent CollaborationNorth American Chapter of the Association for Computational Linguistics (NAACL), 2025
David Wan
Justin Chih-Yao Chen
Elias Stengel-Eskin
Joey Tianyi Zhou
LLMAGLRM
279
5
0
19 Mar 2025
A Review on Large Language Models for Visual Analytics
A Review on Large Language Models for Visual Analytics
Navya Sonal Agarwal
Sanjay Kumar Sonbhadra
367
7
0
19 Mar 2025
MP-GUI: Modality Perception with MLLMs for GUI Understanding
MP-GUI: Modality Perception with MLLMs for GUI UnderstandingComputer Vision and Pattern Recognition (CVPR), 2025
Ziwei Wang
Weizhi Chen
Leyang Yang
Sheng Zhou
Shengchu Zhao
Hanbei Zhan
Jiongchao Jin
Liangcheng Li
Zirui Shao
Jiajun Bu
338
9
0
18 Mar 2025
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker
Joost Huizinga
Leo Gao
Zehao Dou
M. Guan
Aleksander Mądry
Wojciech Zaremba
J. Pachocki
David Farhi
LRM
442
129
0
14 Mar 2025
Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions
Mourad Gridach
Jay Nanavati
Khaldoun Zine El Abidine
Lenon Mendes
Christina Mack
371
52
0
12 Mar 2025
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
Mingyue Cheng
Yucong Luo
Jie Ouyang
Qiang Liu
Huijie Liu
...
Bohou Zhang
Jiawei Cao
Jie Ma
Daoyu Wang
Tong Xu
3DV
374
37
0
11 Mar 2025
Robust Multi-Objective Controlled Decoding of Large Language Models
Seongho Son
William Bankes
Sangwoong Yoon
Shyam Sundhar Ramesh
Xiaohang Tang
Ilija Bogunovic
353
6
0
11 Mar 2025
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Adversarial Policy Optimization for Offline Preference-based Reinforcement LearningInternational Conference on Learning Representations (ICLR), 2025
Hyungkyu Kang
Min-hwan Oh
OffRL
337
2
0
07 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Qiufeng Wang
Tony Q.S. Quek
Soujanya Poria
Zuozhu Liu
539
3
0
06 Mar 2025
ValuePilot: A Two-Phase Framework for Value-Driven Decision-Making
Yitong Luo
Hou Hei Lam
Ziang Chen
Zhenliang Zhang
Xue Feng
313
0
0
06 Mar 2025
Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm
Haksub Kim
Kanghoon Lee
Minjun Kim
Jiachen Li
Jinkyoo Park
425
4
0
05 Mar 2025
AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification
Xuan Zhang
Yongliang Shen
Zhe Zheng
Linjuan Wu
Wenqi Zhang
Yuchen Yan
Qiuying Peng
Jun Wang
Weiming Lu
KELM
449
5
0
03 Mar 2025
Dynamic Search for Inference-Time Alignment in Diffusion Models
Dynamic Search for Inference-Time Alignment in Diffusion Models
Xiner Li
Masatoshi Uehara
Xingyu Su
Gabriele Scalia
Tommaso Biancalani
Aviv Regev
Sergey Levine
Shuiwang Ji
420
22
0
03 Mar 2025
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Yun Wang
Pei Zhang
Siyuan Huang
Baosong Yang
Zizhuo Zhang
Fei Huang
Rui Wang
BDLLRM
500
38
0
03 Mar 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
407
27
0
27 Feb 2025
Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework
Learning to Align Multi-Faceted Evaluation: A Unified and Robust FrameworkAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Kaishuai Xu
Tiezheng YU
Wenjun Hou
Yi Cheng
Liangyou Li
Xin Jiang
Lifeng Shang
Qiang Liu
Wenjie Li
ELM
528
1
0
26 Feb 2025
Conversational Planning for Personal Plans
Konstantina Christakopoulou
Iris Qu
John Canny
Andrew Goodridge
Cj Adams
Minmin Chen
Maja Matarić
LLMAGLM&Ro
240
0
0
26 Feb 2025
Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization
Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization
Siyuan Zhang
Yuanhang Zhang
Yinpeng Dong
Hang Su
HILMKELM
987
2
0
26 Feb 2025
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha
Shashwat Goel
Ponnurangam Kumaraguru
Jonas Geiping
Matthias Bethge
Christian Schroeder de Witt
ReLMELMLRM
484
1
0
26 Feb 2025
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Kexin Huang
Junkang Wu
Ziqian Chen
Qingsong Wen
Jinyang Gao
Bolin Ding
Jiancan Wu
Xiangnan He
Xiang Wang
213
3
0
25 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
PiCO: Peer Review in LLMs based on the Consistency Optimization
Hai-Jian Ke
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
505
14
0
24 Feb 2025
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control AgentsInternational Conference on Learning Representations (ICLR), 2024
Taiyi Wang
Zhihao Wu
Jianheng Liu
Jianye Hao
Jun Wang
Youssef Attia El Hili
OffRL
502
52
0
24 Feb 2025
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo
Giandomenico Cornacchia
Kieran Fraser
Muhammad Zaid Hameed
Ambrish Rawat
Beat Buesser
Mark Purcell
Pin-Yu Chen
P. Sattigeri
Kush R. Varshney
AAML
364
10
0
24 Feb 2025
On Synthesizing Data for Context Attribution in Question Answering
On Synthesizing Data for Context Attribution in Question AnsweringAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Gorjan Radevski
Kiril Gashteovski
Shahbaz Syed
Christopher Malon
Sebastien Nicolas
...
Masafumi Enomoto
Kunihiro Takeoka
Masafumi Oyamada
Goran Glavaš
Carolin (Haas) Lawrence
246
0
0
21 Feb 2025
A Survey of Model Architectures in Information Retrieval
A Survey of Model Architectures in Information Retrieval
Zhichao Xu
Fengran Mo
Zhiqi Huang
Crystina Zhang
Puxuan Yu
Bei Wang
Jimmy J. Lin
Vivek Srikumar
3DVKELM
585
18
0
20 Feb 2025
STaR-SQL: Self-Taught Reasoner for Text-to-SQL
STaR-SQL: Self-Taught Reasoner for Text-to-SQLAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Mingqian He
Yongliang Shen
Weinan Zhang
Qiuying Peng
Jun Wang
Weiming Lu
ReLMLRM
200
10
0
20 Feb 2025
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
Faster WIND: Accelerating Iterative Best-of-NNN Distillation for LLM AlignmentInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Tong Yang
Jincheng Mei
H. Dai
Zixin Wen
Shicong Cen
Dale Schuurmans
Yuejie Chi
Bo Dai
371
6
0
20 Feb 2025
Rethinking Diverse Human Preference Learning through Principal Component Analysis
Rethinking Diverse Human Preference Learning through Principal Component AnalysisAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Feng Luo
Rui Yang
Hao Sun
Chunyuan Deng
Jiarui Yao
Jingyan Shen
Huan Zhang
Hanjie Chen
422
5
0
18 Feb 2025
Solving the Cold Start Problem on One's Own as an End User via Preference Transfer
Solving the Cold Start Problem on One's Own as an End User via Preference Transfer
Ryoma Sato
259
0
0
18 Feb 2025
Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Yingqing Guo
Yukang Yang
Hui Yuan
Mengdi Wang
428
10
0
17 Feb 2025
AgentStudio: A Toolkit for Building General Virtual Agents
AgentStudio: A Toolkit for Building General Virtual AgentsInternational Conference on Learning Representations (ICLR), 2024
Longtao Zheng
Zhiyuan Huang
Zhenghai Xue
Xinrun Wang
Bo An
Shuicheng Yan
448
34
0
17 Feb 2025
CMCTS: A Constrained Monte Carlo Tree Search Framework for Mathematical Reasoning in Large Language Model
CMCTS: A Constrained Monte Carlo Tree Search Framework for Mathematical Reasoning in Large Language Model
Qingwen Lin
Boyan Xu
Zijian Li
Zijian Li
Keli Zhang
Ruichu Cai
Ruichu Cai
LRM
345
4
0
16 Feb 2025
CiteCheck: Towards Accurate Citation Faithfulness Detection
CiteCheck: Towards Accurate Citation Faithfulness Detection
Ziyao Xu
Shaohang Wei
Zhuoheng Han
Jing Jin
Zhiyong Yang
Xiaoguang Li
Haochen Tan
Zhijiang Guo
Houfeng Wang
181
1
0
15 Feb 2025
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
CoT-Valve: Length-Compressible Chain-of-Thought TuningAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Xinyin Ma
Guangnian Wan
Runpeng Yu
Gongfan Fang
Xinchao Wang
LRM
501
120
0
13 Feb 2025
C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
Guoxin Chen
Minpeng Liao
Peiying Yu
Dingmin Wang
Zile Qiao
Chao Yang
Xin Zhao
Kai Fan
452
5
0
10 Feb 2025
Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection
Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection
Yan Weng
Fengbin Zhu
Tong Ye
Haoyan Liu
Fuli Feng
Tat-Seng Chua
RALM
397
4
0
10 Feb 2025
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs
Bryan Guan
Tanya Roosta
Peyman Passban
Mehdi Rezagholizadeh
425
0
0
06 Feb 2025
Context-Aware Hierarchical Merging for Long Document Summarization
Context-Aware Hierarchical Merging for Long Document SummarizationAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Litu Ou
Mirella Lapata
MoMe
1.1K
3
0
03 Feb 2025
SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task
SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task
Ziije Zhong
Linqing Zhong
Zhaoze Sun
Qingyun Jin
Zengchang Qin
Xiaofan Zhang
325
22
0
28 Jan 2025
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language ModelsKnowledge Discovery and Data Mining (KDD), 2023
Jingwei Yi
Yueqi Xie
Bin Zhu
Emre Kiciman
Guangzhong Sun
Xing Xie
Fangzhao Wu
AAML
452
154
0
28 Jan 2025
Chain-of-Retrieval Augmented Generation
Chain-of-Retrieval Augmented Generation
Liang Wang
Haonan Chen
Nan Yang
Xiaolong Huang
Zhicheng Dou
Furu Wei
RALMLRM3DVReLM
435
27
0
24 Jan 2025
Episodic memory in AI agents poses risks that should be studied and mitigated
Episodic memory in AI agents poses risks that should be studied and mitigated
Chad DeChant
462
5
0
20 Jan 2025
Previous
123...567...212223
Next