arXiv: 2112.09332
WebGPT: Browser-assisted question-answering with human feedback

17 December 2021
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Long Ouyang
Christina Kim
Christopher Hesse
Shantanu Jain
V. Kosaraju
William Saunders
Xu Jiang
K. Cobbe
Tyna Eloundou
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
    ALM
    RALM

Papers citing "WebGPT: Browser-assisted question-answering with human feedback"

50 / 905 papers shown
Are AI Agents interacting with Online Ads?
Andreas Stöckl
Joel Nitu
35
0
0
20 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 07 May 2025
93
7
0
20 Mar 2025
A Review on Large Language Models for Visual Analytics
Navya Sonal Agarwal
Sanjay Kumar Sonbhadra
41
0
0
19 Mar 2025
MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration
David Wan
Justin Chih-Yao Chen
Elias Stengel-Eskin
Mohit Bansal
LLMAG
LRM
60
1
0
19 Mar 2025
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Ziwei Wang
Weizhi Chen
Leyang Yang
Sheng Zhou
Shengchu Zhao
Hanbei Zhan
Jiongchao Jin
Liangcheng Li
Zirui Shao
Jiajun Bu
60
1
0
18 Mar 2025
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker
Joost Huizinga
Leo Gao
Zehao Dou
M. Guan
Aleksander Mądry
Wojciech Zaremba
J. Pachocki
David Farhi
LRM
67
11
0
14 Mar 2025
Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions
Mourad Gridach
Jay Nanavati
Khaldoun Zine El Abidine
Lenon Mendes
Christina Mack
50
6
0
12 Mar 2025
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
Mingyue Cheng
Yucong Luo
Jie Ouyang
Q. Liu
Huijie Liu
...
Bohou Zhang
Jiawei Cao
Jie Ma
Daoyu Wang
Enhong Chen
3DV
68
3
0
11 Mar 2025
Robust Multi-Objective Controlled Decoding of Large Language Models
Seongho Son
William Bankes
Sangwoong Yoon
Shyam Sundhar Ramesh
Xiaohang Tang
Ilija Bogunovic
39
0
0
11 Mar 2025
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Hyungkyu Kang
Min-hwan Oh
OffRL
45
0
0
07 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Joey Tianyi Zhou
Tony Q. S. Quek
Soujanya Poria
Zuozhu Liu
48
0
0
06 Mar 2025
ValuePilot: A Two-Phase Framework for Value-Driven Decision-Making
Yitong Luo
Hou Hei Lam
Ziang Chen
Zhenliang Zhang
Xue Feng
67
0
0
06 Mar 2025
Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm
H. Kim
Kanghoon Lee
J. Park
Jiachen Li
Jinkyoo Park
60
1
0
05 Mar 2025
AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification
Xuan Zhang
Yongliang Shen
Zhe Zheng
Linjuan Wu
Wenqi Zhang
Yuchen Yan
Qiuying Peng
J. Wang
Weiming Lu
KELM
77
1
0
03 Mar 2025
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Y. Wang
Pei Zhang
Siyuan Huang
Baosong Yang
Z. Zhang
Fei Huang
Rui Wang
BDL
LRM
62
6
0
03 Mar 2025
Dynamic Search for Inference-Time Alignment in Diffusion Models
Xiner Li
Masatoshi Uehara
Xingyu Su
Gabriele Scalia
Tommaso Biancalani
Aviv Regev
Sergey Levine
Shuiwang Ji
42
0
0
03 Mar 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
44
5
0
27 Feb 2025
Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework
Kaishuai Xu
Tiezheng YU
Wenjun Hou
Yi Cheng
Liangyou Li
Xin Jiang
Lifeng Shang
Q. Liu
Wenjie Li
ELM
66
0
0
26 Feb 2025
Conversational Planning for Personal Plans
Konstantina Christakopoulou
Iris Qu
John Canny
Andrew Goodridge
Cj Adams
Minmin Chen
Maja Matarić
LLMAG
LM&Ro
55
0
0
26 Feb 2025
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang
Y. Zhang
Yinpeng Dong
Hang Su
HILM
KELM
163
0
0
26 Feb 2025
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha
Shashwat Goel
Ponnurangam Kumaraguru
Jonas Geiping
Matthias Bethge
Ameya Prabhu
ReLM
ELM
LRM
129
0
0
26 Feb 2025
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Kexin Huang
Junkang Wu
Ziqian Chen
Xue Wang
Jinyang Gao
Bolin Ding
Jiancan Wu
Xiangnan He
X. Wang
42
0
0
25 Feb 2025
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang
Zhihao Wu
Jianheng Liu
Jianye Hao
J. Wang
Kun Shao
OffRL
34
12
0
24 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
69
8
0
24 Feb 2025
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo
Giandomenico Cornacchia
Kieran Fraser
Muhammad Zaid Hameed
Ambrish Rawat
Beat Buesser
Mark Purcell
Pin-Yu Chen
P. Sattigeri
Kush R. Varshney
AAML
43
1
0
24 Feb 2025
A Survey of Model Architectures in Information Retrieval
Zhichao Xu
Fengran Mo
Zhiqi Huang
Crystina Zhang
Puxuan Yu
Bei Wang
Jimmy J. Lin
Vivek Srikumar
KELM
3DV
48
2
0
21 Feb 2025
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
Tong Yang
Jincheng Mei
H. Dai
Zixin Wen
Shicong Cen
Dale Schuurmans
Yuejie Chi
Bo Dai
43
4
0
20 Feb 2025
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models
Aliyah R. Hsu
James Zhu
Zhichao Wang
Bin Bi
Shubham Mehrotra
...
Sougata Chaudhuri
Regunathan Radhakrishnan
S. Asur
Claire Na Cheng
Bin Yu
ALM
LRM
67
0
0
20 Feb 2025
Solving the Cold Start Problem on One's Own as an End User via Preference Transfer
Ryoma Sato
68
0
0
18 Feb 2025
RareAgents: Advancing Rare Disease Care through LLM-Empowered Multi-disciplinary Team
Xuanzhong Chen
Ye Jin
Xiaohao Mao
Lun Wang
Shuyang Zhang
Ting Chen
77
0
0
17 Feb 2025
AgentStudio: A Toolkit for Building General Virtual Agents
Longtao Zheng
Zhiyuan Huang
Zhenghai Xue
Xinrun Wang
Bo An
Shuicheng Yan
80
14
0
17 Feb 2025
A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid
Ruotian Wu
Julia Grosse
Agustinus Kristiadi
Pascal Poupart
OffRL
68
0
0
17 Feb 2025
Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning
Qingwen Lin
Boyan Xu
Zijian Li
Z. Hao
Keli Zhang
Ruichu Cai
LRM
41
2
0
16 Feb 2025
CiteCheck: Towards Accurate Citation Faithfulness Detection
Ziyao Xu
Shaohang Wei
Zhuoheng Han
Jing Jin
Z. Yang
Xiaoguang Li
Haochen Tan
Zhijiang Guo
Houfeng Wang
29
0
0
15 Feb 2025
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
Xinyin Ma
Guangnian Wan
Runpeng Yu
Gongfan Fang
Xinchao Wang
LRM
76
19
0
13 Feb 2025
C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
Guoxin Chen
Minpeng Liao
Peiying Yu
Dingmin Wang
Zile Qiao
Chao Yang
Xin Zhao
Kai Fan
58
1
0
10 Feb 2025
Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection
Yan Weng
Fengbin Zhu
Tong Ye
Haoyan Liu
Fuli Feng
Tat-Seng Chua
RALM
98
1
0
10 Feb 2025
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs
Bryan Guan
Tanya Roosta
Peyman Passban
Mehdi Rezagholizadeh
97
0
0
06 Feb 2025
Context-Aware Hierarchical Merging for Long Document Summarization
Litu Ou
Mirella Lapata
MoMe
169
1
0
03 Feb 2025
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Jingwei Yi
Yueqi Xie
Bin Zhu
Emre Kiciman
Guangzhong Sun
Xing Xie
Fangzhao Wu
AAML
51
64
0
28 Jan 2025
SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task
Ziije Zhong
Linqing Zhong
Zhaoze Sun
Qingyun Jin
Zengchang Qin
Xiaofan Zhang
52
7
0
28 Jan 2025
Chain-of-Retrieval Augmented Generation
Liang Wang
Haonan Chen
Nan Yang
Xiaolong Huang
Zhicheng Dou
Furu Wei
RALM
LRM
ReLM
3DV
84
6
0
24 Jan 2025
Episodic memory in AI agents poses risks that should be studied and mitigated
Chad DeChant
57
2
0
20 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
102
18
0
17 Jan 2025
Authenticated Delegation and Authorized AI Agents
Tobin South
Samuele Marro
Thomas Hardjono
Robert Mahari
Cedric Deslandes Whitney
Dazza Greenwood
Alan Chan
Alex Pentland
44
3
0
17 Jan 2025
WebWalker: Benchmarking LLMs in Web Traversal
Jialong Wu
Wenbiao Yin
Yong-feng Jiang
Zhenglin Wang
Zekun Xi
...
Linhai Zhang
Yulan He
Deyu Zhou
Pengjun Xie
Fei Huang
43
5
0
13 Jan 2025
Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
Hiroki Furuta
Yutaka Matsuo
Aleksandra Faust
Izzeddin Gur
CLL
85
13
0
03 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
52
96
0
03 Jan 2025
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo
Qingfeng Sun
Can Xu
Pu Zhao
Jian-Guang Lou
...
Xiubo Geng
Qingwei Lin
Shifeng Chen
Yansong Tang
Dongmei Zhang
OSLM
LRM
108
408
0
03 Jan 2025
Enhancing Preference-based Linear Bandits via Human Response Time
Shen Li
Yuyang Zhang
Zhaolin Ren
Claire Liang
Na Li
J. Shah
34
0
0
03 Jan 2025