
WebGPT: Browser-assisted question-answering with human feedback

17 December 2021

Tyna Eloundou


Papers citing "WebGPT: Browser-assisted question-answering with human feedback"

50 / 1,123 papers shown

Fast LLM Post-training via Decoupled and Fastest-of-N Speculation

...

438

24 Dec 2025

Learning to Orchestrate Agents in Natural Language with the Conductor

107

04 Dec 2025

On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference

183

04 Dec 2025

Process-Centric Analysis of Agentic Software Systems

02 Dec 2025

Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking

Lingling Fu

MoMe

128

30 Nov 2025

Evolving Paradigms in Task-Based Search and Learning: A Comparative Analysis of Traditional Search Engine with LLM-Enhanced Conversational Search System

Zhitong Guan

Yi Wang

29 Nov 2025

An Empirical Study on the Security Vulnerabilities of GPTs

153

28 Nov 2025

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

...

262

26 Nov 2025

VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering

145

25 Nov 2025

ST-PPO: Stabilized Off-Policy Proximal Policy Optimization for Multi-Turn Agents Training

184

25 Nov 2025

CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization

140

24 Nov 2025

SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization

106

22 Nov 2025

Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks

20 Nov 2025

Finetuning LLMs for Automatic Form Interaction on Web-Browser in Selenium Testing Framework

201

19 Nov 2025

It's LIT! Reliability-Optimized LLMs with Inspectable Tools

105

18 Nov 2025

WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance

621

17 Nov 2025

From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

477

11 Nov 2025

IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

...

337

10 Nov 2025

Inference-Time Personalized Alignment with a Few User Preference Queries

Victor-Alexandru Pădurean

Parameswaran Kamalaruban

Nachiket Kotalwar

Alkis Gotovos

Adish Singla

171

04 Nov 2025

An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks

116

04 Nov 2025

Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

...

294

02 Nov 2025

A CPU-Centric Perspective on Agentic AI

Ritik Raj

Hong Wang

Tushar Krishna

296

01 Nov 2025

DRIP: Defending Prompt Injection via Token-wise Representation Editing and Residual Instruction Fusion

377

01 Nov 2025

ToolRM: Towards Agentic Tool-Use Reward Modeling

Junyang Lin

Min Yang

LRM

161

30 Oct 2025

Decomposition-Enhanced Training for Post-Hoc Attributions In Language Models

Sriram Balasubramaniam

366

29 Oct 2025

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

...

173

28 Oct 2025

A Survey of Data Agents: Emerging Paradigm or Overstated Hype?

...

188

27 Oct 2025

The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation

115

27 Oct 2025

Adaptive Blockwise Search: Inference-Time Alignment for Large Language Models

27 Oct 2025

Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference

24 Oct 2025

PanicToCalm: A Proactive Counseling Agent for Panic Attacks

167

24 Oct 2025

Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models

OffRL CLL KELM VLM LRM

135

24 Oct 2025

Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

...

155

22 Oct 2025

Crucible: Quantifying the Potential of Control Algorithms through LLM Agents

112

21 Oct 2025

CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent

...

168

21 Oct 2025

Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models

127

20 Oct 2025

Empowering Real-World: A Survey on the Technology, Practice, and Evaluation of LLM-driven Industry Agents

...

158

20 Oct 2025

WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale

17 Oct 2025

Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing

...

208

17 Oct 2025

Natural Language Tools: A Natural Language Approach to Tool Calling In Large Language Agents

207

16 Oct 2025

Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking

187

15 Oct 2025

Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism

226

15 Oct 2025

Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation

196

15 Oct 2025

On the Role of Preference Variance in Preference Optimization

159

14 Oct 2025

A Survey on Agentic Multimodal Large Language Models

...

LM&Ro AIFin AI4TS LRM AI4CE

250

13 Oct 2025

Attacks by Content: Automated Fact-checking is an AI Security Issue

Michael Schlichtkrull

AAML

116

13 Oct 2025

Safety Game: Balancing Safe and Informative Conversations with Blackbox Agentic AI using LP Solvers

Tuan Nguyen

Long Tran-Thanh

LLMAG

123

10 Oct 2025

Fundamentals of Building Autonomous LLM Agents

Victor de Lamo Castrillo

207

10 Oct 2025

MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning

Tajamul Ashraf

Umair Nawaz

Abdelrahman M. Shaker

227

09 Oct 2025

Memory Retrieval and Consolidation in Large Language Models through Function Tokens

09 Oct 2025