arXiv: 2112.09332

WebGPT: Browser-assisted question-answering with human feedback

17 December 2021

Reiichiro Nakano

Christopher Hesse

William Saunders

Tyna Eloundou

Gretchen Krueger

Papers citing "WebGPT: Browser-assisted question-answering with human feedback"

50 / 1,123 papers shown

Prepared mind, fast response: A temporal decoupling framework for adaptive knowledge orchestration in open-domain dialogue

85

0

0

09 Oct 2025

FlowSearch: Advancing deep research with dynamic structured knowledge flow

...

Wenlong Zhang

Lei Bai

Bo Zhang

158

1

0

09 Oct 2025

CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization

Luca Beurer-Kellner

Maximilian Baader

154

0

0

09 Oct 2025

CREST-Search: Comprehensive Red-teaming for Evaluating Safety Threats in Large Language Models Powered by Web Search

100

1

0

09 Oct 2025

MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning

Abdelrahman M. Shaker

Rao Muhammad Anwer

Fahad Shahbaz Khan

230

0

0

09 Oct 2025

Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning

60

1

0

08 Oct 2025

Exposing Citation Vulnerabilities in Generative Engines

Shusuke Komatsu

156

0

0

08 Oct 2025

MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning

...

154

0

0

06 Oct 2025

Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts

Arun Kumar Chithanar

Anit Kumar Sahu

Souradip Chakraborty

Amrit Singh Bedi

208

0

0

06 Oct 2025

AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning

...

Masashi Sugiyama

117

0

0

05 Oct 2025

Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation

Patrice Béchard

Orlando Marquez Ayala

Mathieu Reymond

Alexandre Drouin

Alexandre Lacoste

139

2

0

05 Oct 2025

Best of mini-N in-loop Sampling: A Contextual Quality Reward Model for Reliable and Efficient Best-of-N Sampling

155

0

0

05 Oct 2025

AgenticRAG: Tool-Augmented Foundation Models for Zero-Shot Explainable Recommender Systems

129

0

0

03 Oct 2025

Truth-Aware Decoding: A Program-Logic Approach to Factual Language Generation

68

0

0

03 Oct 2025

Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling

113

1

0

03 Oct 2025

InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents

...

173

0

0

02 Oct 2025

MIRA: Towards Mitigating Reward Hacking in Inference-Time Alignment of T2I Diffusion Models

Anirudh Thatipelli

Souradip Chakraborty

Anit Kumar Sahu

Amrit Singh Bedi

180

2

0

02 Oct 2025

FlashResearch: Real-time Agent Orchestration for Efficient Deep Research

125

1

0

02 Oct 2025

How Well Can Preference Optimization Generalize Under Noisy Feedback?

230

1

0

01 Oct 2025

Rationale-Augmented Retrieval with Constrained LLM Re-Ranking for Task Discovery

151

2

0

01 Oct 2025

PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents

166

2

0

01 Oct 2025

Optimal Stopping vs Best-of-$N$ for Inference Time Optimization

127

0

0

01 Oct 2025

Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs

Alborz Geramifard

151

0

0

30 Sep 2025

A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments

105

1

0

30 Sep 2025

Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis

215

1

0

30 Sep 2025

Humanline: Online Alignment as Perceptual Loss

Niklas Muennighoff

Kawin Ethayarajh

92

0

0

29 Sep 2025

Structural Reward Model: Enhancing Interpretability, Efficiency, and Scalability in Reward Modeling

...

174

3

0

29 Sep 2025

Not Wrong, But Untrue: LLM Overconfidence in Document-Based Queries

Wilma Agustianto

Nicholas Diakopoulos

96

0

0

29 Sep 2025

Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules

141

1

0

28 Sep 2025

Large-Scale Constraint Generation - Can LLMs Parse Hundreds of Constraints?

182

0

0

28 Sep 2025

Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment

200

1

0

28 Sep 2025

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents

337

1

0

28 Sep 2025

PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness

...

OffRL AIFin LRM

279

0

0

27 Sep 2025

Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs

Yehonatan Peisakhovsky

162

1

0

26 Sep 2025

Hallucination-Resistant, Domain-Specific Research Assistant with Self-Evaluation and Vector-Grounded Retrieval

Aravanan Gurusami

110

0

0

25 Sep 2025

It's Not You, It's Clipping: A Soft Trust-Region via Probability Smoothing for LLM RL

Madeleine Dwyer

Adriane Chapman

88

0

0

25 Sep 2025

ToolBrain: A Flexible Reinforcement Learning Framework for Agentic Tools

Minh Sao Khue Luu

Khanh-Tung Tran

Hoang-Quoc-Viet Pham

Hoang Thanh Lam

Hoang D. Nguyen

97

1

0

24 Sep 2025

Reflect before Act: Proactive Error Correction in Language Models

Sarvesh Rajkumar

Narendra Gyanchandani

105

0

0

23 Sep 2025

Asking a Language Model for Diverse Responses

116

1

0

22 Sep 2025

Towards General Computer Control with Hierarchical Agents and Multi-Level Action Spaces

131

0

0

22 Sep 2025

UIPro: Unleashing Superior Interaction Capability For GUI Agents

Zhaoxiang Zhang

236

0

0

22 Sep 2025

Governing Automated Strategic Intelligence

Madhavendra Thakur

Maximilian Nicholson

...

Raghavendra Thakur

113

0

0

21 Sep 2025

SignalLLM: A General-Purpose LLM Agent Framework for Automated Signal Processing

206

1

0

21 Sep 2025

RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation

...

Guohao Dai

Yu Wang

127

3

0

19 Sep 2025

A Framework for Generating Artificial Datasets to Validate Absolute and Relative Position Concepts

George Correa de Araujo

144

0

0

17 Sep 2025

SIRAG: Towards Stable and Interpretable RAG with A Process-Supervised Multi-Agent Framework

122

1

0

17 Sep 2025

Realistic Environmental Injection Attacks on GUI Agents

122

2

0

14 Sep 2025

DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

294

18

0

12 Sep 2025

K2-Think: A Parameter-Efficient Reasoning System

Taylor W. Killian

...

ReLM OffRL ALM LRM

307

5

0

09 Sep 2025

VehicleWorld: A Highly Integrated Multi-Device Environment for Intelligent Vehicle Interaction

156

0

0

08 Sep 2025
