2112.09332
Cited By
v1
v2
v3 (latest)
WebGPT: Browser-assisted question-answering with human feedback
17 December 2021
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Long Ouyang
Christina Kim
Christopher Hesse
Shantanu Jain
V. Kosaraju
William Saunders
Xu Jiang
K. Cobbe
Tyna Eloundou
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
ALM
RALM
ArXiv (abs)
PDF
HTML
HuggingFace (2 upvotes)
Papers citing
"WebGPT: Browser-assisted question-answering with human feedback"
50 / 1,123 papers shown
OmniNova:A General Multimodal Agent Framework
Pengfei Du
LLMAG
213
0
0
25 Mar 2025
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training
Brian Bartoldson
S. Venkatraman
James Diffenderfer
Moksh Jain
Tal Ben-Nun
Seanie Lee
Minsu Kim
J. Obando-Ceron
Yoshua Bengio
B. Kailkhura
OffRL
334
12
0
24 Mar 2025
Video-T1: Test-Time Scaling for Video Generation
Fan Liu
Hanyang Wang
Yimo Cai
Kaiyan Zhang
Xiaohang Zhan
Yueqi Duan
DiffM
VGen
450
15
0
24 Mar 2025
ExpertRAG: Efficient RAG with Mixture of Experts -- Optimizing Context Retrieval for Adaptive LLM Responses
Esmail Gumaan
MoE
314
2
0
23 Mar 2025
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jian Guan
Jian Wu
Jia-Nan Li
Chuanqi Cheng
Wei Wu
LM&MA
769
14
0
21 Mar 2025
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Baolong Bi
Shenghua Liu
Longji Xu
Yilong Xu
Cunchun Li
Shansong Liu
Xueqi Cheng
KELM
296
25
0
20 Mar 2025
Are AI Agents interacting with Online Ads?
Andreas Stöckl
Joel Nitu
472
2
0
20 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
506
77
0
20 Mar 2025
MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
David Wan
Justin Chih-Yao Chen
Elias Stengel-Eskin
Joey Tianyi Zhou
LLMAG
LRM
279
5
0
19 Mar 2025
A Review on Large Language Models for Visual Analytics
Navya Sonal Agarwal
Sanjay Kumar Sonbhadra
367
7
0
19 Mar 2025
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Ziwei Wang
Weizhi Chen
Leyang Yang
Sheng Zhou
Shengchu Zhao
Hanbei Zhan
Jiongchao Jin
Liangcheng Li
Zirui Shao
Jiajun Bu
338
9
0
18 Mar 2025
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker
Joost Huizinga
Leo Gao
Zehao Dou
M. Guan
Aleksander Mądry
Wojciech Zaremba
J. Pachocki
David Farhi
LRM
442
129
0
14 Mar 2025
Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions
Mourad Gridach
Jay Nanavati
Khaldoun Zine El Abidine
Lenon Mendes
Christina Mack
371
52
0
12 Mar 2025
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
Mingyue Cheng
Yucong Luo
Jie Ouyang
Qiang Liu
Huijie Liu
...
Bohou Zhang
Jiawei Cao
Jie Ma
Daoyu Wang
Tong Xu
3DV
374
37
0
11 Mar 2025
Robust Multi-Objective Controlled Decoding of Large Language Models
Seongho Son
William Bankes
Sangwoong Yoon
Shyam Sundhar Ramesh
Xiaohang Tang
Ilija Bogunovic
353
6
0
11 Mar 2025
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
International Conference on Learning Representations (ICLR), 2025
Hyungkyu Kang
Min-hwan Oh
OffRL
337
2
0
07 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Qiufeng Wang
Tony Q.S. Quek
Soujanya Poria
Zuozhu Liu
539
3
0
06 Mar 2025
ValuePilot: A Two-Phase Framework for Value-Driven Decision-Making
Yitong Luo
Hou Hei Lam
Ziang Chen
Zhenliang Zhang
Xue Feng
313
0
0
06 Mar 2025
Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm
Haksub Kim
Kanghoon Lee
Minjun Kim
Jiachen Li
Jinkyoo Park
425
4
0
05 Mar 2025
AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification
Xuan Zhang
Yongliang Shen
Zhe Zheng
Linjuan Wu
Wenqi Zhang
Yuchen Yan
Qiuying Peng
Jun Wang
Weiming Lu
KELM
449
5
0
03 Mar 2025
Dynamic Search for Inference-Time Alignment in Diffusion Models
Xiner Li
Masatoshi Uehara
Xingyu Su
Gabriele Scalia
Tommaso Biancalani
Aviv Regev
Sergey Levine
Shuiwang Ji
420
22
0
03 Mar 2025
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Yun Wang
Pei Zhang
Siyuan Huang
Baosong Yang
Zizhuo Zhang
Fei Huang
Rui Wang
BDL
LRM
500
38
0
03 Mar 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
407
27
0
27 Feb 2025
Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Kaishuai Xu
Tiezheng YU
Wenjun Hou
Yi Cheng
Liangyou Li
Xin Jiang
Lifeng Shang
Qiang Liu
Wenjie Li
ELM
528
1
0
26 Feb 2025
Conversational Planning for Personal Plans
Konstantina Christakopoulou
Iris Qu
John Canny
Andrew Goodridge
Cj Adams
Minmin Chen
Maja Matarić
LLMAG
LM&Ro
240
0
0
26 Feb 2025
Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization
Siyuan Zhang
Yuanhang Zhang
Yinpeng Dong
Hang Su
HILM
KELM
987
2
0
26 Feb 2025
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Shiven Sinha
Shashwat Goel
Ponnurangam Kumaraguru
Jonas Geiping
Matthias Bethge
Christian Schroeder de Witt
ReLM
ELM
LRM
484
1
0
26 Feb 2025
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Kexin Huang
Junkang Wu
Ziqian Chen
Qingsong Wen
Jinyang Gao
Bolin Ding
Jiancan Wu
Xiangnan He
Xiang Wang
213
3
0
25 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Hai-Jian Ke
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
505
14
0
24 Feb 2025
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
International Conference on Learning Representations (ICLR), 2024
Taiyi Wang
Zhihao Wu
Jianheng Liu
Jianye Hao
Jun Wang
Youssef Attia El Hili
OffRL
502
52
0
24 Feb 2025
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo
Giandomenico Cornacchia
Kieran Fraser
Muhammad Zaid Hameed
Ambrish Rawat
Beat Buesser
Mark Purcell
Pin-Yu Chen
P. Sattigeri
Kush R. Varshney
AAML
364
10
0
24 Feb 2025
On Synthesizing Data for Context Attribution in Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Gorjan Radevski
Kiril Gashteovski
Shahbaz Syed
Christopher Malon
Sebastien Nicolas
...
Masafumi Enomoto
Kunihiro Takeoka
Masafumi Oyamada
Goran Glavaš
Carolin (Haas) Lawrence
246
0
0
21 Feb 2025
A Survey of Model Architectures in Information Retrieval
Zhichao Xu
Fengran Mo
Zhiqi Huang
Crystina Zhang
Puxuan Yu
Bei Wang
Jimmy J. Lin
Vivek Srikumar
3DV
KELM
585
18
0
20 Feb 2025
STaR-SQL: Self-Taught Reasoner for Text-to-SQL
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mingqian He
Yongliang Shen
Weinan Zhang
Qiuying Peng
Jun Wang
Weiming Lu
ReLM
LRM
200
10
0
20 Feb 2025
Faster WIND: Accelerating Iterative Best-of-N Distillation for LLM Alignment
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Tong Yang
Jincheng Mei
H. Dai
Zixin Wen
Shicong Cen
Dale Schuurmans
Yuejie Chi
Bo Dai
371
6
0
20 Feb 2025
Rethinking Diverse Human Preference Learning through Principal Component Analysis
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Feng Luo
Rui Yang
Hao Sun
Chunyuan Deng
Jiarui Yao
Jingyan Shen
Huan Zhang
Hanjie Chen
422
5
0
18 Feb 2025
Solving the Cold Start Problem on One's Own as an End User via Preference Transfer
Ryoma Sato
259
0
0
18 Feb 2025
Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Yingqing Guo
Yukang Yang
Hui Yuan
Mengdi Wang
428
10
0
17 Feb 2025
AgentStudio: A Toolkit for Building General Virtual Agents
International Conference on Learning Representations (ICLR), 2024
Longtao Zheng
Zhiyuan Huang
Zhenghai Xue
Xinrun Wang
Bo An
Shuicheng Yan
448
34
0
17 Feb 2025
CMCTS: A Constrained Monte Carlo Tree Search Framework for Mathematical Reasoning in Large Language Model
Qingwen Lin
Boyan Xu
Zijian Li
Keli Zhang
Ruichu Cai
LRM
345
4
0
16 Feb 2025
CiteCheck: Towards Accurate Citation Faithfulness Detection
Ziyao Xu
Shaohang Wei
Zhuoheng Han
Jing Jin
Zhiyong Yang
Xiaoguang Li
Haochen Tan
Zhijiang Guo
Houfeng Wang
181
1
0
15 Feb 2025
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xinyin Ma
Guangnian Wan
Runpeng Yu
Gongfan Fang
Xinchao Wang
LRM
501
120
0
13 Feb 2025
C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
Guoxin Chen
Minpeng Liao
Peiying Yu
Dingmin Wang
Zile Qiao
Chao Yang
Xin Zhao
Kai Fan
452
5
0
10 Feb 2025
Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection
Yan Weng
Fengbin Zhu
Tong Ye
Haoyan Liu
Fuli Feng
Tat-Seng Chua
RALM
397
4
0
10 Feb 2025
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs
Bryan Guan
Tanya Roosta
Peyman Passban
Mehdi Rezagholizadeh
425
0
0
06 Feb 2025
Context-Aware Hierarchical Merging for Long Document Summarization
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Litu Ou
Mirella Lapata
MoMe
1.1K
3
0
03 Feb 2025
SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task
Ziije Zhong
Linqing Zhong
Zhaoze Sun
Qingyun Jin
Zengchang Qin
Xiaofan Zhang
325
22
0
28 Jan 2025
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Knowledge Discovery and Data Mining (KDD), 2023
Jingwei Yi
Yueqi Xie
Bin Zhu
Emre Kiciman
Guangzhong Sun
Xing Xie
Fangzhao Wu
AAML
452
154
0
28 Jan 2025
Chain-of-Retrieval Augmented Generation
Liang Wang
Haonan Chen
Nan Yang
Xiaolong Huang
Zhicheng Dou
Furu Wei
RALM
LRM
3DV
ReLM
435
27
0
24 Jan 2025
Episodic memory in AI agents poses risks that should be studied and mitigated
Chad DeChant
462
5
0
20 Jan 2025