arXiv: 2112.09332 · Cited By (v3, latest)
WebGPT: Browser-assisted question-answering with human feedback
17 December 2021
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Long Ouyang
Christina Kim
Christopher Hesse
Shantanu Jain
V. Kosaraju
William Saunders
Xu Jiang
K. Cobbe
Tyna Eloundou
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
ALM
RALM
Papers citing "WebGPT: Browser-assisted question-answering with human feedback"
50 / 1,125 papers shown
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
Matthew Renze
Erhan Guven
LRM
LLMAG
343
73
0
05 May 2024
Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024
Hamed Zamani
Michael Bendersky
355
43
0
05 May 2024
Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning
Lucas-Andrei Thil
Mirela Popa
Gerasimos Spanakis
LLMAG
145
5
0
01 May 2024
Almanac Copilot: Towards Autonomous Electronic Health Record Navigation
C. Zakka
Joseph Cho
Gracia Fahed
R. Shad
Michael Moor
...
Vishnu Ravi
Oliver Aalami
Roxana Daneshjou
Akshay Chaudhari
W. Hiesinger
354
10
0
30 Apr 2024
Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models
Alireza Salemi
Hamed Zamani
424
24
0
30 Apr 2024
When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
Tiziano Labruna
Jon Ander Campos
Gorka Azkune
229
20
0
30 Apr 2024
Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning
Mathieu Rita
Florian Strub
Rahma Chaabouni
Paul Michel
Emmanuel Dupoux
Olivier Pietquin
242
14
0
30 Apr 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
648
101
0
29 Apr 2024
GPT for Games: A Scoping Review (2020-2023)
Daijin Yang
Erica Kleinman
Casper Harteveld
AI4TS
AI4CE
305
26
0
27 Apr 2024
Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM
Xuan Zhang
Wei Gao
LRM
KELM
284
17
0
26 Apr 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Gokul Swamy
Kianté Brantley
Thorsten Joachims
J. Andrew Bagnell
Jason D. Lee
Wen Sun
OffRL
330
62
0
25 Apr 2024
Benchmarking Mobile Device Control Agents across Diverse Configurations
Juyong Lee
Taywon Min
Minyong An
Dongyoon Hahm
Changyeon Kim
Kimin Lee
372
32
0
25 Apr 2024
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
Davide Caffagni
Federico Cocchi
Nicholas Moratelli
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
KELM
386
78
0
23 Apr 2024
Aligning LLM Agents by Learning Latent Preference from User Edits
Ge Gao
Alexey Taymanov
Eduardo Salinas
Paul Mineiro
Dipendra Kumar Misra
LLMAG
300
51
0
23 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
588
135
0
23 Apr 2024
Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering
Jiapeng Li
Runze Liu
Yabo Liu
Tong Zhou
Mingling Li
Xiang Chen
LRM
193
4
0
22 Apr 2024
Filtered Direct Preference Optimization
Tetsuro Morimura
Mitsuki Sakamoto
Yuu Jinnai
Kenshi Abe
Kaito Ariu
434
23
0
22 Apr 2024
Large Language Models as Test Case Generators: Performance Evaluation and Enhancement
Ke-Shen Li
Shijie Cao
LLMAG
188
43
0
20 Apr 2024
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Eric Wallace
Kai Y. Xiao
R. Leike
Lilian Weng
Johannes Heidecke
Alex Beutel
SILM
355
238
0
19 Apr 2024
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering
Avinash Anand
Janak Kapuriya
Chhavi Kirtani
Apoorv Singh
Jay Saraf
Naman Lal
Jatin Kumar
A. Shivam
Astha Verma
R. Shah
OffRL
246
9
0
19 Apr 2024
Evaluating AI for Law: Bridging the Gap with Open-Source Solutions
R. Bhambhoria
Samuel Dahan
Jonathan Li
Xiaodan Zhu
ELM
155
11
0
18 Apr 2024
A Survey on Retrieval-Augmented Text Generation for Large Language Models
Yizheng Huang
Jimmy X. Huang
3DV
RALM
329
96
0
17 Apr 2024
Crossing the principle-practice gap in AI ethics with ethical problem-solving
N. Corrêa
James William Santos
Camila Galvão
Marcelo Pasetti
Dieine Schiavon
Faizah Naqvi
Robayet Hossain
N. D. Oliveira
215
10
0
16 Apr 2024
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
287
15
0
13 Apr 2024
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari
Pranjal Aggarwal
Vishvak Murahari
Tanmay Rajpurohit
Ashwin Kalyan
Karthik Narasimhan
Ameet Deshpande
Bruno Castro da Silva
412
93
0
12 Apr 2024
Dataset Reset Policy Optimization for RLHF
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Kianté Brantley
Dipendra Kumar Misra
Jason D. Lee
Wen Sun
OffRL
461
34
0
12 Apr 2024
High-Dimension Human Value Representation in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
625
13
0
11 Apr 2024
Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study
Alessandro Stolfo
RALM
HILM
256
9
0
10 Apr 2024
Improving Language Model Reasoning with Self-motivated Learning
International Conference on Language Resources and Evaluation (LREC), 2024
Yunlong Feng
Yang Xu
Libo Qin
Yasheng Wang
Wanxiang Che
LRM
ReLM
239
8
0
10 Apr 2024
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Chonghua Wang
Haodong Duan
Songyang Zhang
Dahua Lin
Kai-xiang Chen
ELM
255
33
0
09 Apr 2024
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner
Yang Gao
Dana Alon
Donald Metzler
AAML
241
33
0
08 Apr 2024
Towards Understanding the Influence of Reward Margin on Preference Model Performance
Bowen Qin
Duanyu Feng
Xi Yang
144
7
0
07 Apr 2024
AI2Apps: A Visual IDE for Building LLM-based AI Agent Applications
Xin Pang
Zhucong Li
Jiaxiang Chen
Yuan Cheng
Yinghui Xu
Yuan Qi
LLMAG
141
5
0
07 Apr 2024
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Derui Zhu
Dingfan Chen
Qing Li
Zongxiong Chen
Lei Ma
Jens Grossklags
Mario Fritz
HILM
217
19
0
06 Apr 2024
Aligning Diffusion Models by Optimizing Human Utility
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Yusuke Kato
Kazuki Kozuka
312
69
0
06 Apr 2024
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Knowledge Discovery and Data Mining (KDD), 2024
Hanyu Lai
Xiao Liu
Iat Long Iong
Shuntian Yao
Yuxuan Chen
...
Hao Yu
Hanchen Zhang
Xiaohan Zhang
Yuxiao Dong
Jie Tang
LM&Ro
LLMAG
234
19
0
04 Apr 2024
Learning to Plan and Generate Text with Citations
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Constanza Fierro
Reinald Kim Amplayo
Fantine Huot
Nicola De Cao
Joshua Maynez
Shashi Narayan
Mirella Lapata
290
32
0
04 Apr 2024
Empowering Biomedical Discovery with AI Agents
Cell (Cell), 2024
Shanghua Gao
Ada Fang
Yepeng Huang
Valentina Giunchiglia
Ayush Noori
Jonathan Richard Schwarz
Yasha Ektefaie
Jovana Kondic
Marinka Zitnik
LLMAG
AI4CE
270
224
0
03 Apr 2024
Asymptotics of Language Model Alignment
International Symposium on Information Theory (ISIT), 2024
Joy Qiping Yang
Salman Salamatian
Ziteng Sun
A. Suresh
Ahmad Beirami
251
37
0
02 Apr 2024
Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment
Yuu Jinnai
Tetsuro Morimura
Kaito Ariu
Kenshi Abe
451
3
0
01 Apr 2024
Source-Aware Training Enables Knowledge Attribution in Language Models
Muhammad Khalifa
Aman Rangapur
Emma Strubell
Honglak Lee
Lu Wang
Iz Beltagy
Hao Peng
HILM
409
27
0
01 Apr 2024
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models
Wei He
Shichun Liu
Jun Zhao
Yiwen Ding
Yi Lu
Zhiheng Xi
Tao Gui
Xuanjing Huang
186
4
0
01 Apr 2024
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
Hritik Bansal
Ashima Suvarna
Gantavya Bhatt
Nanyun Peng
Kai-Wei Chang
Aditya Grover
ALM
418
16
0
31 Mar 2024
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods
Yuji Cao
Huan Zhao
Yuheng Cheng
Ting Shu
Guolong Liu
Gaoqi Liang
Junhua Zhao
Yun Li
LLMAG
KELM
OffRL
LM&Ro
414
156
0
30 Mar 2024
Understanding the Learning Dynamics of Alignment with Human Feedback
Shawn Im
Yixuan Li
ALM
453
18
0
27 Mar 2024
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization
Jin Peng Zhou
Charles Staats
Wenda Li
Christian Szegedy
Kilian Q. Weinberger
Yuhuai Wu
LRM
198
62
0
26 Mar 2024
ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting
Xiaoxue Cheng
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
LRM
AI4CE
ReLM
166
19
0
21 Mar 2024
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection
Kyungjae Lee
Dasol Hwang
Sunghyun Park
Youngsoo Jang
Moontae Lee
269
15
0
21 Mar 2024
A Roadmap Towards Automated and Regulated Robotic Systems
Yihao Liu
Mehran Armand
198
3
0
21 Mar 2024
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert
Valentina Pyatkin
Jacob Morrison
Lester James V. Miranda
Bill Yuchen Lin
...
Sachin Kumar
Tom Zick
Yejin Choi
Noah A. Smith
Hanna Hajishirzi
ALM
477
342
0
20 Mar 2024
Page 12 of 23