2112.09332
Cited By
v1
v2
v3 (latest)
WebGPT: Browser-assisted question-answering with human feedback
17 December 2021
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Long Ouyang
Christina Kim
Christopher Hesse
Shantanu Jain
V. Kosaraju
William Saunders
Xu Jiang
K. Cobbe
Tyna Eloundou
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
ALM
RALM
Papers citing
"WebGPT: Browser-assisted question-answering with human feedback"
50 / 1,125 papers shown
GUICourse: From General Vision Language Models to Versatile GUI Agents
Wentong Chen
Junbo Cui
Jinyi Hu
Yujia Qin
Junjie Fang
...
Yupeng Huo
Yuan Yao
Yankai Lin
Zhiyuan Liu
Maosong Sun
LLMAG
421
94
0
17 Jun 2024
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Neural Information Processing Systems (NeurIPS), 2024
Jifan Zhang
Lalit P. Jain
Yang Guo
Jiayi Chen
Kuan Lok Zhou
...
Scott Sievert
Timothy T. Rogers
Kevin Jamieson
Robert Mankoff
Robert Nowak
273
10
0
15 Jun 2024
RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model
Hantao Zhou
Tianying Ji
Lukas Sommerhalder
Michael Goerner
Norman Hendrich
Jianwei Zhang
Fuchun Sun
Huazhe Xu
604
0
0
14 Jun 2024
PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos
Steven Abreu
Tiffany D. Do
Ruofei Du
Eric J. Gonzalez
Lee Payne
Daniel J. McDuff
Mar Gonzalez-Franco
310
6
0
14 Jun 2024
HelpSteer2: Open-source dataset for training top-performing reward models
Zhilin Wang
Yi Dong
Olivier Delalleau
Jiaqi Zeng
Gerald Shen
Daniel Egert
Jimmy J. Zhang
Makesh Narsimhan Sreedhar
Oleksii Kuchaiev
AI4TS
315
171
0
12 Jun 2024
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Taiming Lu
Lingfeng Shen
Xinyu Yang
Weiting Tan
Beidi Chen
Huaxiu Yao
344
4
0
12 Jun 2024
A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid
Ruotian Wu
Julia Grosse
Agustinus Kristiadi
Pascal Poupart
OffRL
615
5
0
12 Jun 2024
OPTune: Efficient Online Preference Tuning
Lichang Chen
Jiuhai Chen
Chenxi Liu
John Kirchenbauer
Davit Soselia
Chen Zhu
Tom Goldstein
Wanrong Zhu
Heng Huang
130
7
0
11 Jun 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
420
3
0
11 Jun 2024
CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only
Junhee Cho
Jihoon Kim
Daseul Bae
Jinho Choo
Youngjune Gwon
Yeong-Dae Kwon
LLMAG
120
4
0
11 Jun 2024
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent
Wenjia Xu
Zijian Yu
Yixu Wang
Jiuniu Wang
Yuanben Zhang
Guangzuo Li
Mugen Peng
LLMAG
472
7
0
11 Jun 2024
The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs
Mert Yazan
Suzan Verberne
F. Situmeang
MQ
184
6
0
10 Jun 2024
Information Theoretic Guarantees For Policy Alignment In Large Language Models
Youssef Mroueh
246
19
0
09 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
432
73
0
09 Jun 2024
CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
I-Hung Hsu
Zifeng Wang
Long T. Le
Lesly Miculicich
Nanyun Peng
Tomas Pfister
HILM
295
11
0
08 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
287
89
0
06 Jun 2024
Prototypical Reward Network for Data-Efficient RLHF
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jinghan Zhang
Xiting Wang
Yiqiao Jin
Changyu Chen
Xinhao Zhang
Kunpeng Liu
ALM
272
27
0
06 Jun 2024
Tool-Planner: Task Planning with Clusters across Multiple Tools
Yanming Liu
Xinyue Peng
Jiannan Cao
Yuwei Zhang
Xuhong Zhang
Sheng Cheng
Xun Wang
Jianwei Yin
LLMAG
372
2
0
06 Jun 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer
Aleksandar Shtedritski
Samuel Albanie
Juil Sock
João F. Henriques
Jakob N. Foerster
223
2
0
05 Jun 2024
Re-ReST: Reflection-Reinforced Self-Training for Language Agents
Zi-Yi Dou
Cheng-Fu Yang
Xueqing Wu
Kai-Wei Chang
Nanyun Peng
LRM
574
19
0
03 Jun 2024
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
Lin Gui
Cristina Garbacea
Victor Veitch
BDL
LM&MA
477
95
0
02 Jun 2024
Aligning Language Models with Demonstrated Feedback
Omar Shaikh
Michelle S. Lam
Joey Hejna
Yijia Shao
Michael S. Bernstein
Diyi Yang
ALM
367
11
0
02 Jun 2024
Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models
Mingda Li
Xinyu Li
Yifan Chen
Wenfeng Xuan
Weinan Zhang
RALM
429
2
0
31 May 2024
Transfer Q Star: Principled Decoding for LLM Alignment
Souradip Chakraborty
Soumya Suvra Ghosal
Ming Yin
Dinesh Manocha
Mengdi Wang
Amrit Singh Bedi
Furong Huang
282
42
0
30 May 2024
Group Robust Preference Optimization in Reward-free RLHF
Shyam Sundhar Ramesh
Yifan Hu
Iason Chaimalas
Viraj Mehta
Pier Giuseppe Sessa
Haitham Bou-Ammar
Ilija Bogunovic
329
87
0
30 May 2024
TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models
Chen Zhang
Chengguang Tang
Dading Chong
Ke Shi
Guohua Tang
Feng Jiang
Haizhou Li
214
4
0
30 May 2024
Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf
Xuanfa Jin
Ziyan Wang
Yali Du
Meng Fang
Haifeng Zhang
Jun Wang
OffRL
LLMAG
440
19
0
30 May 2024
Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion
Wei Cheng
Yuhan Wu
Wei Hu
209
38
0
30 May 2024
Stress-Testing Capability Elicitation With Password-Locked Models
Ryan Greenblatt
Fabien Roger
Dmitrii Krasheninnikov
David M. Krueger
329
25
0
29 May 2024
A Multi-Source Retrieval Question Answering Framework Based on RAG
Ridong Wu
Shuhong Chen
Xiangbiao Su
Yuankai Zhu
Yifei Liao
Jianming Wu
125
7
0
29 May 2024
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
239
41
0
29 May 2024
Evaluating the External and Parametric Knowledge Fusion of Large Language Models
Hao Zhang
Yuyang Zhang
Xiaoguang Li
Wenxuan Shi
Haonan Xu
...
Yasheng Wang
Lifeng Shang
Qun Liu
Yong Liu
Ruiming Tang
KELM
250
7
0
29 May 2024
CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control
Huanshuo Liu
Hao Zhang
Zhijiang Guo
Kuicai Dong
Xiangyang Li
Yi Quan Lee
Cong Zhang
Yong Liu
3DV
286
12
0
29 May 2024
Aligning to Thousands of Preferences via System Message Generalization
Seongyun Lee
Sue Hyun Park
Seungone Kim
Minjoon Seo
ALM
327
71
0
28 May 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jun Xu
Jirong Wen
LLMAG
343
217
0
28 May 2024
M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions
Zheng Wang
Shu Xian Teo
Jieer Ouyang
Yongjun Xu
Wei Shi
RALM
VLM
214
45
0
26 May 2024
Multi-Reference Preference Optimization for Large Language Models
Hung Le
Quan Tran
D. Nguyen
Kien Do
Saloni Mittal
Kelechi Ogueji
Svetha Venkatesh
196
5
0
26 May 2024
Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents
Zhengliang Shi
Shen Gao
Xiuyi Chen
Yue Feng
Lingyong Yan
Haibo Shi
D. Yin
Zhumin Chen
Suzan Verberne
LLMAG
366
6
0
26 May 2024
AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning
Minghao Chen
Yihang Li
Yanting Yang
Shiyu Yu
Binbin Lin
Xiaofei He
LLMAG
287
0
0
25 May 2024
Learning Generalizable Human Motion Generator with Reinforcement Learning
Yunyao Mao
Xiaoyang Liu
Wen-gang Zhou
Zhenbo Lu
Houqiang Li
242
7
0
24 May 2024
Bayesian WeakS-to-Strong from Text Classification to Generation
Ziyun Cui
Ziyang Zhang
Wen Wu
Chao Zhang
387
5
0
24 May 2024
SoAy: A Solution-based LLM API-using Methodology for Academic Information Seeking
Yuanchun Wang
Jifan Yu
Zijun Yao
Jing Zhang
Yuyang Xie
...
Yuanyao Li
Huihui Yuan
Lei Hou
Juan-Zi Li
Jie Tang
274
10
0
24 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Neural Information Processing Systems (NeurIPS), 2024
Yu Meng
Mengzhou Xia
Danqi Chen
543
791
0
23 May 2024
LIRE: listwise reward enhancement for preference alignment
Mingye Zhu
Yi Liu
Lei Zhang
Junbo Guo
Zhendong Mao
208
8
0
22 May 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Pai Zeng
Zhenyu Ning
Jieru Zhao
Weihao Cui
Mengwei Xu
Liwei Guo
Xusheng Chen
Yizhou Shan
LLMAG
298
5
0
18 May 2024
Generative Artificial Intelligence: A Systematic Review and Applications
S. S. Sengar
Affan Bin Hasan
Sanjay Kumar
Fiona Carroll
MedIm
301
231
0
17 May 2024
Rethinking ChatGPT's Success: Usability and Cognitive Behaviors Enabled by Auto-regressive LLMs' Prompting
Xinzhe Li
Ming Liu
251
1
0
17 May 2024
RLHF Workflow: From Reward Modeling to Online RLHF
Hanze Dong
Wei Xiong
Bo Pang
Haoxiang Wang
Han Zhao
Yingbo Zhou
Nan Jiang
Doyen Sahoo
Caiming Xiong
Tong Zhang
OffRL
274
209
0
13 May 2024
METAREFLECTION: Learning Instructions for Language Agents using Past Reflections
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Priyanshu Gupta
Shashank Kirtania
Ananya Singha
Sumit Gulwani
Arjun Radhakrishna
Sherry Shi
Gustavo Soares
LLMAG
164
16
0
13 May 2024
Value Augmented Sampling for Language Model Alignment and Personalization
Seungwook Han
Idan Shenfeld
Akash Srivastava
Yoon Kim
Pulkit Agrawal
OffRL
248
40
0
10 May 2024