Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2204.05862
Cited By
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
12 April 2022
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
Nova Dassarma
Dawn Drain
Stanislav Fort
Deep Ganguli
T. Henighan
Nicholas Joseph
Saurav Kadavath
John Kernion
Tom Conerly
S. E. Showk
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
Tristan Hume
Scott Johnston
Shauna Kravec
Liane Lovitt
Neel Nanda
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
50 / 1,795 papers shown
Title
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian
E. Mitchell
Allan Zhou
Archit Sharma
Rafael Rafailov
Huaxiu Yao
Chelsea Finn
Christopher D. Manning
25
284
0
24 May 2023
Provable Offline Preference-Based Reinforcement Learning
Wenhao Zhan
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
OffRL
27
12
0
24 May 2023
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for Large Language Models
Sheng Shen
Le Hou
Yan-Quan Zhou
Nan Du
Shayne Longpre
...
Vincent Zhao
Hongkun Yu
Kurt Keutzer
Trevor Darrell
Denny Zhou
ALM
MoE
25
54
0
24 May 2023
DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4
Ye Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
H. Foroosh
Fei Liu
13
11
0
24 May 2023
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts
Benfeng Xu
An Yang
Junyang Lin
Quang Wang
Chang Zhou
Yongdong Zhang
Zhendong Mao
ALM
31
130
0
24 May 2023
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers
Artidoro Pagnoni
Ari Holtzman
Luke Zettlemoyer
ALM
33
2,327
0
23 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
39
598
0
23 May 2023
DetGPT: Detect What You Need via Reasoning
Renjie Pi
Jiahui Gao
Shizhe Diao
Rui Pan
Hanze Dong
...
Lewei Yao
Jianhua Han
Hang Xu
Lingpeng Kong Tong Zhang
Tong Zhang
LRM
LM&Ro
22
92
0
23 May 2023
Learning from Mistakes via Cooperative Study Assistant for Large Language Models
Danqing Wang
Lei Li
23
6
0
23 May 2023
Aligning Large Language Models through Synthetic Feedback
Sungdong Kim
Sanghwan Bae
Jamin Shin
Soyoung Kang
Donghyun Kwak
Kang Min Yoo
Minjoon Seo
ALM
SyDa
73
67
0
23 May 2023
LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models
Yen-Ting Lin
Yun-Nung (Vivian) Chen
11
89
0
23 May 2023
Training Priors Predict Text-To-Image Model Performance
Charles Lovering
Ellie Pavlick
CoGe
14
3
0
23 May 2023
The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models
Shuo Zhang
Liangming Pan
Junzhou Zhao
W. Wang
HILM
19
0
0
23 May 2023
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents
Kranti Chalamalasetti
Jana Gotze
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
ELM
ALM
LLMAG
23
31
0
22 May 2023
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
39
313
0
22 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
40
536
0
22 May 2023
On the Limitations of Simulating Active Learning
Katerina Margatina
Nikolaos Aletras
29
11
0
21 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
24
6
0
21 May 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
KELM
LRM
36
356
0
19 May 2023
Introspective Tips: Large Language Model for In-Context Decision Making
Liting Chen
Lu Wang
Hang Dong
Yali Du
Jie Yan
...
Pu Zhao
Si Qin
Saravan Rajmohan
Qingwei Lin
Dongmei Zhang
LLMAG
LRM
30
23
0
19 May 2023
Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models
Wanqiao Xu
Shi Dong
Dilip Arumugam
Benjamin Van Roy
17
8
0
19 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
27
81
0
19 May 2023
LIMA: Less Is More for Alignment
Chunting Zhou
Pengfei Liu
Puxin Xu
Srini Iyer
Jiao Sun
...
Susan Zhang
Gargi Ghosh
M. Lewis
Luke Zettlemoyer
Omer Levy
ALM
17
772
0
18 May 2023
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Yao-Min Zhao
Rishabh Joshi
Tianqi Liu
Misha Khalman
Mohammad Saleh
Peter J. Liu
13
264
0
17 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
58
1,138
0
17 May 2023
Prompt-Tuning Decision Transformer with Preference Ranking
Shengchao Hu
Li Shen
Ya-Qin Zhang
Dacheng Tao
OffRL
26
14
0
16 May 2023
Make Prompt-based Black-Box Tuning Colorful: Boosting Model Generalization from Three Orthogonal Perspectives
Qiushi Sun
Chengcheng Han
Nuo Chen
Renyu Zhu
Jing Gong
Xiang Li
Ming Gao
VLM
22
8
0
14 May 2023
Leveraging Large Language Models in Conversational Recommender Systems
Luke Friedman
Sameer Ahuja
David Allen
Zhenning Tan
Hakim Sidahmed
...
Ajay Patel
Harsh Lara
Brian Chu
Zexiang Chen
Manoj Kumar Tiwari
25
100
0
13 May 2023
Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation
Jin-Fang Gao
Xiao Ding
Bing Qin
Ting Liu
ELM
LRM
AI4MH
28
59
0
12 May 2023
Taking Advice from ChatGPT
Peter Zhang
19
5
0
11 May 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun
Yikang Shen
Qinhong Zhou
Hongxin Zhang
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
SyDa
ALM
12
313
0
04 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
209
568
0
03 May 2023
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain
Adam Polyak
Uriel Singer
Shahbuland Matiana
Joe Penna
Omer Levy
EGVM
163
349
0
02 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
27
85
0
02 May 2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes
Aman Madaan
Emmy Liu
António Farinhas
Pedro Henrique Martins
...
José G. C. de Souza
Shuyan Zhou
Tongshuang Wu
Graham Neubig
André F. T. Martins
ALM
113
56
0
01 May 2023
Origin Tracing and Detecting of LLMs
Linyang Li
Pengyu Wang
Kerong Ren
Tianxiang Sun
Xipeng Qiu
LLMAG
94
31
0
27 Apr 2023
Towards ethical multimodal systems
Alexis Roger
Esma Aïmeur
Irina Rish
19
3
0
26 Apr 2023
SCM: Enhancing Large Language Model with Self-Controlled Memory Framework
Bin Wang
Xinnian Liang
Jian Yang
Huijia Huang
Shuangzhi Wu
Peihao Wu
Lu Lu
Zejun Ma
Zhoujun Li
LLMAG
KELM
RALM
94
25
0
26 Apr 2023
AGI: Artificial General Intelligence for Education
Ehsan Latif
Gengchen Mai
Matthew Nyaaba
Xuansheng Wu
Ninghao Liu
Guoyu Lu
Sheng R. Li
Tianming Liu
Xiaoming Zhai
ELM
AI4CE
21
22
0
24 Apr 2023
Fundamental Limitations of Alignment in Large Language Models
Yotam Wolf
Noam Wies
Oshri Avnery
Yoav Levine
Amnon Shashua
ALM
6
137
0
19 Apr 2023
Progressive-Hint Prompting Improves Reasoning in Large Language Models
Chuanyang Zheng
Zhengying Liu
Enze Xie
Zhenguo Li
Yu Li
LLMAG
ReLM
LRM
19
100
0
19 Apr 2023
Chinese Open Instruction Generalist: A Preliminary Release
Ge Zhang
Yemin Shi
Ruibo Liu
Ruibin Yuan
Yizhi Li
...
Zhaoqun Li
Zekun Wang
Chenghua Lin
Wen-Fen Huang
Jie Fu
ALM
20
28
0
17 Apr 2023
OpenAssistant Conversations -- Democratizing Large Language Model Alignment
Andreas Kopf
Yannic Kilcher
Dimitri von Rutte
Sotiris Anagnostidis
Zhi Rui Tam
...
Arnav Dantuluri
Andrew Maguire
Christoph Schuhmann
Huu Nguyen
A. Mattick
ALM
LM&MA
44
579
0
14 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
6
399
0
13 Apr 2023
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Jiazheng Xu
Xiao Liu
Yuchen Wu
Yuxuan Tong
Qinkai Li
Ming Ding
Jie Tang
Yuxiao Dong
29
310
0
12 Apr 2023
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Zheng Yuan
Hongyi Yuan
Chuanqi Tan
Wei Wang
Songfang Huang
Feiran Huang
ALM
15
342
0
11 Apr 2023
The Vector Grounding Problem
Dimitri Coelho Mollo
Raphael Milliere
23
26
0
04 Apr 2023
Eight Things to Know about Large Language Models
Sam Bowman
ALM
15
110
0
02 Apr 2023
Querying Large Language Models with SQL
Mohammed Saeed
Nicola De Cao
Paolo Papotti
14
28
0
02 Apr 2023
Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics
J. Holmes
Zheng Liu
Lian-Cheng Zhang
Yuzhen Ding
Terence T. Sio
...
Jonathan B. Ashman
Xiang Li
Tianming Liu
Jiajian Shen
W. Liu
LM&MA
AI4CE
ELM
28
120
0
01 Apr 2023
Previous
1
2
3
...
33
34
35
36
Next