Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2301.00355
Cited By
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
1 January 2023
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits"
13 / 13 papers shown
Title
Evaluating and Aligning Human Economic Risk Preferences in LLMs
J. Liu
Yi Yang
K. Tam
50
0
0
09 Mar 2025
Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits
Tuhin Chakrabarty
Philippe Laban
C. Wu
41
8
0
22 Sep 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang
Shitong Duan
Xiaoyuan Yi
Jing Yao
Shanlin Zhou
Zhihua Wei
Peng Zhang
Dongkuan Xu
Maosong Sun
Xing Xie
OffRL
27
16
0
07 Mar 2024
COPR: Continual Human Preference Learning via Optimal Policy Regularization
Han Zhang
Lin Gui
Yu Lei
Yuanzhao Zhai
Yehong Zhang
...
Hui Wang
Yue Yu
Kam-Fai Wong
Bin Liang
Ruifeng Xu
CLL
23
4
0
22 Feb 2024
The Touché23-ValueEval Dataset for Identifying Human Values behind Arguments
Nailia Mirzakhmedova
Johannes Kiesel
Milad Alshomary
Maximilian Heinrich
Nicolas Handke
...
Mohammad Ali Sadraei
Ehsaneddin Asgari
Lea Kawaletz
Henning Wachsmuth
Benno Stein
12
38
0
31 Jan 2023
Learning to Model Editing Processes
Machel Reid
Graham Neubig
KELM
BDL
101
34
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
Unsolved Problems in ML Safety
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
156
268
0
28 Sep 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
251
374
0
28 Feb 2021
Text Editing by Command
Felix Faltings
Michel Galley
Gerold Hintz
Chris Brockett
Chris Quirk
Jianfeng Gao
Bill Dolan
KELM
126
33
0
24 Oct 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,561
0
18 Sep 2019
1