Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

1 January 2023

Ruibo Liu

Ge Zhang

Soroush Vosoughi

Papers citing "Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits"

Title | Authors | Tags | Counts | Date
Evaluating and Aligning Human Economic Risk Preferences in LLMs | J. Liu, Yi Yang, K. Tam | - | 50 0 0 | 09 Mar 2025
Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits | Tuhin Chakrabarty, Philippe Laban, C. Wu | - | 41 8 0 | 22 Sep 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models | Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie | OffRL | 27 16 0 | 07 Mar 2024
COPR: Continual Human Preference Learning via Optimal Policy Regularization | Han Zhang, Lin Gui, Yu Lei, Yuanzhao Zhai, Yehong Zhang, ..., Hui Wang, Yue Yu, Kam-Fai Wong, Bin Liang, Ruifeng Xu | CLL | 23 4 0 | 22 Feb 2024
The Touché23-ValueEval Dataset for Identifying Human Values behind Arguments | Nailia Mirzakhmedova, Johannes Kiesel, Milad Alshomary, Maximilian Heinrich, Nicolas Handke, ..., Mohammad Ali Sadraei, Ehsaneddin Asgari, Lea Kawaletz, Henning Wachsmuth, Benno Stein | - | 12 38 0 | 31 Jan 2023
Learning to Model Editing Processes | Machel Reid, Graham Neubig | KELM, BDL | 101 34 0 | 24 May 2022
Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM, ALM | 301 11,730 0 | 04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou | LM&Ro, LRM, AI4CE, ReLM | 315 8,261 0 | 28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization | Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush | LRM | 203 1,651 0 | 15 Oct 2021
Unsolved Problems in ML Safety | Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt | - | 156 268 0 | 28 Sep 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP | Timo Schick, Sahana Udupa, Hinrich Schütze | - | 251 374 0 | 28 Feb 2021
Text Editing by Command | Felix Faltings, Michel Galley, Gerold Hintz, Chris Brockett, Chris Quirk, Jianfeng Gao, Bill Dolan | KELM | 126 33 0 | 24 Oct 2020
Fine-Tuning Language Models from Human Preferences | Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving | ALM | 273 1,561 0 | 18 Sep 2019