Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.02626
Cited By
"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process
4 May 2023
Anna Glazkova
Zongjie Li
Michael Kadantsev
Maksim Glazkov
KELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
""Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process"
12 / 12 papers shown
Title
Spontaneous Reward Hacking in Iterative Self-Refinement
Jane Pan
He He
Samuel R. Bowman
Shi Feng
34
10
0
05 Jul 2024
An Empirical Categorization of Prompting Techniques for Large Language Models: A Practitioner's Guide
Oluwole Fagbohun
Rachel M. Harrison
Anton Dereventsov
49
6
0
18 Feb 2024
Enhancing Illicit Activity Detection using XAI: A Multimodal Graph-LLM Framework
Jack Nicholls
Aditya Kuppa
Nhien-An Le-Khac
38
4
0
20 Oct 2023
Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models
Jose Berengueres
Marybeth Sandell
27
0
0
06 Jun 2023
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate
Boshi Wang
Xiang Yue
Huan Sun
ELM
LRM
46
60
0
22 May 2023
Causality-Aided Trade-off Analysis for Machine Learning Fairness
Zhenlan Ji
Pingchuan Ma
Shuai Wang
Yanhui Li
FaML
31
7
0
22 May 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
159
579
0
06 Apr 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
250
1,073
0
05 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Can Machines Learn Morality? The Delphi Experiment
Liwei Jiang
Jena D. Hwang
Chandra Bhagavatula
Ronan Le Bras
Jenny T Liang
...
Yulia Tsvetkov
Oren Etzioni
Maarten Sap
Regina A. Rini
Yejin Choi
FaML
127
111
0
14 Oct 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
256
1,996
0
31 Dec 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
280
1,595
0
18 Sep 2019
1