Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2302.07459
Cited By
The Capacity for Moral Self-Correction in Large Language Models
15 February 2023
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamil.e Lukovsiut.e
Anna Chen
Anna Goldie
Azalia Mirhoseini
Catherine Olsson
Danny Hernandez
Dawn Drain
Dustin Li
Eli Tran-Johnson
Ethan Perez
John Kernion
Jamie Kerr
J. Mueller
J. Landau
Kamal Ndousse
Karina Nguyen
Liane Lovitt
Michael Sellitto
Nelson Elhage
Noemí Mercado
Nova Dassarma
Oliver Rausch
R. Lasenby
Robin Larson
Sam Ringer
Sandipan Kundu
Saurav Kadavath
Scott Johnston
Shauna Kravec
S. E. Showk
Tamera Lanham
Timothy Telleen-Lawton
T. Henighan
Tristan Hume
Yuntao Bai
Zac Hatfield-Dodds
Benjamin Mann
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Capacity for Moral Self-Correction in Large Language Models"
25 / 25 papers shown
Title
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli
Kentaroh Toyoda
Yuan Wang
Leon Witt
Muhammad Asif Ali
Yukai Miao
Dan Li
Qingsong Wei
UQCV
82
0
0
25 Apr 2025
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu
Haoyu Zhao
Xinran Gu
Dingli Yu
Anirudh Goyal
Sanjeev Arora
ALM
75
41
0
20 Jan 2025
Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals
Qingyang Wu
Ying Xu
Tingsong Xiao
Yunze Xiao
Yitong Li
...
Yichi Zhang
Shanghai Zhong
Yuwei Zhang
Wei Lu
Yifan Yang
70
1
0
17 Jan 2025
When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
Yanhong Li
Chenghao Yang
Allyson Ettinger
ReLM
LRM
LLMAG
26
6
0
14 Apr 2024
The Impact of Unstated Norms in Bias Analysis of Language Models
Farnaz Kohankhaki
D. B. Emerson
David B. Emerson
Laleh Seyyed-Kalantari
Faiza Khan Khattak
45
1
0
04 Apr 2024
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
Boshi Wang
Hao Fang
Jason Eisner
Benjamin Van Durme
Yu-Chuan Su
CLL
24
7
0
07 Mar 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang
Shitong Duan
Xiaoyuan Yi
Jing Yao
Shanlin Zhou
Zhihua Wei
Peng Zhang
Dongkuan Xu
Maosong Sun
Xing Xie
OffRL
33
16
0
07 Mar 2024
Do Large Language Models Mirror Cognitive Language Processing?
Yuqi Ren
Renren Jin
Tongxuan Zhang
Deyi Xiong
31
4
0
28 Feb 2024
Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting
Masahiro Kaneko
Danushka Bollegala
Naoaki Okazaki
Timothy Baldwin
LRM
17
27
0
28 Jan 2024
Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
G. Yona
Roee Aharoni
Mor Geva
ELM
31
11
0
09 Jan 2024
Evaluating and Mitigating Discrimination in Language Model Decisions
Alex Tamkin
Amanda Askell
Liane Lovitt
Esin Durmus
Nicholas Joseph
Shauna Kravec
Karina Nguyen
Jared Kaplan
Deep Ganguli
32
66
0
06 Dec 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
49
20
0
28 Nov 2023
Fake Alignment: Are LLMs Really Aligned Well?
Yixu Wang
Yan Teng
Kexin Huang
Chengqi Lyu
Songyang Zhang
Wenwei Zhang
Xingjun Ma
Yu-Gang Jiang
Yu Qiao
Yingchun Wang
20
14
0
10 Nov 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas L. Griffiths
LLMAG
LM&Ro
34
150
0
05 Sep 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
70
10,890
0
18 Jul 2023
Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges
Niket Tandon
Tanja Kaser
Antoine Bosselut
17
3
0
01 Jul 2023
Comparing Machines and Children: Using Developmental Psychology Experiments to Assess the Strengths and Weaknesses of LaMDA Responses
Eliza Kosoy
Emily Rose Reagan
Leslie Y. Lai
Alison Gopnik
Danielle Krettek Cobb
15
9
0
18 May 2023
Language Models can Solve Computer Tasks
Geunwoo Kim
Pierre Baldi
Stephen Marcus McAleer
LLMAG
LM&Ro
17
336
0
30 Mar 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
495
0
28 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
210
364
0
15 Oct 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
257
374
0
28 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
223
4,424
0
23 Jan 2020
1