Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.02378
Cited By
On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept
4 June 2024
Guangliang Liu
Haitao Mao
Bochuan Cao
Zhiyu Xue
K. Johnson
Jiliang Tang
Rongrong Wang
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept"
11 / 11 papers shown
Title
Smaller Large Language Models Can Do Moral Self-Correction
Guangliang Liu
Zhiyu Xue
Rongrong Wang
K. Johnson
Kristen Marie Johnson
LRM
23
0
0
30 Oct 2024
Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models
Loka Li
Zhenhao Chen
Guan-Hong Chen
Yixuan Zhang
Yusheng Su
Eric P. Xing
Kun Zhang
LRM
36
15
0
19 Feb 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei
Kaixuan Huang
Yangsibo Huang
Tinghao Xie
Xiangyu Qi
Mengzhou Xia
Prateek Mittal
Mengdi Wang
Peter Henderson
AAML
55
79
0
07 Feb 2024
Agent AI: Surveying the Horizons of Multimodal Interaction
Zane Durante
Qiuyuan Huang
Naoki Wake
Ran Gong
J. Park
...
Yejin Choi
Katsushi Ikeuchi
Hoi Vo
Fei-Fei Li
Jianfeng Gao
LM&Ro
92
72
0
07 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
64
95
0
03 Jan 2024
Out-of-Distribution Detection and Selective Generation for Conditional Language Models
Jie Jessie Ren
Jiaming Luo
Yao-Min Zhao
Kundan Krishna
Mohammad Saleh
Balaji Lakshminarayanan
Peter J. Liu
OODD
64
94
0
30 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
212
364
0
15 Oct 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
257
374
0
28 Feb 2021
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
221
402
0
24 Feb 2021
1