Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.08061
Cited By
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
15 December 2022
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning"
24 / 24 papers shown
Title
Unified Attacks to Large Language Model Watermarks: Spoofing and Scrubbing in Unauthorized Knowledge Distillation
Xin Yi
Shunfan Zhengc
Linlin Wanga
Xiaoling Wang
Liang He
Liang He
AAML
65
0
0
24 Apr 2025
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models
Panatchakorn Anantaprayoon
Masahiro Kaneko
Naoaki Okazaki
LRM
KELM
50
0
0
08 Mar 2025
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng
Hengquan Guo
Jiawei Zhang
Dongqing Zou
Ziyu Shao
Honghao Wei
Xin Liu
32
0
0
25 Oct 2024
Weak-to-Strong Generalization beyond Accuracy: a Pilot Study in Safety, Toxicity, and Legal Reasoning
Ruimeng Ye
Yang Xiao
Bo Hui
ALM
ELM
OffRL
27
2
0
16 Oct 2024
Visual Agents as Fast and Slow Thinkers
Guangyan Sun
Mingyu Jin
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Ying Nian Wu
Dongfang Liu
Dongfang Liu
LLMAG
LRM
74
11
0
16 Aug 2024
Quite Good, but Not Enough: Nationality Bias in Large Language Models -- A Case Study of ChatGPT
Shucheng Zhu
Weikang Wang
Ying Liu
21
5
0
11 May 2024
Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation
Juhwan Choi
Jungmin Yun
Kyohoon Jin
Youngbin Kim
27
3
0
15 Apr 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
53
30
0
08 Apr 2024
Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language Models
Rima Hazra
Sayan Layek
Somnath Banerjee
Soujanya Poria
KELM
26
17
0
19 Jan 2024
From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models
Zachary Englhardt
Chengqian Ma
Margaret E. Morris
X. Xu
Chun-Cheng Chang
Lianhui Qin
Daniel J. McDuff
Xin Liu
Shwetak N. Patel
Vikram Iyer
AI4MH
30
11
0
21 Nov 2023
How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo
Boshi Wang
Muhao Chen
Huan Sun
11
25
0
15 Nov 2023
CMD: a framework for Context-aware Model self-Detoxification
Zecheng Tang
Keyan Zhou
Juntao Li
Yuyang Ding
Pinzheng Wang
Bowen Yan
Minzhang
MU
18
5
0
16 Aug 2023
Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei
Nika Haghtalab
Jacob Steinhardt
19
827
0
05 Jul 2023
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
165
320
0
06 Oct 2022
Ask Me Anything: A simple strategy for prompting language models
Simran Arora
A. Narayan
Mayee F. Chen
Laurel J. Orr
Neel Guha
Kush S. Bhatia
Ines Chami
Frederic Sala
Christopher Ré
ReLM
LRM
197
160
0
05 Oct 2022
Psychologically-informed chain-of-thought prompts for metaphor understanding in large language models
Ben Prystawski
P. Thibodeau
Christopher Potts
Noah D. Goodman
ReLM
LRM
AI4CE
32
20
0
16 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
210
364
0
15 Oct 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
241
1,898
0
31 Dec 2020
e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu
Tim Rocktaschel
Thomas Lukasiewicz
Phil Blunsom
LRM
252
618
0
04 Dec 2018
1