arXiv: 2212.08061
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
15 December 2022
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLM
LRM
Papers citing
"On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning"
50 / 159 papers shown
Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles
Fatima Jahara
Mark Dredze
Sharon Levy
LRM
08 Nov 2025
Chain-of-Thought Hijacking
Jianli Zhao
Tingchen Fu
Rylan Schaeffer
Mrinank Sharma
Fazl Barez
LRM
30 Oct 2025
Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation
Guoqing Luo
Iffat Maab
Lili Mou
Junichi Yamagishi
LRM
20 Oct 2025
Community size rather than grammatical complexity better predicts Large Language Model accuracy in a novel Wug Test
Nikoleta Pantelidou
Evelina Leivada
Paolo Morosi
ELM
14 Oct 2025
Debiasing LLMs by Masking Unfairness-Driving Attention Heads
Tingxu Han
Wei Song
Ziqi Ding
Z. Li
Chunrong Fang
Yuekang Li
Dongfang Liu
Zhenyu Chen
Zhenting Wang
11 Oct 2025
Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
Ragib Amin Nihal
Rui Wen
Kazuhiro Nakadai
Jun Sakuma
09 Oct 2025
AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming
Muxi Diao
Yutao Mou
Keqing He
Hanbo Song
Lulu Zhao
Shikun Zhang
Wei Ye
Kongming Liang
Zhanyu Ma
AAML
09 Oct 2025
FedBook: A Unified Federated Graph Foundation Codebook with Intra-domain and Inter-domain Knowledge Modeling
Zhengyu Wu
Yinlin Zhu
Xunkai Li
Ziang Qiu
Rong-Hua Li
Guoren Wang
Chenghu Zhou
FedML
09 Oct 2025
Accelerating Diffusion LLM Inference via Local Determinism Propagation
Fanheng Kong
Jingyuan Zhang
Yahui Liu
Zirui Wu
Yu Tian
Victoria A. Webster-Wood
Guorui Zhou
AI4CE
08 Oct 2025
Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment
Yunfan Zhang
Kathleen McKeown
Smaranda Muresan
LRM
05 Oct 2025
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
Ruohao Guo
Afshin Oroojlooy
Roshan Sridhar
Miguel Ballesteros
Alan Ritter
Dan Roth
AAML
02 Oct 2025
Toxicity in Online Platforms and AI Systems: A Survey of Needs, Challenges, Mitigations, and Future Directions
Expert Systems with Applications (ESWA), 2025
Smita Khapre
Melkamu Mersha
Hassan Shakil
Jonali Baruah
Jugal Kalita
29 Sep 2025
PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning
Hieu Tran
Zonghai Yao
Nguyen Luong Tran
Zhichao Yang
Feiyun Ouyang
Shuo Han
Razieh Rahimi
Hong-ye Yu
LLMAG
LRM
26 Sep 2025
Evaluating Large Language Models for Detecting Antisemitism
Jay Patel
Hrudayangam Mehta
Jeremy Blackburn
22 Sep 2025
Steering MoE LLMs via Expert (De)Activation
Mohsen Fayyaz
Ali Modarressi
Hanieh Deilamsalehy
Franck Dernoncourt
Ryan Rossi
Trung Bui
Hinrich Schütze
Nanyun Peng
MoE
LLMSV
11 Sep 2025
K2-Think: A Parameter-Efficient Reasoning System
Zhoujun Cheng
Richard Fan
Shibo Hao
Taylor W. Killian
Haonan Li
...
Xuezhe Ma
Guowei He
Zhiting Hu
Zhengzhong Liu
Eric P. Xing
ReLM
OffRL
ALM
LRM
09 Sep 2025
Group Fairness Meets the Black Box: Enabling Fair Algorithms on Closed LLMs via Post-Processing
Ruicheng Xian
Yuxuan Wan
Han Zhao
FaML
15 Aug 2025
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
Toluwani Aremu
Noor Hussein
Munachiso Nwadike
Samuele Poppi
Jie Zhang
Karthik Nandakumar
Neil Gong
Nils Lukas
10 Jul 2025
Argument-Based Consistency in Toxicity Explanations of LLMs
Ramaravind Kommiya Mothilal
Joanna Roy
Syed Ishtiaque Ahmed
Shion Guha
23 Jun 2025
Data Shifts Hurt CoT: A Theoretical Study
Lang Yin
Debangshu Banerjee
Gagandeep Singh
12 Jun 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee
Aeree Cho
Grace C. Kim
ShengYun Peng
Mansi Phute
Duen Horng Chau
LM&MA
AI4CE
05 Jun 2025
Unified Game Moderation: Soft-Prompting and LLM-Assisted Label Transfer for Resource-Efficient Toxicity Detection
Zachary Yang
Domenico Tullo
Reihaneh Rabbany
01 Jun 2025
Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects
Yixin Cui
Haotian Lin
Shuo Yang
Yixiao Wang
Yanjun Huang
Hong Chen
LM&Ro
LRM
ELM
26 May 2025
A Survey on Stereotype Detection in Natural Language Processing
ACM Computing Surveys (ACM Comput. Surv.), 2025
Alessandra Teresa Cignarella
Anastasia Giachanou
Els Lefever
23 May 2025
Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution
Jiawei Du
Jinlong Wu
Yuzheng Chen
Yucheng Hu
Bing Li
Joey Tianyi Zhou
23 May 2025
HydraRAG: Structured Cross-Source Enhanced Large Language Model Reasoning
Xingyu Tan
Xiaoyang Wang
Qing Liu
Xiwei Xu
Xin Yuan
Liming Zhu
Wenjie Zhang
RALM
LRM
23 May 2025
Gender Trouble in Language Models: An Empirical Audit Guided by Gender Performativity Theory
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Franziska Sofia Hafner
Ana Valdivia
Luc Rocher
20 May 2025
ELEPHANT: Measuring and understanding social sycophancy in LLMs
Myra Cheng
Sunny Yu
Cinoo Lee
Pranav Khadpe
Lujain Ibrahim
Dan Jurafsky
20 May 2025
On the Thinking-Language Modeling Gap in Large Language Models
Chenxi Liu
Yongqiang Chen
Tongliang Liu
James Cheng
Bo Han
Kun Zhang
LRM
AI4CE
19 May 2025
BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Taolin Zhang
Dongyang Li
Qizhou Chen
Chengyu Wang
Xiaofeng He
17 May 2025
Unified attacks to large language model watermarks: spoofing and scrubbing in unauthorized knowledge distillation
Knowledge-Based Systems (KBS), 2025
Xin Yi
Shunfan Zheng
Linlin Wang
Xiaoling Wang
Liang He
AAML
24 Apr 2025
RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search
Quy-Anh Dang
Chris Ngo
Truong-Son Hy
AAML
SyDa
21 Apr 2025
Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Myrthe Reuver
Indira Sen
Matteo Melis
Gabriella Lapesa
21 Apr 2025
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Sanchit Kabra
Akshita Jha
Chandan K. Reddy
LRM
08 Apr 2025
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
Dang Nguyen
Chenhao Tan
07 Apr 2025
Self-Adaptive Cognitive Debiasing for Large Language Models in Decision-Making
Yougang Lyu
Shijie Ren
Yue Feng
Zihan Wang
Zhongfu Chen
Zhaochun Ren
Maarten de Rijke
05 Apr 2025
FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Dahyun Jung
Seungyoon Lee
Hyeonseok Moon
Chanjun Park
Heuiseok Lim
AAML
ALM
ELM
25 Mar 2025
DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Suyoung Bae
YunSeok Choi
Jee-Hyong Lee
25 Mar 2025
Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior
Siyang Song
Xinpeng Wang
Guangyao Zhai
Nassir Navab
Yun Xue
LLMAG
22 Mar 2025
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models
Panatchakorn Anantaprayoon
Masahiro Kaneko
Naoaki Okazaki
LRM
KELM
08 Mar 2025
Implicit Bias in LLMs: A Survey
Xinru Lin
Luyang Li
04 Mar 2025
LLM-Safety Evaluations Lack Robustness
Tim Beyer
Sophie Xhonneux
Simon Geisler
Gauthier Gidel
Leo Schwinn
Stephan Günnemann
ALM
ELM
04 Mar 2025
Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jungsoo Park
Junmo Kang
Gabriel Stanovsky
Alan Ritter
26 Feb 2025
Multi-Attribute Steering of Language Models via Targeted Intervention
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
LLMSV
18 Feb 2025
Security Attacks on LLM-based Code Completion Tools
AAAI Conference on Artificial Intelligence (AAAI), 2025
Wen Cheng
Ke Sun
Xinyu Zhang
Wei Wang
SILM
AAML
ELM
03 Jan 2025
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Haoyang Li
Xudong Han
Zenan Zhai
Honglin Mu
Hao Wang
...
Eduard H. Hovy
Iryna Gurevych
Preslav Nakov
Monojit Choudhury
Timothy Baldwin
ALM
24 Dec 2024
The Limits of Inference Scaling Through Resampling
Benedikt Stroebl
Sayash Kapoor
Arvind Narayanan
LRM
26 Nov 2024
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
Neural Information Processing Systems (NeurIPS), 2024
Yuxin Xiao
Chaoqun Wan
Yonggang Zhang
Wenxiao Wang
Binbin Lin
Xiaofei He
Xu Shen
Jieping Ye
04 Nov 2024
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
Ryan Liu
Jiayi Geng
Addison J. Wu
Ilia Sucholutsky
Tania Lombrozo
Thomas Griffiths
ReLM
LRM
27 Oct 2024
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng
Hengquan Guo
Jiawei Zhang
Dongqing Zou
Ziyu Shao
Honghao Wei
Xin Liu
25 Oct 2024
Page 1 of 4