arXiv: 2212.08061
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
15 December 2022
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLM
LRM
Papers citing
"On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning"
50 / 159 papers shown
Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles
Fatima Jahara
Mark Dredze
Sharon Levy
LRM
08 Nov 2025
Chain-of-Thought Hijacking
Jianli Zhao
Tingchen Fu
Rylan Schaeffer
Mrinank Sharma
Fazl Barez
LRM
30 Oct 2025
Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation
Guoqing Luo
Iffat Maab
Lili Mou
Junichi Yamagishi
LRM
20 Oct 2025
Community size rather than grammatical complexity better predicts Large Language Model accuracy in a novel Wug Test
Nikoleta Pantelidou
Evelina Leivada
Paolo Morosi
ELM
14 Oct 2025
Debiasing LLMs by Masking Unfairness-Driving Attention Heads
Tingxu Han
Wei Song
Ziqi Ding
Z. Li
Chunrong Fang
Yuekang Li
Dongfang Liu
Zhenyu Chen
Zhenting Wang
11 Oct 2025
Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
Ragib Amin Nihal
Rui Wen
Kazuhiro Nakadai
Jun Sakuma
09 Oct 2025
AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming
Muxi Diao
Yutao Mou
Keqing He
Hanbo Song
Lulu Zhao
Shikun Zhang
Wei Ye
Kongming Liang
Zhanyu Ma
AAML
09 Oct 2025
FedBook: A Unified Federated Graph Foundation Codebook with Intra-domain and Inter-domain Knowledge Modeling
Zhengyu Wu
Yinlin Zhu
Xunkai Li
Ziang Qiu
Rong-Hua Li
Guoren Wang
Chenghu Zhou
FedML
09 Oct 2025
Accelerating Diffusion LLM Inference via Local Determinism Propagation
Fanheng Kong
Jingyuan Zhang
Yahui Liu
Zirui Wu
Yu Tian
Victoria A. Webster-Wood
Guorui Zhou
AI4CE
08 Oct 2025
Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment
Yunfan Zhang
Kathleen McKeown
Smaranda Muresan
LRM
05 Oct 2025
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
Ruohao Guo
Afshin Oroojlooy
Roshan Sridhar
Miguel Ballesteros
Alan Ritter
Dan Roth
AAML
02 Oct 2025
Toxicity in Online Platforms and AI Systems: A Survey of Needs, Challenges, Mitigations, and Future Directions
Expert Systems with Applications (ESWA), 2025
Smita Khapre
Melkamu Mersha
Hassan Shakil
Jonali Baruah
Jugal Kalita
29 Sep 2025
PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning
Hieu Tran
Zonghai Yao
Nguyen Luong Tran
Zhichao Yang
Feiyun Ouyang
Shuo Han
Razieh Rahimi
Hong-ye Yu
LLMAG
LRM
26 Sep 2025
Evaluating Large Language Models for Detecting Antisemitism
Jay Patel
Hrudayangam Mehta
Jeremy Blackburn
22 Sep 2025
Steering MoE LLMs via Expert (De)Activation
Mohsen Fayyaz
Ali Modarressi
Hanieh Deilamsalehy
Franck Dernoncourt
Ryan Rossi
Trung Bui
Hinrich Schütze
Nanyun Peng
MoE
LLMSV
11 Sep 2025
K2-Think: A Parameter-Efficient Reasoning System
Zhoujun Cheng
Richard Fan
Shibo Hao
Taylor W. Killian
Haonan Li
...
Xuezhe Ma
Guowei He
Zhiting Hu
Zhengzhong Liu
Eric P. Xing
ReLM
OffRL
ALM
LRM
09 Sep 2025
Group Fairness Meets the Black Box: Enabling Fair Algorithms on Closed LLMs via Post-Processing
Ruicheng Xian
Yuxuan Wan
Han Zhao
FaML
15 Aug 2025
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
Toluwani Aremu
Noor Hussein
Munachiso Nwadike
Samuele Poppi
Jie Zhang
Karthik Nandakumar
Neil Gong
Nils Lukas
10 Jul 2025
Argument-Based Consistency in Toxicity Explanations of LLMs
Ramaravind Kommiya Mothilal
Joanna Roy
Syed Ishtiaque Ahmed
Shion Guha
23 Jun 2025
Data Shifts Hurt CoT: A Theoretical Study
Lang Yin
Debangshu Banerjee
Gagandeep Singh
12 Jun 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee
Aeree Cho
Grace C. Kim
ShengYun Peng
Mansi Phute
Duen Horng Chau
LM&MA
AI4CE
05 Jun 2025
Unified Game Moderation: Soft-Prompting and LLM-Assisted Label Transfer for Resource-Efficient Toxicity Detection
Zachary Yang
Domenico Tullo
Reihaneh Rabbany
01 Jun 2025
Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects
Yixin Cui
Haotian Lin
Shuo Yang
Yixiao Wang
Yanjun Huang
Hong Chen
LM&Ro
LRM
ELM
26 May 2025
A Survey on Stereotype Detection in Natural Language Processing
ACM Computing Surveys (ACM Comput. Surv.), 2025
Alessandra Teresa Cignarella
Anastasia Giachanou
Els Lefever
23 May 2025
Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution
Jiawei Du
Jinlong Wu
Yuzheng Chen
Yucheng Hu
Bing Li
Joey Tianyi Zhou
23 May 2025
HydraRAG: Structured Cross-Source Enhanced Large Language Model Reasoning
Xingyu Tan
Xiaoyang Wang
Qing Liu
Xiwei Xu
Xin Yuan
Liming Zhu
Wenjie Zhang
RALM
LRM
23 May 2025
Gender Trouble in Language Models: An Empirical Audit Guided by Gender Performativity Theory
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Franziska Sofia Hafner
Ana Valdivia
Luc Rocher
20 May 2025
ELEPHANT: Measuring and understanding social sycophancy in LLMs
Myra Cheng
Sunny Yu
Cinoo Lee
Pranav Khadpe
Lujain Ibrahim
Dan Jurafsky
20 May 2025
On the Thinking-Language Modeling Gap in Large Language Models
Chenxi Liu
Yongqiang Chen
Tongliang Liu
James Cheng
Bo Han
Kun Zhang
LRM
AI4CE
19 May 2025
BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Taolin Zhang
Dongyang Li
Qizhou Chen
Chengyu Wang
Xiaofeng He
17 May 2025
Unified attacks to large language model watermarks: spoofing and scrubbing in unauthorized knowledge distillation
Knowledge-Based Systems (KBS), 2025
Xin Yi
Shunfan Zheng
Linlin Wang
Xiaoling Wang
Liang He
AAML
24 Apr 2025
RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search
Quy-Anh Dang
Chris Ngo
Truong-Son Hy
AAML
SyDa
21 Apr 2025
Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Myrthe Reuver
Indira Sen
Matteo Melis
Gabriella Lapesa
21 Apr 2025
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Sanchit Kabra
Akshita Jha
Chandan K. Reddy
LRM
08 Apr 2025
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
Dang Nguyen
Chenhao Tan
07 Apr 2025
Self-Adaptive Cognitive Debiasing for Large Language Models in Decision-Making
Yougang Lyu
Shijie Ren
Yue Feng
Zihan Wang
Zhongfu Chen
Zhaochun Ren
Maarten de Rijke
05 Apr 2025
FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Dahyun Jung
Seungyoon Lee
Hyeonseok Moon
Chanjun Park
Heuiseok Lim
AAML
ALM
ELM
25 Mar 2025
DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Suyoung Bae
YunSeok Choi
Jee-Hyong Lee
25 Mar 2025
Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior
Siyang Song
Xinpeng Wang
Guangyao Zhai
Nassir Navab
Yun Xue
LLMAG
22 Mar 2025
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models
Panatchakorn Anantaprayoon
Masahiro Kaneko
Naoaki Okazaki
LRM
KELM
08 Mar 2025
Implicit Bias in LLMs: A Survey
Xinru Lin
Luyang Li
04 Mar 2025
LLM-Safety Evaluations Lack Robustness
Tim Beyer
Sophie Xhonneux
Simon Geisler
Gauthier Gidel
Leo Schwinn
Stephan Günnemann
ALM
ELM
04 Mar 2025
Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jungsoo Park
Junmo Kang
Gabriel Stanovsky
Alan Ritter
26 Feb 2025
Multi-Attribute Steering of Language Models via Targeted Intervention
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
LLMSV
18 Feb 2025
Security Attacks on LLM-based Code Completion Tools
AAAI Conference on Artificial Intelligence (AAAI), 2025
Wen Cheng
Ke Sun
Xinyu Zhang
Wei Wang
SILM
AAML
ELM
03 Jan 2025
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Haoyang Li
Xudong Han
Zenan Zhai
Honglin Mu
Hao Wang
...
Eduard H. Hovy
Iryna Gurevych
Preslav Nakov
Monojit Choudhury
Timothy Baldwin
ALM
24 Dec 2024
The Limits of Inference Scaling Through Resampling
Benedikt Stroebl
Sayash Kapoor
Arvind Narayanan
LRM
26 Nov 2024
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
Neural Information Processing Systems (NeurIPS), 2024
Yuxin Xiao
Chaoqun Wan
Yonggang Zhang
Wenxiao Wang
Binbin Lin
Xiaofei He
Xu Shen
Jieping Ye
04 Nov 2024
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
Ryan Liu
Jiayi Geng
Addison J. Wu
Ilia Sucholutsky
Tania Lombrozo
Thomas Griffiths
ReLM
LRM
27 Oct 2024
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng
Hengquan Guo
Jiawei Zhang
Dongqing Zou
Ziyu Shao
Honghao Wei
Xin Liu
25 Oct 2024
Page 1 of 4