On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning

Annual Meeting of the Association for Computational Linguistics (ACL), 2022
15 December 2022
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
    ReLM, LRM

Papers citing "On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning"

50 / 159 papers shown
Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles
IACR Cryptology ePrint Archive (IACR ePrint), 2025
Fatima Jahara
Mark Dredze
Sharon Levy
LRM
110
0
0
08 Nov 2025
Chain-of-Thought Hijacking
Jianli Zhao
Tingchen Fu
Rylan Schaeffer
Mrinank Sharma
Fazl Barez
LRM
230
4
0
30 Oct 2025
Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation
Guoqing Luo
Iffat Maab
Lili Mou
Junichi Yamagishi
LRM
227
2
0
20 Oct 2025
Community size rather than grammatical complexity better predicts Large Language Model accuracy in a novel Wug Test
Nikoleta Pantelidou
Evelina Leivada
Paolo Morosi
ELM
154
1
0
14 Oct 2025
Debiasing LLMs by Masking Unfairness-Driving Attention Heads
Tingxu Han
Wei Song
Ziqi Ding
Z. Li
Chunrong Fang
Yuekang Li
Dongfang Liu
Zhenyu Chen
Zhenting Wang
293
0
0
11 Oct 2025
Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
Ragib Amin Nihal
Rui Wen
Kazuhiro Nakadai
Jun Sakuma
178
1
0
09 Oct 2025
AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming
Muxi Diao
Yutao Mou
Keqing He
Hanbo Song
Lulu Zhao
Shikun Zhang
Wei Ye
Kongming Liang
Zhanyu Ma
AAML
205
0
0
09 Oct 2025
FedBook: A Unified Federated Graph Foundation Codebook with Intra-domain and Inter-domain Knowledge Modeling
Zhengyu Wu
Yinlin Zhu
Xunkai Li
Ziang Qiu
Rong-Hua Li
Guoren Wang
Chenghu Zhou
FedML
189
1
0
09 Oct 2025
Accelerating Diffusion LLM Inference via Local Determinism Propagation
Fanheng Kong
Jingyuan Zhang
Yahui Liu
Zirui Wu
Yu Tian
Victoria A. Webster-Wood
Guorui Zhou
AI4CE
188
0
0
08 Oct 2025
Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment
Yunfan Zhang
Kathleen McKeown
Smaranda Muresan
LRM
181
0
0
05 Oct 2025
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
Ruohao Guo
Afshin Oroojlooy
Roshan Sridhar
Miguel Ballesteros
Alan Ritter
Dan Roth
AAML
202
1
0
02 Oct 2025
Toxicity in Online Platforms and AI Systems: A Survey of Needs, Challenges, Mitigations, and Future Directions
Expert Systems with Applications (ESWA), 2025
Smita Khapre
Melkamu Mersha
Hassan Shakil
Jonali Baruah
Jugal Kalita
216
4
0
29 Sep 2025
PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning
Remote Sensing (RS), 2025
Hieu Tran
Zonghai Yao
Nguyen Luong Tran
Zhichao Yang
Feiyun Ouyang
Shuo Han
Razieh Rahimi
Hong-ye Yu
LLMAG, LRM
301
1
0
26 Sep 2025
Evaluating Large Language Models for Detecting Antisemitism
Jay Patel
Hrudayangam Mehta
Jeremy Blackburn
342
1
0
22 Sep 2025
Steering MoE LLMs via Expert (De)Activation
Mohsen Fayyaz
Ali Modarressi
Hanieh Deilamsalehy
Franck Dernoncourt
Ryan Rossi
Trung Bui
Hinrich Schütze
Nanyun Peng
MoE, LLMSV
268
8
0
11 Sep 2025
K2-Think: A Parameter-Efficient Reasoning System
Zhoujun Cheng
Richard Fan
Shibo Hao
Taylor W. Killian
Haonan Li
...
Xuezhe Ma
Guowei He
Zhiting Hu
Zhengzhong Liu
Eric P. Xing
ReLM, OffRL, ALM, LRM
359
7
0
09 Sep 2025
Group Fairness Meets the Black Box: Enabling Fair Algorithms on Closed LLMs via Post-Processing
Ruicheng Xian
Yuxuan Wan
Han Zhao
FaML
215
0
0
15 Aug 2025
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
Toluwani Aremu
Noor Hussein
Munachiso Nwadike
Samuele Poppi
Jie Zhang
Karthik Nandakumar
Neil Gong
Nils Lukas
399
0
0
10 Jul 2025
Argument-Based Consistency in Toxicity Explanations of LLMs
Ramaravind Kommiya Mothilal
Joanna Roy
Syed Ishtiaque Ahmed
Shion Guha
229
0
0
23 Jun 2025
Data Shifts Hurt CoT: A Theoretical Study
Lang Yin
Debangshu Banerjee
Gagandeep Singh
333
3
0
12 Jun 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee
Aeree Cho
Grace C. Kim
ShengYun Peng
Mansi Phute
Duen Horng Chau
LM&MA, AI4CE
399
6
0
05 Jun 2025
Unified Game Moderation: Soft-Prompting and LLM-Assisted Label Transfer for Resource-Efficient Toxicity Detection
Zachary Yang
Domenico Tullo
Reihaneh Rabbany
156
3
0
01 Jun 2025
Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects
Yixin Cui
Haotian Lin
Shuo Yang
Yixiao Wang
Yanjun Huang
Hong Chen
LM&Ro, LRM, ELM
457
7
0
26 May 2025
A Survey on Stereotype Detection in Natural Language Processing
ACM Computing Surveys (ACM Comput. Surv.), 2025
Alessandra Teresa Cignarella
Anastasia Giachanou
Els Lefever
281
0
0
23 May 2025
Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution
Jiawei Du
Jinlong Wu
Yuzheng Chen
Yucheng Hu
Bing Li
Joey Tianyi Zhou
620
2
0
23 May 2025
HydraRAG: Structured Cross-Source Enhanced Large Language Model Reasoning
Xingyu Tan
Xiaoyang Wang
Qing Liu
Xiwei Xu
Xin Yuan
Liming Zhu
Wenjie Zhang
RALM, LRM
512
10
0
23 May 2025
Gender Trouble in Language Models: An Empirical Audit Guided by Gender Performativity Theory
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Franziska Sofia Hafner
Ana Valdivia
Luc Rocher
212
1
0
20 May 2025
ELEPHANT: Measuring and understanding social sycophancy in LLMs
Myra Cheng
Sunny Yu
Cinoo Lee
Pranav Khadpe
Lujain Ibrahim
Dan Jurafsky
388
16
0
20 May 2025
On the Thinking-Language Modeling Gap in Large Language Models
Chenxi Liu
Yongqiang Chen
Tongliang Liu
James Cheng
Bo Han
Kun Zhang
LRM, AI4CE
402
1
0
19 May 2025
BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Taolin Zhang
Dongyang Li
Qizhou Chen
Chengyu Wang
Xiaofeng He
419
5
0
17 May 2025
Unified attacks to large language model watermarks: spoofing and scrubbing in unauthorized knowledge distillation
Knowledge-Based Systems (KBS), 2025
Xin Yi
Shunfan Zheng
Linlin Wang
Xiaoling Wang
Liang He
AAML
1.3K
3
0
24 Apr 2025
RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search
Quy-Anh Dang
Chris Ngo
Truong-Son Hy
AAML, SyDa
338
4
0
21 Apr 2025
Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Myrthe Reuver
Indira Sen
Matteo Melis
Gabriella Lapesa
222
2
0
21 Apr 2025
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Sanchit Kabra
Akshita Jha
Chandan K. Reddy
LRM
508
7
0
08 Apr 2025
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
Dang Nguyen
Chenhao Tan
424
3
0
07 Apr 2025
Self-Adaptive Cognitive Debiasing for Large Language Models in Decision-Making
Yougang Lyu
Shijie Ren
Yue Feng
Zihan Wang
Zhongfu Chen
Zhaochun Ren
Maarten de Rijke
859
1
0
05 Apr 2025
FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Dahyun Jung
Seungyoon Lee
Hyeonseok Moon
Chanjun Park
Heuiseok Lim
AAML, ALM, ELM
294
9
0
25 Mar 2025
DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Suyoung Bae
YunSeok Choi
Jee-Hyong Lee
315
0
0
25 Mar 2025
Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior
Siyang Song
Xinpeng Wang
Guangyao Zhai
Nassir Navab
Yun Xue
LLMAG
270
6
0
22 Mar 2025
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models
Panatchakorn Anantaprayoon
Masahiro Kaneko
Naoaki Okazaki
LRM, KELM
437
5
0
08 Mar 2025
Implicit Bias in LLMs: A Survey
Xinru Lin
Luyang Li
450
14
0
04 Mar 2025
LLM-Safety Evaluations Lack Robustness
Tim Beyer
Sophie Xhonneux
Simon Geisler
Gauthier Gidel
Leo Schwinn
Stephan Günnemann
ALM, ELM
1.1K
14
0
04 Mar 2025
Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jungsoo Park
Junmo Kang
Gabriel Stanovsky
Alan Ritter
463
0
0
26 Feb 2025
Multi-Attribute Steering of Language Models via Targeted Intervention
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Joey Tianyi Zhou
LLMSV
509
23
0
18 Feb 2025
Security Attacks on LLM-based Code Completion Tools
AAAI Conference on Artificial Intelligence (AAAI), 2024
Wen Cheng
Ke Sun
Xinyu Zhang
Wei Wang
SILM, AAML, ELM
368
0
0
03 Jan 2025
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Haoyang Li
Xudong Han
Zenan Zhai
Honglin Mu
Hao Wang
...
Eduard H. Hovy
Iryna Gurevych
Preslav Nakov
Monojit Choudhury
Timothy Baldwin
ALM
257
4
0
24 Dec 2024
The Limits of Inference Scaling Through Resampling
Benedikt Stroebl
Sayash Kapoor
Arvind Narayanan
LRM
606
45
0
26 Nov 2024
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
Neural Information Processing Systems (NeurIPS), 2024
Yuxin Xiao
Chaoqun Wan
Yonggang Zhang
Wenxiao Wang
Binbin Lin
Xiaofei He
Xu Shen
Jieping Ye
269
5
0
04 Nov 2024
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
Ryan Liu
Jiayi Geng
Addison J. Wu
Ilia Sucholutsky
Tania Lombrozo
Thomas Griffiths
ReLM, LRM
721
107
0
27 Oct 2024
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng
Hengquan Guo
Jiawei Zhang
Dongqing Zou
Ziyu Shao
Honghao Wei
Xin Liu
425
6
0
25 Oct 2024