BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases (arXiv:2305.13589)
23 May 2023
Yiming Zhang, Sravani Nanduri, Liwei Jiang, Tongshuang Wu, Maarten Sap
Papers citing "BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases" (8 of 8 papers shown)
Real-World Gaps in AI Governance Research
Ilan Strauss, Isobel Moure, Tim O'Reilly, Sruly Rosenblat (30 Apr 2025)
Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations
David Hartmann, Amin Oueslati, Dimitri Staufer, Lena Pohlmann, Simon Munzert, Hendrik Heuer (03 Mar 2025)
SafetyAnalyst: Interpretable, transparent, and steerable safety moderation for AI behavior
Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine (22 Oct 2024)
Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech
Neemesh Yadav, Sarah Masud, Vikram Goyal, Md. Shad Akhtar, Tanmoy Chakraborty (06 Jun 2024)
AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts
Shaona Ghosh, Prasoon Varshney, Erick Galinkin, Christopher Parisien (09 Apr 2024)
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason W. Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou (21 Mar 2022)
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe (04 Mar 2022)
e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom (04 Dec 2018)