COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements

3 June 2023

Xuhui Zhou

Papers citing "COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements"


Title
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages Priyanshu Kumar Devansh Jain Akhila Yerukola Liwei Jiang Himanshu Beniwal Thomas Hartvigsen Maarten Sap 52 0 0 06 Apr 2025
Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English Runtao Zhou Guangya Wan Saadia Gabriel Sheng R. Li Alexander J Gates Maarten Sap Thomas Hartvigsen LRM 57 0 0 06 Mar 2025
Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection Maximilian Spliethöver Tim Knebler Fabian Fumagalli Maximilian Muschalik Barbara Hammer Eyke Hüllermeier Henning Wachsmuth 97 1 0 10 Feb 2025
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events Aditya Chinchure Sahithya Ravi R. Ng Vered Shwartz Boyang Albert Li Leonid Sigal ReLM LRM VLM 77 2 0 07 Dec 2024
Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues Shilin Qu Weiqing Wang Xin Zhou Haolan Zhan Zhuang Li Lizhen Qu Linhao Luo Yuan-Fang Li Gholamreza Haffari 23 0 0 04 Oct 2024
Incorporating Procedural Fairness in Flag Submissions on Social Media Platforms Yunhee Shim Shagun Jhaver 29 0 0 13 Sep 2024
Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations Ritam Dutt Zhen Wu Kelly Shi Divyanshu Sheth Prakhar Gupta Carolyn Rose 24 2 0 27 Jun 2024
Silent Signals, Loud Impact: LLMs for Word-Sense Disambiguation of Coded Dog Whistles Julia Kruk Michela Marchini Rijul Magu Caleb Ziems D. Muchlinski Diyi Yang 30 1 0 10 Jun 2024
Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech Neemesh Yadav Sarah Masud Vikram Goyal Md. Shad Akhtar Tanmoy Chakraborty 21 3 0 06 Jun 2024
Promoting Constructive Deliberation: Reframing for Receptiveness Gauri Kambhatla Matthew Lease Ashwin Rajadesingan 33 2 0 23 May 2024
Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias Rebecca Dorn Lee Kezar Fred Morstatter Kristina Lerman 27 7 0 23 May 2024
Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs Akhila Yerukola Saujas Vaduguru Daniel Fried Maarten Sap 29 1 0 14 May 2024
The Constant in HATE: Analyzing Toxicity in Reddit across Topics and Languages Wondimagegnhue Tufa Ilia Markov Piek Vossen 11 0 0 29 Apr 2024
Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future Minzhi Li Weiyan Shi Caleb Ziems Diyi Yang 28 8 0 28 Feb 2024
COBIAS: Contextual Reliability in Bias Assessment Priyanshul Govil Hemang Jain Vamshi Bonagiri Aman Chadha Ponnurangam Kumaraguru Manas Gaur S. Dey 38 2 0 22 Feb 2024
Understanding News Creation Intents: Frame, Dataset, and Method Zhengjia Wang Danding Wang Qiang Sheng Juan Cao Silong Su Yifan Sun Beizhe Hu Siyuan Ma 10 4 0 27 Dec 2023
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory Niloofar Mireshghallah Hyunwoo J. Kim Xuhui Zhou Yulia Tsvetkov Maarten Sap Reza Shokri Yejin Choi PILM 22 73 0 27 Oct 2023
Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning Ananth Balashankar Xiao Ma Aradhana Sinha Ahmad Beirami Yao Qin Jilin Chen Alex Beutel 16 2 0 25 Oct 2023
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations Kavel Rao Liwei Jiang Valentina Pyatkin Yuling Gu Niket Tandon Nouha Dziri Faeze Brahman Yejin Choi 16 15 0 24 Oct 2023
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties Taylor Sorensen Liwei Jiang Jena D. Hwang Sydney Levine Valentina Pyatkin ... Kavel Rao Chandra Bhagavatula Maarten Sap J. Tasioulas Yejin Choi SLR 11 50 0 02 Sep 2023
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models Julia Mendelsohn Ronan Le Bras Yejin Choi Maarten Sap 13 25 0 26 May 2023
Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting Akhila Yerukola Xuhui Zhou Elizabeth Clark Maarten Sap 12 6 0 24 May 2023
BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases Yiming Zhang Sravani Nanduri Liwei Jiang Tongshuang Wu Maarten Sap 22 7 0 23 May 2023
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 303 11,881 0 04 Mar 2022
Can Machines Learn Morality? The Delphi Experiment Liwei Jiang Jena D. Hwang Chandra Bhagavatula Ronan Le Bras Jenny T Liang ... Yulia Tsvetkov Oren Etzioni Maarten Sap Regina A. Rini Yejin Choi FaML 117 110 0 14 Oct 2021
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech Mai Elsherief Caleb Ziems D. Muchlinski Vaishnavi Anupindi Jordyn Seybolt M. D. Choudhury Diyi Yang 92 235 0 11 Sep 2021
Measuring Association Between Labels and Free-Text Rationales Sarah Wiegreffe Ana Marasović Noah A. Smith 274 170 0 24 Oct 2020
Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets Chuanrong Li Lin Shengshuo Leo Z. Liu Xinyi Wu Xuhui Zhou Shane Steinert-Threlkeld VLM 128 38 0 16 Oct 2020