2106.05664
Ruddit: Norms of Offensiveness for English Reddit Comments
10 June 2021
Rishav Hada
S. Sudhir
Pushkar Mishra
H. Yannakoudakis
Saif M. Mohammad
Ekaterina Shutova
Papers citing "Ruddit: Norms of Offensiveness for English Reddit Comments"
23 / 23 papers shown
Title
QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety
Taegyeong Lee
Jeonghwa Yoo
Hyoungseo Cho
Soo Yong Kim
Yunho Maeng
AAML
24
0
0
14 Jun 2025
Evaluating how LLM annotations represent diverse views on contentious topics
Megan A. Brown
Shubham Atreja
Libby Hemphill
Patrick Y. Wu
431
0
0
29 Mar 2025
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
Seanie Lee
Dong Bok Lee
Dominik Wagner
Minki Kang
Haebin Seong
Tobias Bocklet
Juho Lee
Sung Ju Hwang
134
2
0
18 Feb 2025
Which Demographics do LLMs Default to During Annotation?
Johannes Schäfer
Aidan Combs
Christopher Bagdon
Jiahui Li
Nadine Probol
...
Yarik Menchaca Resendiz
Aswathy Velutharambath
Amelie Wuhrl
Sabine Weber
Roman Klinger
78
2
0
11 Oct 2024
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee
Haebin Seong
Dong Bok Lee
Minki Kang
Xiaoyin Chen
Dominik Wagner
Yoshua Bengio
Juho Lee
Sung Ju Hwang
231
6
0
02 Oct 2024
End User Authoring of Personalized Content Classifiers: Comparing Example Labeling, Rule Writing, and LLM Prompting
Leijie Wang
Kathryn Yurechko
Pranati Dani
Quan Ze Chen
Amy X. Zhang
84
3
0
05 Sep 2024
PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts
Ana-Cristina Rogoz
Maria Ilinca Nechita
Radu Tudor Ionescu
116
0
0
05 Jul 2024
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Seungju Han
Kavel Rao
Allyson Ettinger
Liwei Jiang
Bill Yuchen Lin
Nathan Lambert
Yejin Choi
Nouha Dziri
128
101
0
26 Jun 2024
The Unseen Targets of Hate -- A Systematic Review of Hateful Communication Datasets
Zehui Yu
Indira Sen
Dennis Assenmacher
Mattia Samory
Leon Fröhling
Christina Dahn
Debora Nozza
Claudia Wagner
79
7
0
14 May 2024
"You are an expert annotator": Automatic Best-Worst-Scaling Annotations for Emotion Intensity Modeling
Christopher Bagdon
P. Karmalkar
Harsha Gurulingappa
Roman Klinger
58
1
0
26 Mar 2024
Ultra Low-Cost Two-Stage Multimodal System for Non-Normative Behavior Detection
Albert Lu
Stephen Cranefield
40
0
0
24 Mar 2024
GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
Yueqi Xie
Minghong Fang
Renjie Pi
Neil Zhenqiang Gong
117
36
0
21 Feb 2024
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Hakan Inan
Kartikeya Upasani
Jianfeng Chi
Rashi Rungta
Krithika Iyer
...
Michael Tontchev
Qing Hu
Brian Fuller
Davide Testuggine
Madian Khabsa
AI4MH
174
466
0
07 Dec 2023
Style Locality for Controllable Generation with kNN Language Models
Gilles Nawezi
Lucie Flek
Charles F Welch
RALM
52
0
0
01 Nov 2023
"Fifty Shades of Bias": Normative Ratings of Gender Bias in GPT Generated English Text
Rishav Hada
Agrima Seth
Harshita Diddee
Kalika Bali
97
17
0
26 Oct 2023
A Benchmark for Understanding Dialogue Safety in Mental Health Support
Huachuan Qiu
Tong Zhao
Anqi Li
Shuai Zhang
Hongliang He
Zhenzhong Lan
78
10
0
31 Jul 2023
When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset
Jiaxin Pei
David Jurgens
76
34
0
12 Jun 2023
On the rise of fear speech in online social media
Punyajoy Saha
Kiran Garimella
Narla Komal Kalyan
Saurabh Kumar Pandey
Pauras Mangesh Meher
Binny Mathew
Animesh Mukherjee
41
23
0
18 Mar 2023
Which one is more toxic? Findings from Jigsaw Rate Severity of Toxic Comments
M. Das
Punyajoy Saha
Mithun Das
54
8
0
27 Jun 2022
Analyzing the Intensity of Complaints on Social Media
Ming Fang
Shi Zong
Jing Li
Xinyu Dai
Shujian Huang
Jiajun Chen
18
0
0
20 Apr 2022
CRUSH: Contextually Regularized and User anchored Self-supervised Hate speech Detection
Souvic Chakraborty
Parag Dutta
Sumegh Roychowdhury
Animesh Mukherjee
38
8
0
13 Apr 2022
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts
Ashutosh Baheti
Maarten Sap
Alan Ritter
Mark O. Riedl
90
91
0
26 Aug 2021
WLV-RIT at GermEval 2021: Multitask Learning with Transformers to Detect Toxic, Engaging, and Fact-Claiming Comments
Skye Morgan
Tharindu Ranasinghe
Marcos Zampieri
70
6
0
30 Jul 2021