2309.14517
Watch Your Language: Investigating Content Moderation with Large Language Models
25 September 2023
Deepak Kumar
Y. AbuHashem
Zakir Durumeric
AI4MH
Papers citing
"Watch Your Language: Investigating Content Moderation with Large Language Models"
12 / 12 papers shown
Title
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun-Xiong Xia
Tianyi Wu
Zhiwei Xue
Y. Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
124
13
0
30 Jan 2025
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Ling Liu
AAML
38
21
0
26 Sep 2024
End User Authoring of Personalized Content Classifiers: Comparing Example Labeling, Rule Writing, and LLM Prompting
Leijie Wang
Kathryn Yurechko
Pranati Dani
Quan Ze Chen
Amy X. Zhang
40
1
0
05 Sep 2024
The Unappreciated Role of Intent in Algorithmic Moderation of Social Media Content
Xinyu Wang
S. Koneru
Pranav Narayanan Venkit
Brett Frischmann
Sarah Rajtmajer
21
0
0
17 May 2024
Corporate Communication Companion (CCC): An LLM-empowered Writing Assistant for Workplace Social Media
Zhuoran Lu
Sheshera Mysore
Tara Safavi
Jennifer Neville
Longqi Yang
Mengting Wan
25
7
0
07 May 2024
What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection
Shangbin Feng
Herun Wan
Ningnan Wang
Zhaoxuan Tan
Minnan Luo
Yulia Tsvetkov
AAML
DeLMO
14
14
0
01 Feb 2024
Perceptions of Moderators as a Large-Scale Measure of Online Community Governance
Galen Cassebeer Weld
Leon Leibmann
Amy X. Zhang
Tim Althoff
13
2
0
29 Jan 2024
APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT
Yiming Zhu
Zhizhuo Yin
Gareth Tyson
Ehsan-ul Haq
Lik-Hang Lee
Pan Hui
ALM
33
6
0
24 Jan 2024
Efficacy of Utilizing Large Language Models to Detect Public Threat Posted Online
Taeksoo Kwon
Connor Kim
10
1
0
29 Dec 2023
Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework
Matan Halevy
Camille Harris
A. Bruckman
Diyi Yang
A. Howard
28
27
0
27 Sep 2021
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Mai Elsherief
Caleb Ziems
D. Muchlinski
Vaishnavi Anupindi
Jordyn Seybolt
M. D. Choudhury
Diyi Yang
87
233
0
11 Sep 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
257
374
0
28 Feb 2021