2209.03463
Cited By
Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
7 September 2022
Waiman Si
Michael Backes
Jeremy Blackburn
Emiliano De Cristofaro
Gianluca Stringhini
Savvas Zannettou
Yang Zhang
Papers citing
"Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots"
37 / 37 papers shown
Title
LM-Scout: Analyzing the Security of Language Model Integration in Android Apps
Muhammad Ibrahim
Güliz Seray Tuncay
Z. Berkay Celik
Aravind Machiry
Antonio Bianchi
26
0
0
13 May 2025
The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships
Renwen Zhang
Han Li
Han Meng
Jinyuan Zhan
Hongyuan Gan
Yi-Chieh Lee
19
5
0
26 Oct 2024
Vision Language Models Can Parse Floor Plan Maps
David DeFazio
Hrudayangam Mehta
Jeremy Blackburn
Shiqi Zhang
CoGe
18
0
0
19 Sep 2024
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
Bocheng Chen
Hanqing Guo
Guangjing Wang
Yuanda Wang
Qiben Yan
AAML
37
4
0
01 Sep 2024
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
Kunsheng Tang
Wenbo Zhou
Jie Zhang
Aishan Liu
Gelei Deng
Shuai Li
Peigui Qi
Weiming Zhang
Tianwei Zhang
Nenghai Yu
37
3
0
22 Aug 2024
Efficient Detection of Toxic Prompts in Large Language Models
Yi Liu
Junzhe Yu
Huijia Sun
Ling Shi
Gelei Deng
Yuqi Chen
Yang Liu
29
4
0
21 Aug 2024
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Yibo Jiang
Goutham Rajendran
Pradeep Ravikumar
Bryon Aragam
CLL
KELM
29
6
0
26 Jun 2024
A Map of Exploring Human Interaction patterns with LLM: Insights into Collaboration and Creativity
Jiayang Li
Jiale Li
37
7
0
06 Apr 2024
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
Ruiyi Wang
Haofei Yu
W. Zhang
Zhengyang Qi
Maarten Sap
Graham Neubig
Yonatan Bisk
Hao Zhu
LLMAG
33
37
0
13 Mar 2024
Prompt Stealing Attacks Against Large Language Models
Zeyang Sha
Yang Zhang
SILM
AAML
27
28
0
20 Feb 2024
Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning
Gelei Deng
Yi Liu
Kailong Wang
Yuekang Li
Tianwei Zhang
Yang Liu
13
41
0
13 Feb 2024
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation
Bangyan He
Xiaojun Jia
Siyuan Liang
Tianrui Lou
Yang Liu
Xiaochun Cao
AAML
VLM
19
23
0
08 Dec 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
29
9
0
09 Nov 2023
Comprehensive Assessment of Toxicity in ChatGPT
Boyang Zhang
Xinyue Shen
Waiman Si
Zeyang Sha
Z. Chen
Ahmed Salem
Yun Shen
Michael Backes
Yang Zhang
SILM
8
3
0
03 Nov 2023
An LLM can Fool Itself: A Prompt-Based Adversarial Attack
Xilie Xu
Keyi Kong
Ning Liu
Li-zhen Cui
Di Wang
Jingfeng Zhang
Mohan S. Kankanhalli
AAML
SILM
22
68
0
20 Oct 2023
Prompt Packer: Deceiving LLMs through Compositional Instruction with Hidden Attacks
Shuyu Jiang
Xingshu Chen
Rui Tang
19
22
0
16 Oct 2023
Low-Resource Languages Jailbreak GPT-4
Zheng-Xin Yong
Cristina Menghini
Stephen H. Bach
SILM
12
169
0
03 Oct 2023
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions
Yufan Chen
Arjun Arunasalam
Z. Berkay Celik
24
33
0
03 Oct 2023
Bias and Fairness in Chatbots: An Overview
Jintang Xue
Yun Cheng Wang
Chengwei Wei
Xiaofeng Liu
Jonghye Woo
C.-C. Jay Kuo
29
25
0
16 Sep 2023
AI in the Gray: Exploring Moderation Policies in Dialogic Large Language Models vs. Human Answers in Controversial Topics
V. Ghafouri
Vibhor Agarwal
Yong Zhang
Nishanth R. Sastry
Jose Such
Guillermo Suarez-Tangil
AI4MH
10
21
0
28 Aug 2023
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen
Z. Chen
Michael Backes
Yun Shen
Yang Zhang
SILM
33
243
0
07 Aug 2023
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
Gelei Deng
Yi Liu
Yuekang Li
Kailong Wang
Ying Zhang
Zefeng Li
Haoyu Wang
Tianwei Zhang
Yang Liu
SILM
33
118
0
16 Jul 2023
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
Bocheng Chen
Guangjing Wang
Hanqing Guo
Yuanda Wang
Qiben Yan
19
15
0
14 Jul 2023
Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety
Christopher Homan
Greg Serapio-García
Lora Aroyo
Mark Díaz
Alicia Parrish
Vinodkumar Prabhakaran
Alex S. Taylor
Ding Wang
12
9
0
20 Jun 2023
DICES Dataset: Diversity in Conversational AI Evaluation for Safety
Lora Aroyo
Alex S. Taylor
Mark Díaz
Christopher Homan
Alicia Parrish
Greg Serapio-García
Vinodkumar Prabhakaran
Ding Wang
19
33
0
20 Jun 2023
Prompt Injection attack against LLM-integrated Applications
Yi Liu
Gelei Deng
Yuekang Li
Kailong Wang
Zihao Wang
...
Tianwei Zhang
Yepang Liu
Haoyu Wang
Yanhong Zheng
Yang Liu
SILM
15
312
0
08 Jun 2023
BiasAsker: Measuring the Bias in Conversational AI System
Yuxuan Wan
Wenxuan Wang
Pinjia He
Jiazhen Gu
Haonan Bai
Michael Lyu
14
67
0
21 May 2023
Generating Phishing Attacks using ChatGPT
S. Roy
Krishna Vamsi Naragam
Shirin Nilizadeh
24
33
0
09 May 2023
Safer Conversational AI as a Source of User Delight
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
Chai Research
15
3
0
18 Apr 2023
Talking Abortion (Mis)information with ChatGPT on TikTok
Filipo Sharevski
J. Loop
Peter Jachim
Amy Devine
Emma Pieroni
24
5
0
23 Feb 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng
Jiale Cheng
Hao-Lun Sun
Zhexin Zhang
Minlie Huang
LM&MA
ELM
21
15
0
18 Feb 2023
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Terry Yue Zhuo
Yujin Huang
Chunyang Chen
Zhenchang Xing
SILM
12
99
0
30 Jan 2023
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
242
193
0
15 Sep 2021
"A Virus Has No Religion": Analyzing Islamophobia on Twitter During the COVID-19 Outbreak
Mohit Chandra
Manvith Reddy
Shradha Sehgal
Saurabh Gupta
Arun Balaji Buduru
Ponnurangam Kumaraguru
13
41
0
11 Jul 2021
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
206
615
0
03 Sep 2019
A Survey on Bias and Fairness in Machine Learning
Ninareh Mehrabi
Fred Morstatter
N. Saxena
Kristina Lerman
Aram Galstyan
SyDa
FaML
294
4,187
0
23 Aug 2019
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
198
1,325
0
05 Jun 2016