2209.03463
Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
Conference on Computer and Communications Security (CCS), 2022
7 September 2022
Waiman Si
Michael Backes
Jeremy Blackburn
Emiliano De Cristofaro
Gianluca Stringhini
Savvas Zannettou
Yang Zhang
Papers citing "Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots"
37 / 37 papers shown
Toxicity in Online Platforms and AI Systems: A Survey of Needs, Challenges, Mitigations, and Future Directions
Expert systems with applications (ESWA), 2025
Smita Khapre
Melkamu Mersha
Hassan Shakil
Jonali Baruah
Jugal Kalita
216
4
0
29 Sep 2025
"Abuse Risks are Often Inherent to Product Features": Exploring AI Vendors' Bug Bounty and Responsible Disclosure Policies
Yangheran Piao
Jingjie Li
Daniel W. Woods
268
1
0
07 Sep 2025
Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM
Chi Zhang
Changjia Zhu
Junjie Xiong
Xiaoran Xu
Jinkui Chi
Yao Liu
Zhuo Lu
ELM
306
5
0
07 Aug 2025
LM-Scout: Analyzing the Security of Language Model Integration in Android Apps
Muhammad Ibrahim
Güliz Seray Tuncay
Z. Berkay Celik
Aravind Machiry
Antonio Bianchi
373
1
0
13 May 2025
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
International Conference on Learning Representations (ICLR), 2025
Mingjie Li
Wai Man Si
Michael Backes
Yang Zhang
Yisen Wang
475
39
0
03 Jan 2025
The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships
International Conference on Human Factors in Computing Systems (CHI), 2025
Renwen Zhang
Han Li
Han Meng
Jinyuan Zhan
Hongyuan Gan
Yi-Chieh Lee
323
0
0
26 Oct 2024
Vision Language Models Can Parse Floor Plan Maps
David DeFazio
Hrudayangam Mehta
Meng Wang
Ping Yang
Jeremy Blackburn
Shiqi Zhang
CoGe
404
6
0
19 Sep 2024
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
Bocheng Chen
Hanqing Guo
Guangjing Wang
Yuanda Wang
Qiben Yan
AAML
343
10
0
01 Sep 2024
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
Conference on Computer and Communications Security (CCS), 2024
Kunsheng Tang
Wenbo Zhou
Jie Zhang
Aishan Liu
Gelei Deng
Shuai Li
Peigui Qi
Weiming Zhang
Tianwei Zhang
Nenghai Yu
500
12
0
22 Aug 2024
Efficient Detection of Toxic Prompts in Large Language Models
International Conference on Automated Software Engineering (ASE), 2024
Yi Liu
Junzhe Yu
Huijia Sun
Ling Shi
Gelei Deng
Yuqi Chen
Yang Liu
543
18
0
21 Aug 2024
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Yibo Jiang
Goutham Rajendran
Pradeep Ravikumar
Bryon Aragam
CLL
KELM
324
14
0
26 Jun 2024
A Map of Exploring Human Interaction patterns with LLM: Insights into Collaboration and Creativity
Jiayang Li
Jiale Li
392
21
0
06 Apr 2024
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ruiyi Wang
Haofei Yu
W. Zhang
Zhengyang Qi
Maarten Sap
Graham Neubig
Yonatan Bisk
Hao Zhu
LLMAG
469
78
0
13 Mar 2024
Prompt Stealing Attacks Against Large Language Models
Zeyang Sha
Yang Zhang
SILM
AAML
452
49
0
20 Feb 2024
Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning
Gelei Deng
Yi Liu
Kailong Wang
Yuekang Li
Tianwei Zhang
Yang Liu
268
73
0
13 Feb 2024
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation
Bangyan He
Yang Liu
Yaning Tan
Tianrui Lou
Yang Liu
Simeng Qin
AAML
VLM
389
39
0
08 Dec 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
316
18
0
09 Nov 2023
Comprehensive Assessment of Toxicity in ChatGPT
Boyang Zhang
Xinyue Shen
Waiman Si
Zeyang Sha
Sihao Lin
Ahmed Salem
Yun Shen
Michael Backes
Yang Zhang
SILM
334
6
0
03 Nov 2023
An LLM can Fool Itself: A Prompt-Based Adversarial Attack
Xilie Xu
Keyi Kong
Ning Liu
Li-zhen Cui
Haiyan Zhao
Jingfeng Zhang
Mohan Kankanhalli
AAML
SILM
294
142
0
20 Oct 2023
Prompt Packer: Deceiving LLMs through Compositional Instruction with Hidden Attacks
Shuyu Jiang
Xingshu Chen
Rui Tang
363
35
0
16 Oct 2023
Low-Resource Languages Jailbreak GPT-4
Zheng-Xin Yong
Cristina Menghini
Stephen H. Bach
SILM
550
305
0
03 Oct 2023
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions
Annual Computer Security Applications Conference (ACSAC), 2023
Yufan Chen
Arjun Arunasalam
Z. Berkay Celik
226
62
0
03 Oct 2023
Bias and Fairness in Chatbots: An Overview
APSIPA Transactions on Signal and Information Processing (TASIP), 2023
Jintang Xue
Yun Cheng Wang
Chengwei Wei
Xiaofeng Liu
Jonghye Woo
C.-C. Jay Kuo
386
65
0
16 Sep 2023
AI in the Gray: Exploring Moderation Policies in Dialogic Large Language Models vs. Human Answers in Controversial Topics
International Conference on Information and Knowledge Management (CIKM), 2023
V. Ghafouri
Vibhor Agarwal
Yong Zhang
Nishanth R. Sastry
Jose Such
Guillermo Suarez-Tangil
AI4MH
315
29
0
28 Aug 2023
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Conference on Computer and Communications Security (CCS), 2023
Xinyue Shen
Sihao Lin
Michael Backes
Yun Shen
Yang Zhang
SILM
592
558
0
07 Aug 2023
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
Network and Distributed System Security Symposium (NDSS), 2024
Gelei Deng
Yi Liu
Yuekang Li
Kailong Wang
Ying Zhang
Zefeng Li
Haoyu Wang
Tianwei Zhang
Yang Liu
SILM
485
216
0
16 Jul 2023
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
International Symposium on Recent Advances in Intrusion Detection (RAID), 2023
Bocheng Chen
Guangjing Wang
Hanqing Guo
Yuanda Wang
Qiben Yan
264
25
0
14 Jul 2023
Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety
Christopher Homan
Greg Serapio-García
Lora Aroyo
Mark Díaz
Alicia Parrish
Vinodkumar Prabhakaran
Alex S. Taylor
Ding Wang
267
11
0
20 Jun 2023
DICES Dataset: Diversity in Conversational AI Evaluation for Safety
Neural Information Processing Systems (NeurIPS), 2023
Lora Aroyo
Alex S. Taylor
Mark Díaz
Christopher Homan
Alicia Parrish
Greg Serapio-García
Vinodkumar Prabhakaran
Ding Wang
312
75
0
20 Jun 2023
Prompt Injection attack against LLM-integrated Applications
Yi Liu
Gelei Deng
Yuekang Li
Kailong Wang
Zihao Wang
...
Yepang Liu
Haoyu Wang
Yanhong Zheng
Leo Yu Zhang
Yang Liu
SILM
611
677
0
08 Jun 2023
BiasAsker: Measuring the Bias in Conversational AI System
Yuxuan Wan
Wenxuan Wang
Pinjia He
Jiazhen Gu
Haonan Bai
Michael Lyu
335
91
0
21 May 2023
Generating Phishing Attacks using ChatGPT
Sayak Saha Roy
Krishna Vamsi Naragam
Shirin Nilizadeh
278
42
0
09 May 2023
Safer Conversational AI as a Source of User Delight
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
Chai Research
326
3
0
18 Apr 2023
Talking Abortion (Mis)information with ChatGPT on TikTok
Filipo Sharevski
J. Loop
Peter Jachim
Amy Devine
Emma Pieroni
216
12
0
23 Feb 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng
Jiale Cheng
Hao Sun
Zhexin Zhang
Shiyu Huang
LM&MA
ELM
277
24
0
18 Feb 2023
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Terry Yue Zhuo
Yujin Huang
Chunyang Chen
Zhenchang Xing
SILM
652
142
0
30 Jan 2023
Beam Search Strategies for Neural Machine Translation
Markus Freitag
Yaser Al-Onaizan
510
487
0
06 Feb 2017