2305.19713
Red Teaming Language Model Detectors with Language Models
Transactions of the Association for Computational Linguistics (TACL), 2023
31 May 2023
Zhouxing Shi
Yihan Wang
Fan Yin
Xiangning Chen
Kai-Wei Chang
Cho-Jui Hsieh
DeLMO
Papers citing "Red Teaming Language Model Detectors with Language Models"
40 / 40 papers shown
PrompTrend: Continuous Community-Driven Vulnerability Discovery and Assessment for Large Language Models
Tarek Gasmi
Ramzi Guesmi
Mootez Aloui
Jihene Bennaceur
255
0
0
25 Jul 2025
Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models
Ren-Jian Wang
Ke Xue
Zeyu Qin
Ziniu Li
Sheng Tang
Hao-Tian Li
Shengcai Liu
Chao Qian
AAML
270
0
0
08 Jun 2025
Safety Alignment Can Be Not Superficial With Explicit Safety Signals
Jianwei Li
Jung-Eun Kim
AAML
510
7
0
19 May 2025
A Survey of Attacks on Large Language Models
Wenrui Xu
Keshab K. Parhi
AAML
ELM
343
11
0
18 May 2025
Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions
Shih-Han Chan
AAML
304
5
0
29 Mar 2025
Robustness and Cybersecurity in the EU Artificial Intelligence Act
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Henrik Nolte
Miriam Rateike
Michèle Finck
416
14
0
22 Feb 2025
EvoFlow: Evolving Diverse Agentic Workflows On The Fly
Guibin Zhang
Kaijie Chen
Guancheng Wan
Heng Chang
Hong Cheng
Kaidi Wang
Shuyue Hu
Wenlong Zhang
658
35
0
11 Feb 2025
Can AI-Generated Text be Reliably Detected?
Vinu Sankar Sadasivan
Aounon Kumar
S. Balasubramanian
Wenxiao Wang
Soheil Feizi
DeLMO
1.1K
534
0
20 Jan 2025
New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook
Meng Yang
Tianqing Zhu
Chi Liu
Wanlei Zhou
Shui Yu
Philip S. Yu
AAML
ELM
PILM
357
2
0
12 Nov 2024
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios
Neural Information Processing Systems (NeurIPS), 2024
Junchao Wu
Runzhe Zhan
Yang Li
Shu Yang
Xinyi Yang
Yulin Yuan
Lidia S. Chao
DeLMO
768
24
0
31 Oct 2024
Locking Down the Finetuned LLMs Safety
Minjun Zhu
Linyi Yang
Yifan Wei
Ningyu Zhang
Yue Zhang
374
25
0
14 Oct 2024
Superficial Safety Alignment Hypothesis
Jianwei Li
Jung-Eun Kim
LLMSV
418
8
0
07 Oct 2024
Efficiently Identifying Watermarked Segments in Mixed-Source Texts
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xuandong Zhao
Chenwen Liao
Yu-Xiang Wang
Lei Li
WaLM
343
3
0
04 Oct 2024
Conversational Complexity for Assessing Risk in Large Language Models
John Burden
Manuel Cebrian
José Hernández-Orallo
512
6
0
02 Sep 2024
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Sara Abdali
Jia He
C. Barberan
Richard Anarfi
369
9
0
30 Jul 2024
Is the Digital Forensics and Incident Response Pipeline Ready for Text-Based Threats in LLM Era?
AV Bhandarkar
Ronald Wilson
Anushka Swarup
Mengdi Zhu
Damon Woodard
287
5
0
25 Jul 2024
Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods
Kathleen C. Fraser
Hillary Dawkins
S. Kiritchenko
DeLMO
356
53
0
21 Jun 2024
A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions
Mohammed Hassanin
Nour Moustafa
391
76
0
23 May 2024
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Neural Information Processing Systems (NeurIPS), 2024
Jingnan Zheng
Han Wang
An Zhang
Tai D. Nguyen
Jun Sun
Tat-Seng Chua
LLMAG
415
49
0
23 May 2024
Hummer: Towards Limited Competitive Preference Dataset
Li Jiang
Yusen Wu
Junwu Xiong
Jingqing Ruan
Yichuan Ding
Qingpei Guo
ZuJie Wen
Jun Zhou
Xiaotie Deng
486
11
0
19 May 2024
Vietnamese AI Generated Text Detection
Quang-Dan Tran
Van-Quan Nguyen
Quang-Huy Pham
K. B. T. Nguyen
Trong-Hop Do
DeLMO
260
1
0
06 May 2024
Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack
International Conference on Language Resources and Evaluation (LREC), 2024
Ying Zhou
Xianpei Han
Le Sun
DeLMO
AAML
322
32
0
02 Apr 2024
Mapping the Increasing Use of LLMs in Scientific Papers
Weixin Liang
Yaohui Zhang
Zhengxuan Wu
Haley Lepp
Wenlong Ji
...
Zhi Huang
Diyi Yang
Christopher Potts
Christopher D. Manning
James Y. Zou
AI4CE
DeLMO
260
138
0
01 Apr 2024
The Impact of Prompts on Zero-Shot Detection of AI-Generated Text
Kaito Taguchi
Yujie Gu
Kouichi Sakurai
AAML
DeLMO
259
9
0
29 Mar 2024
Bypassing LLM Watermarks with Color-Aware Substitutions
Qilong Wu
Varun Chandrasekaran
246
28
0
19 Mar 2024
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Sara Abdali
Richard Anarfi
C. Barberan
Jia He
Erfan Shayegani
PILM
571
53
0
19 Mar 2024
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
International Conference on Machine Learning (ICML), 2024
Weixin Liang
Zachary Izzo
Yaohui Zhang
Haley Lepp
Hancheng Cao
...
Haotian Ye
Sheng Liu
Zhi Huang
Daniel A. McFarland
James Y. Zou
DeLMO
356
197
0
11 Mar 2024
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
Xuandong Zhao
Lei Li
Yu-Xiang Wang
435
12
0
08 Feb 2024
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024
Michael Feffer
Anusha Sinha
Wesley Hanwen Deng
Zachary Chase Lipton
Hoda Heidari
AAML
534
138
0
29 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Eric Wang
Xin Li
Luisa Verdoliva
Shu Hu
1.1K
101
0
22 Jan 2024
Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap
Xingyu Wu
Sheng-hao Wu
Jibin Wu
Liang Feng
Kay Chen Tan
ELM
636
154
0
18 Jan 2024
Optimizing watermarks for large language models
Bram Wouters
WaLM
233
21
0
28 Dec 2023
Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles
BigData Congress [Services Society] (BSS), 2023
Sonali Singh
Faranak Abri
A. Namin
169
30
0
24 Nov 2023
Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey
Soumya Suvra Ghosal
Souradip Chakraborty
Jonas Geiping
Furong Huang
Dinesh Manocha
Amrit Singh Bedi
DeLMO
300
53
0
23 Oct 2023
A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions
Junchao Wu
Shu Yang
Runzhe Zhan
Yulin Yuan
Yang Li
Lidia S. Chao
DeLMO
542
120
0
23 Oct 2023
Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models?
Yu-Lin Tsai
Chia-Yi Hsu
Chulin Xie
Chih-Hsun Lin
Jia-You Chen
Yue Liu
Pin-Yu Chen
Chia-Mu Yu
Chun-ying Huang
DiffM
377
201
0
16 Oct 2023
Can LLM-Generated Misinformation Be Detected?
International Conference on Learning Representations (ICLR), 2023
Canyu Chen
Kai Shu
DeLMO
884
265
0
25 Sep 2023
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
International Conference on Machine Learning (ICML), 2023
Zhi-Yi Chin
Chieh-Ming Jiang
Ching-Chun Huang
Pin-Yu Chen
Wei-Chen Chiu
DiffM
485
149
0
12 Sep 2023
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook
International Journal of Computer Vision (IJCV), 2023
Mingyuan Fan
Chengyu Wang
Cen Chen
Yang Liu
Jun Huang
HILM
404
14
0
31 Jul 2023
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
International Symposium on Recent Advances in Intrusion Detection (RAID), 2023
Bocheng Chen
Guangjing Wang
Hanqing Guo
Yuanda Wang
Qiben Yan
271
26
0
14 Jul 2023