Red Teaming Language Model Detectors with Language Models

Transactions of the Association for Computational Linguistics (TACL), 2023

31 May 2023

Papers citing "Red Teaming Language Model Detectors with Language Models"

PrompTrend: Continuous Community-Driven Vulnerability Discovery and Assessment for Large Language Models

25 Jul 2025

Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models

08 Jun 2025

Safety Alignment Can Be Not Superficial With Explicit Safety Signals

Jianwei Li

Jung-Eun Kim

AAML

19 May 2025

A Survey of Attacks on Large Language Models

Wenrui Xu

Keshab K. Parhi

AAML ELM

18 May 2025

Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions

Shih-Han Chan

AAML

29 Mar 2025

Robustness and Cybersecurity in the EU Artificial Intelligence Act

Conference on Fairness, Accountability and Transparency (FAccT), 2025

Henrik Nolte

Miriam Rateike

Michèle Finck

22 Feb 2025

EvoFlow: Evolving Diverse Agentic Workflows On The Fly

11 Feb 2025

Can AI-Generated Text be Reliably Detected?

Vinu Sankar Sadasivan

20 Jan 2025

New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook

12 Nov 2024

DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios

Neural Information Processing Systems (NeurIPS), 2024

31 Oct 2024

Locking Down the Finetuned LLMs Safety

Minjun Zhu

Linyi Yang

Yifan Wei

Ningyu Zhang

Yue Zhang

14 Oct 2024

Superficial Safety Alignment Hypothesis

Jianwei Li

Jung-Eun Kim

LLMSV

07 Oct 2024

Efficiently Identifying Watermarked Segments in Mixed-Source Texts

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

04 Oct 2024

Conversational Complexity for Assessing Risk in Large Language Models

John Burden

Manuel Cebrian

José Hernández-Orallo

02 Sep 2024

Can LLMs be Fooled? Investigating Vulnerabilities in LLMs

30 Jul 2024

Is the Digital Forensics and Incident Response Pipeline Ready for Text-Based Threats in LLM Era?

AV Bhandarkar

Ronald Wilson

Anushka Swarup

Mengdi Zhu

Damon Woodard

25 Jul 2024

Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods

Kathleen C. Fraser

Hillary Dawkins

S. Kiritchenko

DeLMO

21 Jun 2024

A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions

Mohammed Hassanin

Nour Moustafa

23 May 2024

ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation

Neural Information Processing Systems (NeurIPS), 2024

23 May 2024

Hummer: Towards Limited Competitive Preference Dataset

19 May 2024

Vietnamese AI Generated Text Detection

06 May 2024

Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack

International Conference on Language Resources and Evaluation (LREC), 2024

02 Apr 2024

Mapping the Increasing Use of LLMs in Scientific Papers

...

Diyi Yang

Christopher D. Manning

James Y. Zou

AI4CE DeLMO

01 Apr 2024

The Impact of Prompts on Zero-Shot Detection of AI-Generated Text

Kaito Taguchi

Yujie Gu

Kouichi Sakurai

AAML DeLMO

29 Mar 2024

Bypassing LLM Watermarks with Color-Aware Substitutions

Qilong Wu

Varun Chandrasekaran

19 Mar 2024

Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices

19 Mar 2024

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

International Conference on Machine Learning (ICML), 2024

...

Sheng Liu

11 Mar 2024

Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs

Xuandong Zhao

Lei Li

Yu-Xiang Wang

08 Feb 2024

Red-Teaming for Generative AI: Silver Bullet or Security Theater?

AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024

Hoda Heidari

29 Jan 2024

Detecting Multimedia Generated by Large AI Models: A Survey

22 Jan 2024

Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap

Xingyu Wu

Sheng-hao Wu

Jibin Wu

Liang Feng

Kay Chen Tan

ELM

18 Jan 2024

Optimizing watermarks for large language models

Bram Wouters

WaLM

28 Dec 2023

Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles

BigData Congress [Services Society] (BSS), 2023

Sonali Singh

Faranak Abri

A. Namin

24 Nov 2023

Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

Furong Huang

23 Oct 2023

A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions

23 Oct 2023

Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models?

Chulin Xie

16 Oct 2023

Can LLM-Generated Misinformation Be Detected?

International Conference on Learning Representations (ICLR), 2023

Canyu Chen

Kai Shu

DeLMO

25 Sep 2023

Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

International Conference on Machine Learning (ICML), 2023

12 Sep 2023

On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook

International Journal of Computer Vision (IJCV), 2023

Mingyuan Fan

Chengyu Wang

Cen Chen

Yang Liu

Jun Huang

HILM

31 Jul 2023

Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots

International Symposium on Recent Advances in Intrusion Detection (RAID), 2023

Bocheng Chen

Guangjing Wang

Hanqing Guo

Yuanda Wang

Qiben Yan

14 Jul 2023