Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.16822
Cited By
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
26 February 2024
Mikayel Samvelyan
Sharath Chandra Raparthy
Andrei Lupu
Eric Hambro
Aram H. Markosyan
Manish Bhatt
Yuning Mao
Minqi Jiang
Jack Parker-Holder
Jakob Foerster
Tim Rocktaschel
Roberta Raileanu
SyDa
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts"
13 / 13 papers shown
Title
MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming
Stefan Schoepf
Muhammad Zaid Hameed
Ambrish Rawat
Kieran Fraser
Giulio Zizzo
Giandomenico Cornacchia
Mark Purcell
26
0
0
08 Mar 2025
Shh, don't say that! Domain Certification in LLMs
Cornelius Emde
Alasdair Paren
Preetham Arvind
Maxime Kayser
Tom Rainforth
Thomas Lukasiewicz
Bernard Ghanem
Philip H. S. Torr
Adel Bibi
41
1
0
26 Feb 2025
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu
Laura Ruis
Tim Rocktaschel
Robert Kirk
36
0
0
19 Feb 2025
FLAME: Flexible LLM-Assisted Moderation Engine
Ivan Bakulin
Ilia Kopanichuk
Iaroslav Bespalov
Nikita Radchenko
V. Shaposhnikov
Dmitry V. Dylov
Ivan Oseledets
76
0
0
13 Feb 2025
Jailbreaking to Jailbreak
Jeremy Kritz
Vaughn Robinson
Robert Vacareanu
Bijan Varjavand
Michael Choi
Bobby Gogov
Scale Red Team
Summer Yue
Willow Primack
Zifan Wang
93
0
0
09 Feb 2025
Evolution and The Knightian Blindspot of Machine Learning
Joel Lehman
Elliot Meyerson
Tarek El-Gaaly
Kenneth O. Stanley
Tarin Ziyaee
73
1
0
22 Jan 2025
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Guobin Shen
Dongcheng Zhao
Yiting Dong
Xiang-Yu He
Yi Zeng
AAML
42
0
0
03 Oct 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
42
12
0
28 May 2024
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models
Manish P Bhatt
Sahana Chennabasappa
Yue Li
Cyrus Nikolaidis
Daniel Song
...
Yaohui Chen
Dhaval Kapil
David Molnar
Spencer Whitman
Joshua Saxe
ELM
27
31
0
19 Apr 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
Yunxiang Li
Zihan Li
Kai Zhang
Ruilong Dan
Steven Jiang
You Zhang
LM&MA
AI4MH
114
366
0
24 Mar 2023
MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning
Mikayel Samvelyan
Akbir Khan
Michael Dennis
Minqi Jiang
Jack Parker-Holder
Jakob N. Foerster
Roberta Raileanu
Tim Rocktaschel
32
12
0
06 Mar 2023
Replay-Guided Adversarial Environment Design
Minqi Jiang
Michael Dennis
Jack Parker-Holder
Jakob N. Foerster
Edward Grefenstette
Tim Rocktaschel
113
94
0
06 Oct 2021
1