Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

Neural Information Processing Systems (NeurIPS), 2023

4 December 2023

Papers citing "Tree of Attacks: Jailbreaking Black-Box LLMs Automatically"

17 / 167 papers shown

Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models

457

27 May 2024

ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation

Neural Information Processing Systems (NeurIPS), 2024

357

23 May 2024

Securing the Future of GenAI: Policy and Technology

...

297

21 May 2024

Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs

Valeriia Cherepanova

James Zou

AAML

350

26 Apr 2024

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

381

122

21 Apr 2024

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

International Conference on Learning Representations (ICLR), 2024

Maksym Andriushchenko

Francesco Croce

Nicolas Flammarion

AAML

793

374

02 Apr 2024

Optimization-based Prompt Injection Attack to LLM-as-a-Judge

546

119

26 Mar 2024

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Yi Zeng

285

19 Mar 2024

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

...

Xuanjing Huang

231

18 Mar 2024

Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs

Aly M. Kassem

Omar Mahmoud

Niloofar Mireshghallah

Yejin Choi

415

05 Mar 2024

PAL: Proxy-Guided Black-Box Attack on Large Language Models

231

15 Feb 2024

Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks

196

14 Feb 2024

Attacking Large Language Models with Projected Gradient Descent

Stephan Günnemann

317

14 Feb 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

...

359

741

06 Feb 2024

Weak-to-Strong Jailbreaking on Large Language Models

929

30 Jan 2024

Red-Teaming for Generative AI: Silver Bullet or Security Theater?

AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024

Hoda Heidari

437

115

29 Jan 2024

Hijacking Large Language Models via Adversarial In-Context Learning

510

16 Nov 2023