Chain-of-Lure: A Universal Jailbreak Attack Framework using Unconstrained Synthetic Narratives

23 May 2025

Papers citing "Chain-of-Lure: A Universal Jailbreak Attack Framework using Unconstrained Synthetic Narratives"

21 / 21 papers shown

A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models

205

04 Sep 2025

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Computer Vision and Pattern Recognition (CVPR), 2025

374

26 Mar 2025

MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue

...

291

08 Jan 2025

Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations

275

17 Jun 2024

Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Qian Liu

314

124

13 Jun 2024

Towards Lifelong Learning of Large Language Models: A Survey

Qianli Ma

297

10 Jun 2024

Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses

318

03 Jun 2024

Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

Jindong Gu

Yang Liu

Simeng Qin

Min Lin

AAML

367

31 May 2024

AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs

Zeyi Liao

Huan Sun

AAML

321

151

11 Apr 2024

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction

Yinpeng Dong

253

107

28 Feb 2024

Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic

242

19 Feb 2024

Security and Privacy Challenges of Large Language Models: A Survey

387

325

30 Jan 2024

Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2023

194

14 Dec 2023

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

Neural Information Processing Systems (NeurIPS), 2023

354

472

04 Dec 2023

Jailbreaking Black Box Large Language Models in Twenty Queries

George J. Pappas

677

1,109

12 Oct 2023

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

981

516

19 Sep 2023

Universal and Transferable Adversarial Attacks on Aligned Language Models

J. Zico Kolter

647

2,367

27 Jul 2023

Llama 2: Open Foundation and Fine-Tuned Chat Models

Louis Martin

...

Sharan Narang

Sergey Edunov

8.8K

15,551

18 Jul 2023

Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective

Neural Information Processing Systems (NeurIPS), 2023

663

359

24 May 2023

Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study

Yi Liu

Lida Zhao

Kailong Wang

Yang Liu

430

617

23 May 2023

Language Models are Few-Shot Learners

Neural Information Processing Systems (NeurIPS), 2020

...

2.0K

53,198

28 May 2020