Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.11036
Cited By
garak: A Framework for Security Probing Large Language Models
16 June 2024
Leon Derczynski
Erick Galinkin
Jeffrey Martin
Subho Majumdar
Nanna Inie
AAML
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"garak: A Framework for Security Probing Large Language Models"
11 / 11 papers shown
Title
OET: Optimization-based prompt injection Evaluation Toolkit
Jinsheng Pan
Xiaogeng Liu
Chaowei Xiao
AAML
67
0
0
01 May 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
70
0
0
21 Apr 2025
A Framework for Evaluating Emerging Cyberattack Capabilities of AI
Mikel Rodriguez
Raluca Ada Popa
Four Flynn
Lihao Liang
Allan Dafoe
Anna Wang
ELM
50
2
0
14 Mar 2025
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
55
1
0
09 Oct 2024
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Eric Wallace
Kai Y. Xiao
R. Leike
Lilian Weng
Johannes Heidecke
Alex Beutel
SILM
47
113
0
19 Apr 2024
Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection
Yuxi Li
Yi Liu
Gelei Deng
Ying Zhang
Wenjia Song
Ling Shi
Kailong Wang
Yuekang Li
Yang Liu
Haoyu Wang
34
19
0
15 Apr 2024
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
127
116
0
09 Nov 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
How Language Model Hallucinations Can Snowball
Muru Zhang
Ofir Press
William Merrill
Alisa Liu
Noah A. Smith
HILM
LRM
75
246
0
22 May 2023
The Internal State of an LLM Knows When It's Lying
A. Azaria
Tom Michael Mitchell
HILM
210
297
0
26 Apr 2023
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole
Varun Gangal
Sebastian Gehrmann
Aadesh Gupta
Zhenhao Li
...
Tianbao Xie
Usama Yaseen
Michael A. Yee
Jing Zhang
Yue Zhang
150
86
0
06 Dec 2021
1