arXiv:2311.11509
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information
20 November 2023
Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng-Chiao Huang, Vishy Swaminathan
Papers citing "Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information" (11 of 11 papers shown):
1. Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
   Tarun Raheja, Nilay Pochhi (AAML), 09 Oct 2024

2. Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks
   Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Jason Zhang, Julius Broomfield, Sara Pieri, Reihaneh Iranmanesh, Reihaneh Rabbany, Kellin Pelrine (AAML), 29 Aug 2024

3. HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes
   Xuanyu Su, Yansong Li, Diana Inkpen, Nathalie Japkowicz (VLM), 11 Aug 2024

4. Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
   Sara Abdali, Jia He, C. Barberan, Richard Anarfi, 30 Jul 2024

5. Know Your Limits: A Survey of Abstention in Large Language Models
   Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang, 25 Jul 2024

6. Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
   Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan, 20 Jul 2024

7. Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
   Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min-Bin Lin (AAML), 03 Jun 2024

8. Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
   Sara Abdali, Richard Anarfi, C. Barberan, Jia He (PILM), 19 Mar 2024

9. Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
   Arijit Ghosh Chowdhury, Md. Mofijul Islam, Vaibhav Kumar, F. H. Shezan, Vaibhav Kumar, Vinija Jain, Aman Chadha (AAML, PILM), 03 Mar 2024

10. A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models
    Zihao Xu, Yi Liu, Gelei Deng, Yuekang Li, S. Picek (PILM, AAML), 21 Feb 2024

11. Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
    Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao (ELM), 14 Feb 2024