2310.13345
Cited By
An LLM can Fool Itself: A Prompt-Based Adversarial Attack
20 October 2023
Xilie Xu
Keyi Kong
Ning Liu
Li-zhen Cui
Di Wang
Jingfeng Zhang
Mohan S. Kankanhalli
AAML
SILM
Papers citing
"An LLM can Fool Itself: A Prompt-Based Adversarial Attack"
18 / 18 papers shown
Title
CheatAgent: Attacking LLM-Empowered Recommender Systems via LLM Agent
Liang-bo Ning
Shijie Wang
Wenqi Fan
Qing Li
Xin Xu
Hao Chen
Feiran Huang
AAML
21
16
0
13 Apr 2025
TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors
Jingyi Zheng
Junfeng Wang
Zhen Sun
Wenhan Dong
Yule Liu
Xinlei He
AAML
43
0
0
10 Mar 2025
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan
Yongtao Wu
Elias Abad Rocamora
Grigorios G. Chrysos
V. Cevher
AAML
47
0
0
24 Feb 2025
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation
Saurabh Kumar Pandey
S. Vashistha
Debrup Das
Somak Aditya
Monojit Choudhury
AAML
69
0
0
10 Feb 2025
Transferable Adversarial Attacks on SAM and Its Downstream Models
Song Xia
Wenhan Yang
Yi Yu
Xun Lin
Henghui Ding
Lingyu Duan
Xudong Jiang
AAML
SILM
46
6
0
26 Oct 2024
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
Md Zarif Hossain
Ahmed Imteaj
AAML
VLM
38
3
0
11 Sep 2024
Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers
Manuel Mondal
Ljiljana Dolamic
Gérôme Bovet
Philippe Cudré-Mauroux
Julien Audiffren
34
2
0
21 Jun 2024
Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top
Keyuan Cheng
Muhammad Asif Ali
Shu Yang
Gang Lin
Yuxuan Zhai
Haoyang Fei
Ke Xu
Lu Yu
Lijie Hu
Di Wang
KELM
32
7
0
24 May 2024
Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs
Shu Yang
Jiayuan Su
Han Jiang
Mengdi Li
Keyuan Cheng
Muhammad Asif Ali
Lijie Hu
Di Wang
16
5
0
30 Mar 2024
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
Akash Gupta
Ivaxi Sheth
Vyas Raina
Mark J. F. Gales
Mario Fritz
30
4
0
28 Feb 2024
Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance
Tinghui Ouyang
AprilPyone Maungmaung
Koichi Konishi
Yoshiki Seo
Isao Echizen
AI4MH
18
5
0
15 Jan 2024
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark J. F. Gales
HILM
LRM
150
390
0
15 Mar 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
241
1,913
0
31 Dec 2020
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
Chen Zhu
Yu Cheng
Zhe Gan
S. Sun
Tom Goldstein
Jingjing Liu
AAML
221
436
0
25 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
Mohit Iyyer
John Wieting
Kevin Gimpel
Luke Zettlemoyer
AAML
GAN
185
711
0
17 Apr 2018
Adversarial examples in the physical world
Alexey Kurakin
Ian Goodfellow
Samy Bengio
SILM
AAML
250
5,830
0
08 Jul 2016