Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.04528
Cited By
PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
7 June 2023
Kaijie Zhu
Jindong Wang
Jiaheng Zhou
Zichen Wang
Hao Chen
Yidong Wang
Linyi Yang
Weirong Ye
Yue Zhang
Neil Zhenqiang Gong
Xingxu Xie
SILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts"
50 / 118 papers shown
Title
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Chetan Pathade
AAML
SILM
46
0
0
07 May 2025
Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
Narek Maloyan
Dmitry Namiot
SILM
AAML
ELM
75
0
0
25 Apr 2025
FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness
Chandan Kumar Sah
Xiaoli Lian
Tony Xu
Li Zhang
26
0
0
10 Apr 2025
Universal Collection of Euclidean Invariants between Pairs of Position-Orientations
Gijs Bellaard
B. Smets
R. Duits
53
0
0
04 Apr 2025
How does Watermarking Affect Visual Language Models in Document Understanding?
Chunxue Xu
Yiwei Wang
Bryan Hooi
Yujun Cai
Songze Li
VLM
44
0
0
01 Apr 2025
Get the Agents Drunk: Memory Perturbations in Autonomous Agent-based Recommender Systems
Shiyi Yang
Z. Hu
Chen Wang
Tong Yu
Xiwei Xu
Liming Zhu
Lina Yao
AAML
37
0
0
31 Mar 2025
Mapping the Trust Terrain: LLMs in Software Engineering -- Insights and Perspectives
Dipin Khati
Yijin Liu
David Nader-Palacio
Yixuan Zhang
Denys Poshyvanyk
48
0
0
18 Mar 2025
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu
Michihiro Yasunaga
Andrew Cohen
Yoon Kim
Asli Celikyilmaz
Marjan Ghazvininejad
34
1
0
14 Mar 2025
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models
Meghana Arakkal Rajeev
Rajkumar Ramamurthy
Prapti Trivedi
Vikas Yadav
Oluwanifemi Bamgbose
Sathwik Tejaswi Madhusudan
James Y. Zou
Nazneen Rajani
AAML
LRM
45
2
0
03 Mar 2025
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models
Grigor Nalbandyan
Rima Shahbazyan
Evelina Bakhturina
ELM
33
0
0
28 Feb 2025
Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
Kohei Tsuji
Tatsuya Hiraoka
Yuchang Cheng
Eiji Aramaki
Tomoya Iwakura
66
0
0
27 Feb 2025
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
Yue Zhou
Yi-Ju Chang
Yuan Wu
MoMe
57
2
0
24 Feb 2025
Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction
Sarah Ball
Simeon Allmendinger
Frauke Kreuter
Niklas Kühl
44
0
0
22 Feb 2025
Position: Standard Benchmarks Fail -- LLM Agents Present Overlooked Risks for Financial Applications
Zichen Chen
Jiaao Chen
Jianda Chen
Misha Sra
ELM
34
1
0
21 Feb 2025
From Selection to Generation: A Survey of LLM-based Active Learning
Yu Xia
Subhojyoti Mukherjee
Zhouhang Xie
Junda Wu
Xintong Li
...
Namyong Park
T. Nguyen
Jiebo Luo
Ryan Rossi
Julian McAuley
53
0
0
17 Feb 2025
Authenticated Delegation and Authorized AI Agents
Tobin South
Samuele Marro
Thomas Hardjono
Robert Mahari
Cedric Deslandes Whitney
Dazza Greenwood
Alan Chan
Alex Pentland
42
3
0
17 Jan 2025
Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines
Xiyang Hu
AAML
31
1
0
03 Jan 2025
Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach
T. T. Wang
John Hughes
Henry Sleight
Rylan Schaeffer
Rajashree Agrawal
Fazl Barez
Mrinank Sharma
Jesse Mu
Nir Shavit
Ethan Perez
AAML
84
4
0
03 Dec 2024
Standardization Trends on Safety and Trustworthiness Technology for Advanced AI
Jonghong Jeon
29
2
0
29 Oct 2024
Evaluating Morphological Compositional Generalization in Large Language Models
Mete Ismayilzada
Defne Çirci
Jonne Sälevä
Hale Sirin
Abdullatif Köksal
Bhuwan Dhingra
Antoine Bosselut
Lonneke van der Plas
Duygu Ataman
26
2
0
16 Oct 2024
A Zero-Shot Open-Vocabulary Pipeline for Dialogue Understanding
Abdulfattah Safa
Gözde Gül Şahin
28
1
0
24 Sep 2024
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration
Xin Guan
Nathaniel Demchak
Saloni Gupta
Ze Wang
Ediz Ertekin Jr.
Adriano Soares Koshiyama
Emre Kazim
Zekun Wu
32
2
0
17 Sep 2024
AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers
Alexander Wuttke
Matthias Aßenmacher
Christopher Klamm
Max M. Lang
Quirin Würschinger
Frauke Kreuter
34
2
0
16 Sep 2024
Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models
Jinyang Wu
Feihu Che
Chuyuan Zhang
Jianhua Tao
Shuai Zhang
Pengpeng Shao
23
2
0
24 Aug 2024
Beyond Binary Gender Labels: Revealing Gender Biases in LLMs through Gender-Neutral Name Predictions
Zhiwen You
Haejin Lee
Shubhanshu Mishra
Sullam Jeoung
Apratim Mishra
Jinseok Kim
Jana Diesner
17
9
0
07 Jul 2024
Systematic Task Exploration with LLMs: A Study in Citation Text Generation
Furkan Şahinuç
Ilia Kuznetsov
Yufang Hou
Iryna Gurevych
21
3
0
04 Jul 2024
NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations
Junkai Chen
Zhenhao Li
Xing Hu
Xin Xia
AAML
32
7
0
28 Jun 2024
Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness
Erh-Chung Chen
Pin-Yu Chen
I-Hsin Chung
Che-Rung Lee
24
1
0
28 Jun 2024
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
Siyuan Wang
Zhuohan Long
Zhihao Fan
Zhongyu Wei
30
6
0
21 Jun 2024
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
Hasan Hammoud
Umberto Michieli
Fabio Pizzati
Philip H. S. Torr
Adel Bibi
Bernard Ghanem
Mete Ozay
MoMe
31
14
0
20 Jun 2024
UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions
Xunzhi Wang
Zhuowei Zhang
Qiongyu Li
Gaonan Chen
Mengting Hu
Zhiyu li
Bitong Luo
Hang Gao
Zhixin Han
Haotian Wang
ELM
35
3
0
18 Jun 2024
MoE-RBench
\texttt{MoE-RBench}
MoE-RBench
: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen
Xinyu Zhao
Tianlong Chen
Yu Cheng
MoE
56
5
0
17 Jun 2024
Self and Cross-Model Distillation for LLMs: Effective Methods for Refusal Pattern Alignment
Jie Li
Yi Liu
Chongyang Liu
Xiaoning Ren
Ling Shi
Weisong Sun
Yinxing Xue
27
0
0
17 Jun 2024
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models
Yuqing Wang
Yun Zhao
LRM
AAML
ELM
24
0
0
16 Jun 2024
KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs
Aihua Pei
Zehua Yang
Shunan Zhu
Ruoxi Cheng
Ju Jia
Lina Wang
26
0
0
16 Jun 2024
Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
Shang Wang
Tianqing Zhu
Bo Liu
Ming Ding
Xu Guo
Dayong Ye
Wanlei Zhou
Philip S. Yu
PILM
52
9
0
12 Jun 2024
On the Worst Prompt Performance of Large Language Models
Bowen Cao
Deng Cai
Zhisong Zhang
Yuexian Zou
Wai Lam
ALM
LRM
19
5
0
08 Jun 2024
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Chengyuan Deng
Yiqun Duan
Xin Jin
Heng Chang
Yijun Tian
...
Kuofeng Gao
Sihong He
Jun Zhuang
Lu Cheng
Haohan Wang
AILaw
38
16
0
08 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
32
38
0
06 Jun 2024
Are LLMs classical or nonmonotonic reasoners? Lessons from generics
Alina Leidinger
R. Rooij
Ekaterina Shutova
LRM
19
3
0
05 Jun 2024
Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models
Sheng-Lun Wei
Cheng-Kuang Wu
Hen-Hsen Huang
Hsin-Hsi Chen
21
10
0
05 Jun 2024
Safeguarding Large Language Models: A Survey
Yi Dong
Ronghui Mu
Yanghao Zhang
Siqi Sun
Tianle Zhang
...
Yi Qi
Jinwei Hu
Jie Meng
Saddek Bensalem
Xiaowei Huang
OffRL
KELM
AILaw
32
17
0
03 Jun 2024
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui
Wei-Lin Chiang
Ion Stoica
Cho-Jui Hsieh
ALM
38
32
0
31 May 2024
ANAH: Analytical Annotation of Hallucinations in Large Language Models
Ziwei Ji
Yuzhe Gu
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai-xiang Chen
HILM
38
2
0
30 May 2024
C
3
^{3}
3
Bench: A Comprehensive Classical Chinese Understanding Benchmark for Large Language Models
Jiahuan Cao
Yongxin Shi
Dezhi Peng
Yang Liu
Lianwen Jin
ELM
21
0
0
28 May 2024
Large Language Model Sentinel: LLM Agent for Adversarial Purification
Guang Lin
Qibin Zhao
Qibin Zhao
AAML
40
2
0
24 May 2024
Can large language models understand uncommon meanings of common words?
Jinyang Wu
Feihu Che
Xinxin Zheng
Shuai Zhang
Ruihan Jin
Shuai Nie
Pengpeng Shao
Jianhua Tao
27
1
0
09 May 2024
Harmonic LLMs are Trustworthy
Nicholas S. Kersting
Mohammad Rahman
Suchismitha Vedala
Yang Wang
38
0
0
30 Apr 2024
Empowering Large Language Models for Textual Data Augmentation
Yichuan Li
Kaize Ding
Jianling Wang
Kyumin Lee
16
10
0
26 Apr 2024
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Melissa Ailem
Katerina Marazopoulou
Charlotte Siska
James Bono
51
13
0
25 Apr 2024
1
2
3
Next