Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.13208
Cited By
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
19 April 2024
Eric Wallace
Kai Y. Xiao
R. Leike
Lilian Weng
Johannes Heidecke
Alex Beutel
SILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions"
8 / 8 papers shown
Title
The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)
Zihao Wang
Yibo Jiang
Jiahao Yu
Heqing Huang
11
0
0
01 May 2025
Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction
Y. Chen
Haoran Li
Yuan Sui
Y. Liu
Yufei He
Y. Song
Bryan Hooi
AAML
SILM
46
0
0
29 Apr 2025
ACE: A Security Architecture for LLM-Integrated App Systems
Evan Li
Tushin Mallick
Evan Rose
William K. Robertson
Alina Oprea
Cristina Nita-Rotaru
24
0
0
29 Apr 2025
WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks
Ivan Evtimov
Arman Zharmagambetov
Aaron Grattafiori
Chuan Guo
Kamalika Chaudhuri
AAML
16
36
0
22 Apr 2025
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Zhe Su
Xuhui Zhou
Sanketh Rangreji
Anubha Kabra
Julia Mendelsohn
Faeze Brahman
Maarten Sap
LLMAG
34
2
0
13 Sep 2024
Learning by Distilling Context
Charles Burton Snell
Dan Klein
Ruiqi Zhong
ReLM
LRM
121
35
0
30 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
185
327
0
23 Aug 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
270
8,441
0
04 Mar 2022
1