2406.00799
Are you still on track!? Catching LLM Task Drift with Activations
2 June 2024
Sahar Abdelnabi
Aideen Fay
Giovanni Cherubin
Ahmed Salem
Mario Fritz
Andrew J. Paverd
Papers citing "Are you still on track!? Catching LLM Task Drift with Activations"
3 / 3 papers shown
Title
WildChat: 1M ChatGPT Interaction Logs in the Wild
Wenting Zhao
Xiang Ren
Jack Hessel
Claire Cardie
Yejin Choi
Yuntian Deng
14
44
0
02 May 2024
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Eric Wallace
Kai Y. Xiao
R. Leike
Lilian Weng
Johannes Heidecke
Alex Beutel
SILM
14
71
0
19 Apr 2024
Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Julien Piet
Maha Alrashed
Chawin Sitawarin
Sizhe Chen
Zeming Wei
Elizabeth Sun
Basel Alomair
David A. Wagner
AAML
SyDa
50
22
0
29 Dec 2023