2410.14516
Do LLMs "know" internally when they follow instructions?
18 October 2024
Juyeon Heo
Christina Heinze-Deml
Oussama Elachqar
Shirley Ren
Udhay Nallasamy
Andy Miller
Kwan Ho Ryan Chan
Jaya Narain
Papers citing "Do LLMs "know" internally when they follow instructions?"
Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary
Yakai Li
Jiekang Hu
Weiduan Sang
Luping Ma
Jing Xie
Weijuan Zhang
Aimin Yu
Shijie Zhao
Qingjia Huang
Qihang Zhou
AAML
40
0
0
28 Apr 2025
I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk
Paul Martin
Sarah Mercer
49
0
0
26 Mar 2025