arXiv:2407.01376
Badllama 3: removing safety finetuning from Llama 3 in minutes
1 July 2024
Dmitrii Volkov
Papers citing "Badllama 3: removing safety finetuning from Llama 3 in minutes"
3 / 3 papers shown
Title
AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations
Dillon Bowen, Ann-Kathrin Dombrowski, Adam Gleave, Chris Cundy
ELM
48, 0, 0
17 Mar 2025

Demonstrating specification gaming in reasoning models
Alexander Bondarenko, Denis Volk, Dmitrii Volkov, Jeffrey Ladish
LLMAG, LRM
38, 2, 0
18 Feb 2025

FlipAttack: Jailbreak LLMs via Flipping
Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Bryan Hooi
AAML
23, 12, 0
02 Oct 2024