2502.17424
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
24 February 2025
Jan Betley
Daniel Tan
Niels Warncke
Anna Sztyber-Betley
Xuchan Bao
Martín Soto
Nathan Labenz
Owain Evans
AAML
Papers citing
"Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs"
7 / 7 papers shown
Patterns and Mechanisms of Contrastive Activation Engineering
Yixiong Hao
Ayush Panda
Stepan Shabalin
Sheikh Abdur Raheem Ali
LLMSV
44
1
0
06 May 2025
Jekyll-and-Hyde Tipping Point in an AI's Behavior
Neil F. Johnson
Frank Yingjie Huo
34
13
0
29 Apr 2025
IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery
Aniketh Garikaparthi
Manasi S. Patwardhan
L. Vig
Arman Cohan
VLM
LRM
37
1
0
23 Apr 2025
Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini
Sachin Goyal
Dylan Sam
Alex Robey
Yash Savani
Yiding Jiang
Andy Zou
Zachary C. Lipton
J. Zico Kolter
36
1
0
23 Apr 2025
Beyond Misinformation: A Conceptual Framework for Studying AI Hallucinations in (Science) Communication
Anqi Shao
34
33
0
18 Apr 2025
Capturing AI's Attention: Physics of Repetition, Hallucination, Bias and Beyond
Frank Yingjie Huo
Neil F. Johnson
30
1
0
06 Apr 2025
Propaganda is all you need
Paul Kronlund-Drouault
33
1
0
13 Sep 2024