Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

24 February 2025
Jan Betley
Daniel Tan
Niels Warncke
Anna Sztyber-Betley
Xuchan Bao
Martín Soto
Nathan Labenz
Owain Evans
    AAML

Papers citing "Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs"

7 / 7 papers shown
Patterns and Mechanisms of Contrastive Activation Engineering
Yixiong Hao
Ayush Panda
Stepan Shabalin
Sheikh Abdur Raheem Ali
LLMSV
44
1
0
06 May 2025
Jekyll-and-Hyde Tipping Point in an AI's Behavior
Neil F. Johnson
Frank Yingjie Huo
34
13
0
29 Apr 2025
IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery
Aniketh Garikaparthi
Manasi S. Patwardhan
L. Vig
Arman Cohan
VLM
LRM
37
1
0
23 Apr 2025
Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini
Sachin Goyal
Dylan Sam
Alex Robey
Yash Savani
Yiding Jiang
Andy Zou
Zachary C. Lipton
J. Zico Kolter
36
1
0
23 Apr 2025
Beyond Misinformation: A Conceptual Framework for Studying AI Hallucinations in (Science) Communication
Anqi Shao
34
33
0
18 Apr 2025
Capturing AI's Attention: Physics of Repetition, Hallucination, Bias and Beyond
Frank Yingjie Huo
Neil F. Johnson
30
1
0
06 Apr 2025
Propaganda is all you need
Paul Kronlund-Drouault
33
1
0
13 Sep 2024