2507.16795
Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
22 July 2025
Helena Casademunt
Caden Juang
Adam Karvonen
Samuel Marks
Senthooran Rajamanoharan
Neel Nanda
OODD
LLMSV
ArXiv (abs)
PDF
HTML
HuggingFace (2 upvotes)
Github (19★)
Papers citing
"Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning"
6 / 6 papers shown
Detecting Adversarial Fine-tuning with Auditing Agents
Sarah Egler
John Schulman
Nicholas Carlini
AAML
MLAU
53
0
0
17 Oct 2025
Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
Julian Minder
Clement Dumas
Stewart Slocum
Helena Casademunt
Cameron Holmes
Robert West
Neel Nanda
20
0
0
14 Oct 2025
Weak Form Learning for Mean-Field Partial Differential Equations: an Application to Insect Movement
Seth Minor
Bret D. Elderd
Benjamin Van Allen
David M. Bortz
Vanja M. Dukic
0
0
0
09 Oct 2025
Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time
Daniel Tan
Anders Woodruff
Niels Warncke
Arun Jose
Maxime Riché
David Demitri Africa
Mia Taylor
72
0
0
05 Oct 2025
Hallucination reduction with CASAL: Contrastive Activation Steering For Amortized Learning
Wannan Yang
Xinchi Qiu
L. Yu
Yuchen Zhang
Oliver Aobo Yang
Narine Kokhlikyan
Nicola Cancedda
Diego Garcia-Olano
40
0
0
25 Sep 2025
Interpretability as Alignment: Making Internal Understanding a Design Principle
Aadit Sengupta
Pratinav Seth
Vinay Kumar Sankarapu
AAML
AI4CE
28
0
0
10 Sep 2025