ResearchTrend.AI

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning (arXiv 2507.16795)

22 July 2025
Helena Casademunt, Caden Juang, Adam Karvonen, Samuel Marks, Senthooran Rajamanoharan, Neel Nanda
Communities: OODD, LLMSV
Links: ArXiv (abs) · PDF · HTML · HuggingFace (2 upvotes) · GitHub (19★)

Papers citing "Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning" (6 of 6 papers shown):
  1. Detecting Adversarial Fine-tuning with Auditing Agents
     Sarah Egler, John Schulman, Nicholas Carlini
     17 Oct 2025
  2. Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
     Julian Minder, Clement Dumas, Stewart Slocum, Helena Casademunt, Cameron Holmes, Robert West, Neel Nanda
     14 Oct 2025
  3. Weak Form Learning for Mean-Field Partial Differential Equations: an Application to Insect Movement
     Seth Minor, Bret D. Elderd, Benjamin Van Allen, David M. Bortz, Vanja M. Dukic
     09 Oct 2025
  4. Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time
     Daniel Tan, Anders Woodruff, Niels Warncke, Arun Jose, Maxime Riché, David Demitri Africa, Mia Taylor
     05 Oct 2025
  5. Hallucination reduction with CASAL: Contrastive Activation Steering For Amortized Learning
     Wannan Yang, Xinchi Qiu, L. Yu, Yuchen Zhang, Oliver Aobo Yang, Narine Kokhlikyan, Nicola Cancedda, Diego Garcia-Olano
     25 Sep 2025
  6. Interpretability as Alignment: Making Internal Understanding a Design Principle
     Aadit Sengupta, Pratinav Seth, Vinay Kumar Sankarapu
     10 Sep 2025