Improving Causal Interventions in Amnesic Probing with Mean Projection or LEACE. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
MANBench: Is Your Multimodal Model Smarter than Human? Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
COSMIC: Generalized Refusal Direction Identification in LLM Activations. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
On Linear Representations and Pretraining Data Frequency in Language Models. International Conference on Learning Representations (ICLR), 2025 |