DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation. Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree? International Conference on Learning Representations (ICLR), 2024