Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge? Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |