
Title |
|---|
GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data. International Conference on Learning Representations (ICLR), 2024 |
ChainBuddy: An AI Agent System for Generating LLM Pipelines. International Conference on Human Factors in Computing Systems (CHI), 2024 |
Can Unconfident LLM Annotations Be Used for Confident Conclusions? North American Chapter of the Association for Computational Linguistics (NAACL), 2024 |