
Title |
|---|
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages. Transactions of the Association for Computational Linguistics (TACL), 2020 |