Title |
---|
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models. Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, ... Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo |
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets. Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Akshita Bhagia, Ian H. Magnusson, Hao Peng, Oyvind Tafjord, Pete Walsh, Kyle Richardson, Jesse Dodge |