BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages. Neural Information Processing Systems (NeurIPS), 2024
Better Instruction-Following Through Minimum Bayes Risk. International Conference on Learning Representations (ICLR), 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models. North American Chapter of the Association for Computational Linguistics (NAACL), 2024. Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, ..., Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo