The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate

9 February 2024

Papers citing "The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate"

Title | Authors | Tags | Counts | Date
From Infants to AI: Incorporating Infant-like Learning in Models Boosts Efficiency and Generalization in Learning Social Prediction Tasks | Shify Treger, Shimon Ullman | - | 59 0 0 | 05 Mar 2025
Uncovering Factor Level Preferences to Improve Human-Model Alignment | Juhyun Oh, Eunsu Kim, Jiseon Kim, Wenda Xu, Inha Cha, William Yang Wang, Alice H. Oh | - | 21 0 0 | 09 Oct 2024
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks | Andreas Stephan, D. Zhu, Matthias Aßenmacher, Xiaoyu Shen, Benjamin Roth | ELM | 45 4 0 | 06 Sep 2024
Benchmarks as Microscopes: A Call for Model Metrology | Michael Stephen Saxon, Ari Holtzman, Peter West, William Yang Wang, Naomi Saphra | - | 29 10 0 | 22 Jul 2024
Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence | Teo Susnjak, Timothy R. McIntosh, A. Barczak, N. Reyes, Tong Liu, Paul Watters, Malka N. Halgamuge | - | 30 3 0 | 04 Jul 2024
Small Language Models Need Strong Verifiers to Self-Correct Reasoning | Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang | LRM KELM ReLM | 23 31 0 | 26 Apr 2024
Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang | ELM AI4MH AI4CE ALM | 230 2,989 0 | 22 Mar 2023
Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM ALM | 303 11,881 0 | 04 Mar 2022