ResearchTrend.AI

The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate

9 February 2024
Juhyun Oh
Eunsu Kim
Inha Cha
Alice H. Oh
    ELM

Papers citing "The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate"

8 / 8 papers shown
From Infants to AI: Incorporating Infant-like Learning in Models Boosts Efficiency and Generalization in Learning Social Prediction Tasks
Shify Treger
Shimon Ullman
59
0
0
05 Mar 2025
Uncovering Factor Level Preferences to Improve Human-Model Alignment
Juhyun Oh
Eunsu Kim
Jiseon Kim
Wenda Xu
Inha Cha
William Yang Wang
Alice H. Oh
21
0
0
09 Oct 2024
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks
Andreas Stephan
D. Zhu
Matthias Aßenmacher
Xiaoyu Shen
Benjamin Roth
ELM
45
4
0
06 Sep 2024
Benchmarks as Microscopes: A Call for Model Metrology
Michael Stephen Saxon
Ari Holtzman
Peter West
William Yang Wang
Naomi Saphra
29
10
0
22 Jul 2024
Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence
Teo Susnjak
Timothy R. McIntosh
A. Barczak
N. Reyes
Tong Liu
Paul Watters
Malka N. Halgamuge
30
3
0
04 Jul 2024
Small Language Models Need Strong Verifiers to Self-Correct Reasoning
Yunxiang Zhang
Muhammad Khalifa
Lajanugen Logeswaran
Jaekyeom Kim
Moontae Lee
Honglak Lee
Lu Wang
LRM
KELM
ReLM
23
31
0
26 Apr 2024
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
230
2,989
0
22 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022