ResearchTrend.AI
Competition-Level Problems are Effective LLM Evaluators

4 December 2023
Yiming Huang
Zheng-Wen Lin
Xiao Liu
Yeyun Gong
Shuai Lu
Fangyu Lei
Yaobo Liang
Yelong Shen
Chen Lin
Nan Duan
Weizhu Chen
    ELM
    LRM

Papers citing "Competition-Level Problems are Effective LLM Evaluators"

8 / 8 papers shown
Turing Machine Evaluation for Large Language Model
Haitao Wu
Zongbo Han
Huaxi Huang
Changqing Zhang
ELM
LRM
59
0
0
29 Apr 2025
Evaluating the Performance of Large Language Models in Competitive Programming: A Multi-Year, Multi-Grade Analysis
Adrian Marius Dumitran
Adrian Catalin Badea
Stefan-Gabriel Muscalu
ELM
LRM
15
1
0
31 Aug 2024
Benchmarking Language Model Creativity: A Case Study on Code Generation
Yining Lu
Dixuan Wang
Tianjian Li
Dongwei Jiang
Meng Jiang
Daniel Khashabi
LRM
52
10
0
12 Jul 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
34
38
0
06 Jun 2024
Spiking-PhysFormer: Camera-Based Remote Photoplethysmography with Parallel Spike-driven Transformer
Mingxuan Liu
Jiankai Tang
Haoxiang Li
Jiahao Qi
Siwei Li
Kegang Wang
Yuntao Wang
Hong Chen
89
13
0
07 Feb 2024
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
105
136
0
03 Nov 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
206
2,232
0
22 Mar 2023
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
245
671
0
06 Jan 2021