ResearchTrend.AI
The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers

3 April 2024
Hussein Mozannar
Valerie Chen
Mohammed Alsobay
Subhro Das
Sebastian Zhao
Dennis L. Wei
Manish Nagireddy
P. Sattigeri
Ameet Talwalkar
David Sontag
    ELM

Papers citing "The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers"

8 / 8 papers shown
How Accurately Do Large Language Models Understand Code?
Sabaat Haroon
Ahmad Faraz Khan
Ahmad Humayun
Waris Gill
Abdul Haddi Amjad
A. R. Butt
Mohammad Taha Khan
Muhammad Ali Gulzar
ELM
LRM
25
0
0
06 Apr 2025
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
50
0
0
24 Mar 2025
How Do Analysts Understand and Verify AI-Assisted Data Analyses?
Ken Gu
Ruoxi Shang
Tim Althoff
Chenglong Wang
Steven Drucker
AAML
33
25
0
19 Sep 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
169
388
0
02 May 2023
Aligning Offline Metrics and Human Judgments of Value for Code Generation Models
Victor C. Dibia
Adam Fourney
Gagan Bansal
Forough Poursabzi-Sangdeh
Han Liu
Saleema Amershi
ALM
OffRL
30
12
0
29 Oct 2022
Grounded Copilot: How Programmers Interact with Code-Generating Models
Shraddha Barke
M. James
Nadia Polikarpova
136
212
0
30 Jun 2022
Productivity Assessment of Neural Code Completion
Albert Ziegler
Eirini Kalliamvakou
Shawn Simister
Ganesh Sittampalam
Alice Li
Andrew Rice
Devon Rifkin
E. Aftandilian
102
176
0
13 May 2022
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
188
853
0
09 Feb 2021