2210.16494
Cited By
Aligning Offline Metrics and Human Judgments of Value for Code Generation Models
29 October 2022
Victor C. Dibia
Adam Fourney
Gagan Bansal
Forough Poursabzi-Sangdeh
Han Liu
Saleema Amershi
ALM
OffRL
Papers citing "Aligning Offline Metrics and Human Judgments of Value for Code Generation Models"
15 / 15 papers shown
Title
Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
Adam Fourney
Gagan Bansal
Hussein Mozannar
Cheng Tan
Eduardo Salinas
...
Victor C. Dibia
Ahmed Hassan Awadallah
Ece Kamar
Rafah Hosn
Saleema Amershi
AI4CE
LRM
LLMAG
38
34
0
07 Nov 2024
Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition
Majeed Kazemitabaar
Jack Williams
Ian Drosos
Tovi Grossman
Austin Z. Henley
Carina Negreanu
Advait Sarkar
19
17
0
02 Jul 2024
Assessing and Verifying Task Utility in LLM-Powered Applications
Negar Arabzadeh
Siqing Huo
Nikhil Mehta
Qingyun Wu
Chi Wang
Ahmed Hassan Awadallah
Charles L. A. Clarke
Julia Kiseleva
28
10
0
03 May 2024
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation
Atharva Naik
33
1
0
26 Apr 2024
The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers
Hussein Mozannar
Valerie Chen
Mohammed Alsobay
Subhro Das
Sebastian Zhao
Dennis L. Wei
Manish Nagireddy
P. Sattigeri
Ameet Talwalkar
David Sontag
ELM
35
18
0
03 Apr 2024
CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
Yiqing Xie
Alex Xie
Divyanshu Sheth
Pengfei Liu
Daniel Fried
Carolyn Rose
40
8
0
31 Mar 2024
Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications
Negar Arabzadeh
Julia Kiseleva
Qingyun Wu
Chi Wang
Ahmed Hassan Awadallah
Victor C. Dibia
Adam Fourney
Charles L. A. Clarke
LLMAG
24
7
0
14 Feb 2024
CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge
Norbert Tihanyi
M. Ferrag
Ridhi Jain
Tamás Bisztray
Merouane Debbah
ELM
19
18
0
12 Feb 2024
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou
Su Lin Blodgett
Adam Trischler
Hal Daumé
Kaheer Suleman
Alexandra Olteanu
ELM
94
25
0
13 May 2022
Productivity Assessment of Neural Code Completion
Albert Ziegler
Eirini Kalliamvakou
Shawn Simister
Ganesh Sittampalam
Alice Li
Andrew Rice
Devon Rifkin
E. Aftandilian
102
176
0
13 May 2022
A Systematic Evaluation of Large Language Models of Code
Frank F. Xu
Uri Alon
Graham Neubig
Vincent J. Hellendoorn
ELM
ALM
193
624
0
26 Feb 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
201
1,451
0
02 Sep 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
234
447
0
14 Jul 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
192
614
0
20 May 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
190
853
0
09 Feb 2021