Measuring Coding Challenge Competence With APPS

20 May 2021
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
Ethan Guo
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM, AIMat, ALM

Papers citing "Measuring Coding Challenge Competence With APPS"

50 / 542 papers shown
Measuring AI Ability to Complete Long Tasks
Thomas Kwa
Ben West
Joel Becker
Amy Deng
Katharyn Garcia
...
Lucas Jun Koba Sato
H. Wijk
Daniel M. Ziegler
Elizabeth Barnes
Lawrence Chan
ELM
377
59
0
18 Mar 2025
CoDet-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Daniil Orel
Dilshod Azizov
Preslav Nakov
DeLMO
224
4
0
17 Mar 2025
Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning
Yuan Jiang
Yujian Zhang
Liang Lu
Christoph Treude
Xiaohong Su
Shan Huang
Tiantian Wang
ALM
159
1
0
12 Mar 2025
RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing
Yiqing Xie
Alex Xie
Divyanshu Sheth
Pengfei Liu
Daniel Fried
Carolyn Rose
LRM
226
4
0
10 Mar 2025
ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Kaiyuan Liu
Youcheng Pan
Junlin Li
Daojing He
Yang Xiang
Yexing Du
Tianrun Gao
ELM, LLMAG
149
8
0
10 Mar 2025
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wei Li
Xin Zhang
Zhongxin Guo
Shaoguang Mao
Wen Luo
Guangyue Peng
Yangyu Huang
Houfeng Wang
Scarlett Li
154
12
0
09 Mar 2025
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Unified Approach for Elevating Benchmark Quality
Roham Koohestani
Philippe de Bekker
Begüm Koç
Maliheh Izadi
VLM
284
0
0
07 Mar 2025
Transferable Foundation Models for Geometric Tasks on Point Cloud Representations: Geometric Neural Operators
Blaine Quackenbush
P. Atzberger
3DPC, AI4CE
242
1
0
06 Mar 2025
Factorio Learning Environment
Jack Hopkins
Mart Bakler
Akbir Khan
LRM, AI4CE, LLMAG
150
1
0
06 Mar 2025
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions
Julian Aron Prenner
Romain Robbes
218
0
0
06 Mar 2025
CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
Peiding Wang
Lulu Zhang
Fang Liu
Lin Shi
Minxiao Li
Bo Shen
An Fu
ELM, LRM
687
7
0
05 Mar 2025
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhangchen Xu
Yang Liu
Yueqin Yin
Mingyuan Zhou
Radha Poovendran
ALM, OffRL
238
43
0
04 Mar 2025
SolBench: A Dataset and Benchmark for Evaluating Functional Correctness in Solidity Code Completion and Repair
Zaoyu Chen
Haoran Qin
Polydoros Giannouris
Xiangyu Zhao
Lei Xue
Xiapu Luo
Xiao-Ming Wu
166
2
0
03 Mar 2025
How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
Seonghyeon Lee
Heejae Chon
Joonwon Jang
Dongha Lee
Hwanjo Yu
ALM
210
1
0
02 Mar 2025
ProBench: Benchmarking Large Language Models in Competitive Programming
Lei Yang
Renren Jin
Ling Shi
Jianxiang Peng
Yue Chen
Deyi Xiong
ReLM, ELM, LRM
120
6
0
28 Feb 2025
Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets
Chichien Tsai
Chiamu Yu
Yingdar Lin
Yusung Wu
Weibin Lee
AAML
230
1
0
27 Feb 2025
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
International Conference on Learning Representations (ICLR), 2025
Hojae Han
Seung-won Hwang
Rajhans Samdani
Yuxiong He
ALM
191
8
0
27 Feb 2025
IndicEval-XL: Bridging Linguistic Diversity in Code Generation Across Indic Languages
Ujjwal Singh
Aditi Sharma
Nikhil Gupta
Deepakshi
Vivek Kumar Jha
ELM
83
0
0
26 Feb 2025
Deep-Bench: Deep Learning Benchmark Dataset for Code Generation
Alireza Daghighfarsoodeh
Chung-Yu Wang
Hamed Taherkhani
Melika Sepidband
Mohammad Abdollahi
Hadi Hemmati
Hung Viet Pham
ALM, ELM
407
3
0
26 Feb 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang
Tianyang Liu
Daoan Zhang
Antoine Simoulin
Xiaoyi Liu
...
Zhaopu Teng
Xin Qian
Grey Yang
Jiebo Luo
Julian McAuley
ReLM, OffRL, LRM
238
27
0
26 Feb 2025
StatLLM: A Dataset for Evaluating the Performance of Large Language Models in Statistical Analysis
Xinyi Song
Lina Lee
Kexin Xie
Xueying Liu
Xinwei Deng
Yili Hong
ALM, ELM
717
2
0
24 Feb 2025
Mechanistic Understanding of Language Models in Syntactic Code Completion
Samuel Miller
Daking Rai
Ziyu Yao
LRM
93
0
0
20 Feb 2025
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark
International Conference on Learning Representations (ICLR), 2024
Ruizhong Qiu
Weiliang Will Zeng
Hanghang Tong
James Ezick
Christopher Lott
455
40
0
20 Feb 2025
Model Cascading for Code: A Cascaded Black-Box Multi-Model Framework for Cost-Efficient Code Completion with Self-Testing
Boyuan Chen
Mingzhi Zhu
Brendan Dolan-Gavitt
Mohamed Bennai
Siddharth Garg
150
1
0
17 Feb 2025
LeDex: Training LLMs to Better Self-Debug and Explain Code
Neural Information Processing Systems (NeurIPS), 2024
Nan Jiang
Xiaopeng Li
Shiqi Wang
Qiang Zhou
Soneya Binta Hossain
Baishakhi Ray
Varun Kumar
Xiaofei Ma
Hao Ding
LRM
236
34
0
17 Feb 2025
GiFT: Gibbs Fine-Tuning for Code Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Haochen Li
Wanjin Feng
Xin Zhou
Zhiqi Shen
SyDa
259
2
0
17 Feb 2025
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
Samuel Miserendino
Ming Wang
Tejal Patwardhan
Johannes Heidecke
245
54
0
17 Feb 2025
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities
Hanbin Wang
Xiaoxuan Zhou
Zhipeng Xu
Keyuan Cheng
Yuxin Zuo
Kai Tian
Jingwei Song
Junting Lu
Wenhui Hu
Xueyang Liu
LRM, MLLM
176
3
0
17 Feb 2025
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
C. Xie
Shuo Cai
Wenjun Wang
Pengxiang Li
Zhijie Sang
...
Xiaotian Han
Jianbo Yuan
Shengyu Zhang
Leilei Gan
Hongxia Yang
LRM
211
2
0
17 Feb 2025
Preference Optimization for Reasoning with Pseudo Feedback
International Conference on Learning Representations (ICLR), 2024
Fangkai Jiao
Geyang Guo
Xingxing Zhang
Nancy F. Chen
Shafiq Joty
Furu Wei
LRM
312
31
0
17 Feb 2025
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
Bohan Lyu
Siqiao Huang
Zichen Liang
Qi-An Sun
Jiaming Zhang
ELM, LRM
294
0
0
16 Feb 2025
AuPair: Golden Example Pairs for Code Repair
Aditi Mavalankar
Hassan Mansoor
Zita Marinho
Masha Samsikova
Tom Schaul
KELM, LRM
593
1
0
12 Feb 2025
LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks
Xin Zhou
Martin Weyssow
Ratnadira Widyasari
Ting Zhang
Junda He
Yunbo Lyu
Jianming Chang
Beiqi Zhang
Dan Huang
David Lo
PILM
824
23
0
10 Feb 2025
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yijia Xiao
Runhui Wang
Luyang Kong
Davor Golac
Wei Wang
LLMAG
880
7
0
10 Feb 2025
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Md. Ashraful Islam
Mohammed Eunus Ali
Md. Rizwan Parvez
LLMAG
328
15
0
08 Feb 2025
Proving the Coding Interview: A Benchmark for Formally Verified Code Generation
Quinn Dougherty
Ronak Mehta
ALM
185
7
0
08 Feb 2025
CodeSCM: Causal Analysis for Multi-Modal Code Generation
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Mukur Gupta
Noopur Bhatt
Suman Jana
189
1
0
07 Feb 2025
QExplorer: Large Language Model Based Query Extraction for Toxic Content Exploration
Shaola Ren
Li Ke
Longtao Huang
Dehong Gao
Hui Xue
115
0
0
06 Feb 2025
Process Reinforcement through Implicit Rewards
Ganqu Cui
Lifan Yuan
Liang Luo
Hanbin Wang
Wendi Li
...
Maosong Sun
Zhiyuan Liu
Ning Ding
Bowen Zhou
Ning Ding
OffRL, LRM
296
199
0
03 Feb 2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
...
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
LRM
406
20
0
29 Jan 2025
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
International Conference on Information Control Systems & Technologies (ICICST), 2023
Shahin Honarvar
Mark van der Wilk
Alastair Donaldson
303
12
0
28 Jan 2025
Towards Advancing Code Generation with Large Language Models: A Research Roadmap
Haolin Jin
Huaming Chen
Qinghua Lu
Liming Zhu
LLMAG
174
4
0
20 Jan 2025
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks
Yaojie Hu
Qiang Zhou
Qihong Chen
Xiaopeng Li
Linbo Liu
Dejiao Zhang
Amit Kachroo
Talha Oz
Omer Tripp
296
14
0
20 Jan 2025
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao
Yuk-Kit Chan
Zixuan Ling
Wenxuan Wang
Shuqing Li
...
Pinjia He
Shuai Wang
Zibin Zheng
Michael R. Lyu
Shing-Chi Cheung
ALM
363
8
0
18 Jan 2025
CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation
Jinjun Peng
Leyi Cui
Kele Huang
Junfeng Yang
Baishakhi Ray
ELM
192
28
0
14 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
231
5
0
07 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM, LRM
518
526
0
03 Jan 2025
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Shanghaoran Quan
Jiaxi Yang
Bowen Yu
Jian Xu
Dayiheng Liu
...
Zeyu Cui
Yang Fan
Yanzhe Zhang
Binyuan Hui
Junyang Lin
ALM, ELM, LRM
249
69
0
02 Jan 2025
Training Software Engineering Agents and Verifiers with SWE-Gym
Jiayi Pan
Xingyao Wang
Graham Neubig
Navdeep Jaitly
Chenhui Xu
Alane Suhr
Yizhe Zhang
279
98
0
30 Dec 2024
GenX: Mastering Code and Test Generation with Execution Feedback
Nan Wang
Yafei Liu
Chen Chen
H. Lu
181
2
0
18 Dec 2024