CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation
arXiv:2505.19502 · 26 May 2025
Guang Yang
Yu Zhou
Xiang Chen
Wei-Shi Zheng
Xing Hu
Xin Zhou
David Lo
Taolue Chen
ALM
LRM
Papers citing "CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation"
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks
Hongchao Jiang
Yiming Chen
Yushi Cao
Hung-yi Lee
R. Tan
ELM
LRM
14 Jul 2025
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhangchen Xu
Yang Liu
Yueqin Yin
Mingyuan Zhou
Radha Poovendran
ALM
OffRL
04 Mar 2025
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Dawei Li
Renliang Sun
Yue Huang
Ming Zhong
Bohan Jiang
Jiawei Han
Wei Wei
Wei Wang
Huan Liu
03 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
OffRL
AI4TS
LRM
ReLM
VLM
22 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
25 Nov 2024
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Siming Huang
Tianhao Cheng
J.K. Liu
Jiaran Hao
L. Song
...
Ge Zhang
Zili Wang
Yuan Qi
Yinghui Xu
Wei Chu
ALM
07 Nov 2024
CodeJudge: Evaluating Code Generation with Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Weixi Tong
Tianyi Zhang
ELM
ALM
03 Oct 2024
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
18 Sep 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
22 Jun 2024
Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review
International Conference on Artificial Intelligence Testing (ICAIT), 2024
Debalina Ghosh Paul
Hong Zhu
Ian Bayley
ALM
ELM
18 Jun 2024
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation
Atharva Naik
26 Apr 2024
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Fanxu Meng
Zhaohui Wang
Muhan Zhang
VLM
03 Apr 2024
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Daya Guo
Qihao Zhu
Dejian Yang
Zhenda Xie
Kai Dong
...
Yu-Huan Wu
Yiming Li
Fuli Luo
Yingfei Xiong
W. Liang
ELM
25 Jan 2024
Efficient Memory Management for Large Language Model Serving with PagedAttention
Symposium on Operating Systems Principles (SOSP), 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Hao Zhang
Ion Stoica
VLM
12 Sep 2023
Large Language Models for Software Engineering: A Systematic Literature Review
ACM Transactions on Software Engineering and Methodology (TOSEM), 2023
Xinyi Hou
Yanjie Zhao
Yue Liu
Zhou Yang
Kailong Wang
Li Li
Xiapu Luo
David Lo
John C. Grundy
Haoyu Wang
21 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Neural Information Processing Systems (NeurIPS), 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Hao Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
09 Jun 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Neural Information Processing Systems (NeurIPS), 2023
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
02 May 2023
ICE-Score: Instructing Large Language Models to Evaluate Code
Findings (Findings), 2023
Terry Yue Zhuo
ELM
ALM
27 Apr 2023
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Shuyan Zhou
Uri Alon
Sumit Agarwal
Graham Neubig
ELM
ALM
10 Feb 2023
Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators
Expert Systems with Applications (ESWA), 2022
Pietro Liguori
Cristina Improta
R. Natella
B. Cukic
Domenico Cotroneo
ELM
12 Dec 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
28 Jan 2022
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
07 Jul 2021
CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
Shuo Ren
Daya Guo
Shuai Lu
Long Zhou
Shujie Liu
Duyu Tang
Neel Sundaresan
M. Zhou
Ambrosio Blanco
Shuai Ma
ELM
22 Sep 2020
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
VLM
06 Feb 2015