CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation

26 May 2025
Guang Yang
Yu Zhou
Xiang Chen
Wei-Shi Zheng
Xing Hu
Xin Zhou
David Lo
Taolue Chen
ALM, LRM

Papers citing "CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation"

24 / 24 papers shown
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks
Hongchao Jiang
Yiming Chen
Yushi Cao
Hung-yi Lee
R. Tan
ELM, LRM
167
9
0
14 Jul 2025
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhangchen Xu
Yang Liu
Yueqin Yin
Mingyuan Zhou
Radha Poovendran
ALM, OffRL
436
50
0
04 Mar 2025
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Dawei Li
Renliang Sun
Yue Huang
Ming Zhong
Bohan Jiang
Jiawei Han
Wei Wei
Wei Wang
Huan Liu
599
68
0
03 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
OffRL, AI4TS, LRM, ReLM, VLM
1.2K
5,342
0
22 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM, AILaw
1.1K
287
0
25 Nov 2024
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Siming Huang
Tianhao Cheng
J.K. Liu
Jiaran Hao
L. Song
...
Ge Zhang
Zili Wang
Yuan Qi
Yinghui Xu
Wei Chu
ALM
479
80
0
07 Nov 2024
CodeJudge: Evaluating Code Generation with Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Weixi Tong
Tianyi Zhang
ELM, ALM
143
44
0
03 Oct 2024
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
335
828
0
18 Sep 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
603
371
0
22 Jun 2024
Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review
International Conference on Artificial Intelligence Testing (ICAIT), 2024
Debalina Ghosh Paul
Hong Zhu
Ian Bayley
ALM, ELM
189
31
0
18 Jun 2024
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation
Atharva Naik
234
8
0
26 Apr 2024
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Fanxu Meng
Zhaohui Wang
Muhan Zhang
VLM
644
193
0
03 Apr 2024
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Daya Guo
Qihao Zhu
Dejian Yang
Zhenda Xie
Kai Dong
...
Yu-Huan Wu
Yiming Li
Fuli Luo
Yingfei Xiong
W. Liang
ELM
416
1,348
0
25 Jan 2024
Efficient Memory Management for Large Language Model Serving with PagedAttention
Symposium on Operating Systems Principles (SOSP), 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
1.6K
4,229
0
12 Sep 2023
Large Language Models for Software Engineering: A Systematic Literature Review
ACM Transactions on Software Engineering and Methodology (TOSEM), 2023
Xinying Hou
Yanjie Zhao
Yue Liu
Zhou Yang
Kailong Wang
Li Li
Xiapu Luo
David Lo
John C. Grundy
Haoyu Wang
358
756
0
21 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Neural Information Processing Systems (NeurIPS), 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
3.2K
6,617
0
09 Jun 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Neural Information Processing Systems (NeurIPS), 2023
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM, ALM
1.1K
1,396
0
02 May 2023
ICE-Score: Instructing Large Language Models to Evaluate Code
Findings (Findings), 2023
Terry Yue Zhuo
ELM, ALM
328
65
0
27 Apr 2023
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Shuyan Zhou
Uri Alon
Sumit Agarwal
Graham Neubig
ELM, ALM
257
151
0
10 Feb 2023
Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators
Expert Systems with Applications (ESWA), 2022
Pietro Liguori
Cristina Improta
R. Natella
B. Cukic
Domenico Cotroneo
ELM
515
26
0
12 Dec 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
2.3K
14,608
0
28 Jan 2022
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
2.1K
7,722
0
07 Jul 2021
CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
Shuo Ren
Daya Guo
Shuai Lu
Long Zhou
Shujie Liu
Duyu Tang
Neel Sundaresan
M. Zhou
Ambrosio Blanco
Shuai Ma
ELM
454
738
0
22 Sep 2020
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
VLM
1.2K
19,884
0
06 Feb 2015