Evaluating Large Language Models Trained on Code
7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
    ELM ALM

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,509 papers shown
MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark
Junjie Xing
Yeye He
Mengyu Zhou
Haoyu Dong
Shi Han
Lingjiao Chen
Dongmei Zhang
S. Chaudhuri
H. V. Jagadish
LMTD ELM LRM
279
5
0
05 Jun 2025
Sensory-Motor Control with Large Language Models via Iterative Policy Refinement
J. Carvalho
S. Nolfi
LM&Ro
367
0
0
05 Jun 2025
Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration
Junqi Gao
Zhichang Guo
Dazhi Zhang
Dong Li
Runze Liu
Pengfei Li
Kai Tian
Biqing Qi
408
0
0
04 Jun 2025
From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models
Viktor Hangya
Fabian Küch
Darina Gold
ELM
286
0
0
04 Jun 2025
Seed-Coder: Let the Code Model Curate Data for Itself
ByteDance Seed
Yuyu Zhang
Jing Su
Yifan Sun
Chenguang Xi
...
Jiaze Chen
Siyao Liu
Kai Shen
Liang Xiang
Yonghui Wu
SyDa LRM
342
24
0
04 Jun 2025
APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jun Rao
Zepeng Lin
Xuebo Liu
Xiaopeng Ke
Lian Lian
Dong Jin
Shengjun Cheng
Jun Yu
Min Zhang
249
9
0
04 Jun 2025
The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective
Jiin Kim
Byeongjun Shin
Jinha Chung
Minsoo Rhu
LLMAG LRM
356
12
0
04 Jun 2025
AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism
Zhepei Wei
Wei-Lin Chen
Xinyu Zhu
Yu Meng
OffRL
318
3
0
04 Jun 2025
CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking
Neeva Oza
Ishaan Govil
Parul Gupta
Dinesh Khandelwal
Dinesh Garg
Parag Singla
238
1
0
04 Jun 2025
Generating Automotive Code: Large Language Models for Software Development and Verification in Safety-Critical Systems
Sven Kirchner
Alois Knoll
176
6
0
04 Jun 2025
Understanding Gender Bias in AI-Generated Product Descriptions
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Markelle Kelly
Mohammad Tahaei
Padhraic Smyth
Lauren Wilcox
229
26
0
03 Jun 2025
Cataloguing Hugging Face Models to Software Engineering Activities: Automation and Findings
Alexandra González
Xavier Franch
David Lo
Luís Cruz
VLM
302
2
0
03 Jun 2025
Rethinking the effects of data contamination in Code Intelligence
Zhen Yang
Hongyi Lin
Yifan He
Jie Xu
Zeyu Sun
Shuo Liu
P. Wang
Zhongxing Yu
Qingyuan Liang
292
3
0
03 Jun 2025
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Jiajun Sun
Ming Zhang
Chenhao Huang
Jiayi Chen
F. Chen
...
Wei Chengzhi
Lin Yan
Qi Zhang
Qi Zhang
Xuanjing Huang
ELM
308
3
0
03 Jun 2025
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
Siqi Chen
Xinyu Dong
Haolei Xu
Xingyu Wu
Fei Tang
...
Wenqi Zhang
Guiyang Hou
Yongliang Shen
Weiming Lu
Yueting Zhuang
VLM
229
4
0
03 Jun 2025
MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching
Liang Yue
Yihong Tang
Kehai Chen
J. Tang
Min Zhang
LLMAG
289
0
0
03 Jun 2025
FroM: Frobenius Norm-Based Data-Free Adaptive Model Merging
Zijian Li
Xiaocheng Feng
Huixin Liu
Yichong Huang
Ting Liu
Bing Qin
MoMe
353
0
0
03 Jun 2025
Simplifying Root Cause Analysis in Kubernetes with StateGraph and LLM
Yong Xiang
C. L. Philip Chen
Liyi Zeng
Wei Yin
Xin Liu
Hu Li
Wei Xu
186
3
0
03 Jun 2025
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
Yan Gao
Massimo Roberto Scamarcia
Javier Fernandez-Marques
Mohammad Naseri
Chong Shen Ng
...
Junyan Wang
Zheyuan Liu
Daniel J. Beutel
Lingjuan Lyu
Nicholas D. Lane
ALM
406
4
0
03 Jun 2025
EALG: Evolutionary Adversarial Generation of Language Model-Guided Generators for Combinatorial Optimization
Ruibo Duan
Yuxin Liu
Xinyao Dong
Chenglin Fan
291
3
0
03 Jun 2025
AI Scientists Fail Without Strong Implementation Capability
Minjun Zhu
Qiujie Xie
Yixuan Weng
Jian Wu
Zhen Lin
Linyi Yang
Yue Zhang
ELM
351
8
0
02 Jun 2025
The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning
Edward Y. Chang
Zeyneb N. Kaya
Ethan Chang
LRM
384
0
0
02 Jun 2025
MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation
Wei Shen
Zhang Yaxiang
Minhui Huang
Mengfan Xu
Jiawei Zhang
Cong Shen
AI4CE
342
1
0
02 Jun 2025
TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network
Guangxin He
Yuan Cao
Yutong He
Tianyi Bai
Kun Yuan
Binhang Yuan
MQ
213
1
0
02 Jun 2025
Improving LLM-Generated Code Quality with GRPO
Maxime Robeyns
Laurence Aitchison
ALM
171
2
0
02 Jun 2025
Earley-Driven Dynamic Pruning for Efficient Structured Decoding
Xintong Sun
Chi Wei
Minghao Tian
Shiwen Ni
139
0
0
01 Jun 2025
Legal Compliance Evaluation of Smart Contracts Generated By Large Language Models
International Conference on Blockchain (ICB), 2025
Chanuka Wijayakoon
Hai Dong
H.M.N. Dilum Bandara
Z. Tari
Anurag Soin
AILaw ELM
152
2
0
01 Jun 2025
Mamba Drafters for Speculative Decoding
Daewon Choi
Seunghyuk Oh
Saket Dingliwal
Jihoon Tack
Kyuyoung Kim
...
Insu Han
Jinwoo Shin
Aram Galstyan
Shubham Katiyar
S. Bodapati
294
0
0
01 Jun 2025
Behavioral Augmentation of UML Class Diagrams: An Empirical Study of Large Language Models for Method Generation
Djaber Rouabhia
Ismail Hadjadj
187
5
0
01 Jun 2025
ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation
Jovana Kondic
Pengyuan Li
D. Joshi
Zexue He
Shafiq Abedin
...
Assaf Arbelle
A. Oliva
Dan Gutfreund
Leonid Karlinsky
Rogerio Feris
140
0
0
31 May 2025
CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning
Monoshi Kumar Roy
Simin Chen
Benjamin Steenhoek
Jinjun Peng
Gail E. Kaiser
Baishakhi Ray
Wei Le
LRM
264
4
0
31 May 2025
FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts
Xinyi Wang
Lirong Gao
Haobo Wang
Yiming Zhang
Junbo Zhao
MoE
213
0
0
31 May 2025
SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation
Ivan Petrukha
Yana Kurliak
Nataliia Stulova
ALM ELM
210
2
0
30 May 2025
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
Shuyao Xu
Cheng Peng
Jiangxuan Long
Weidi Xu
Wei Chu
Yuan Qi
LRM
204
2
0
30 May 2025
Tag-Evol: Achieving Efficient Instruction Evolving via Tag Injection
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yixuan Wang
Shiqi Zhou
Chuanzhe Guo
Qingfu Zhu
3DV
164
0
0
30 May 2025
RAST: Reasoning Activation in LLMs via Small-model Transfer
Siru Ouyang
Xinyu Zhu
Zilin Xiao
Minhao Jiang
Yu Meng
Jiawei Han
OffRL ReLM LRM
256
1
0
30 May 2025
Control-R: Towards controllable test-time scaling
Di Zhang
Weida Wang
Junxian Li
Xunzhi Wang
Jiatong Li
...
Peng Ye
Shufei Zhang
Xuming He
Yuqiang Li
Dongzhan Zhou
LRM
198
0
0
30 May 2025
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Zafir Stojanovski
Oliver Stanley
Joe Sharratt
Richard Jones
Abdulhakeem Adefioye
Jean Kaddour
Andreas Kopf
OffRL LRM
379
39
0
30 May 2025
Structure-Aware Fill-in-the-Middle Pretraining for Code
Linyuan Gong
Alvin Cheung
Mostafa Elhoushi
Sida Wang
CLL AI4CE
149
0
0
30 May 2025
Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking
Heli Ben-Hamu
Itai Gat
Daniel Severo
Niklas Nolte
Brian Karrer
257
40
0
30 May 2025
An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring
Sana Ebrahimi
Mohsen Dehghankar
Abolfazl Asudeh
204
3
0
30 May 2025
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu
Jiaxuan Gao
Xujie Shen
Chen Zhu
Zhiyu Mei
...
Jun Mei
Jiashu Wang
Tongkai Yang
Binhang Yuan
Yi Wu
OffRL SyDa LRM
517
95
0
30 May 2025
Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models
Junyi Li
Hwee Tou Ng
OffRL HILM LRM
561
6
0
30 May 2025
HardTests: Synthesizing High-Quality Test Cases for LLM Coding
Zhongmou He
Yee Man Choi
Kexun Zhang
Jiabao Ji
Junting Zhou
Dejia Xu
Ivan Bercovich
Aidan Zhang
Lei Li
322
7
0
30 May 2025
QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation
Y. Zhu
Di Huang
Hanqi Lyu
X. Zhang
Chongxiao Li
...
Rui Zhang
Zidong Du
Qi Guo
Xing Hu
Yihao Chen
OffRL LRM
413
3
0
30 May 2025
Can LLMs Reason Structurally? An Evaluation via the Lens of Data Structures
Yu He
Yingxi Li
Colin White
Ellen Vitercik
ELM LRM
234
1
0
29 May 2025
Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration
Yilong Li
Chen Qian
Yu Xia
Ruijie Shi
Yufan Dang
...
Ye Tian
Xuantang Xiong
Lei Han
Zhiyuan Liu
Maosong Sun
LLMAG
316
1
0
29 May 2025
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Chenyu Yang
Shiqian Su
Shi-Qi Liu
Xuan Dong
Yue Yu
...
Hao Li
Wenhai Wang
Yu Qiao
Xizhou Zhu
Jifeng Dai
OffRL
349
13
0
29 May 2025
VERINA: Benchmarking Verifiable Code Generation
Zhe Ye
Zhengxu Yan
Jingxuan He
Timothe Kasriel
Kaiyu Yang
Dawn Song
234
7
0
29 May 2025
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
William Merrill
Shane Arora
Dirk Groeneveld
Hannaneh Hajishirzi
462
5
0
29 May 2025