Evaluating Large Language Models Trained on Code

7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
arXiv:2107.03374

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,505 papers shown
T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge Retrieval
Dong Li
Yichen Niu
Ying Ai
Xiang Zou
Biqing Qi
Jianxing Liu
133
1
0
03 Aug 2025
MLP Memory: A Retriever-Pretrained Memory for Large Language Models
Rubin Wei
Jiaqi Cao
Jiarui Wang
Jushi Kai
Qipeng Guo
Bowen Zhou
Zhouhan Lin
RALM
274
0
0
03 Aug 2025
EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuanteng Chen
Yuantian Shao
Peisong Wang
Jian Cheng
MoE
162
2
0
03 Aug 2025
Importance Sampling is All You Need: Predict LLM's performance on new benchmark by reusing existing benchmark
Junjie Shi
Wei Ma
Shi Ying
Lingxiao Jiang
Yang Liu
Bo Du
ALM
156
1
0
02 Aug 2025
How Far Are LLMs from Symbolic Planners? An NLP-Based Perspective
Ma'ayan Armony
Albert Meroño-Peñuela
Gerard Canal
LRM
97
1
0
02 Aug 2025
TreeDiff: AST-Guided Code Generation with Diffusion LLMs
Yiming Zeng
Jinghan Cao
Zexin Li
Yiming Chen
Tao Ren
Dawei Xiang
Xidong Wu
Shangqian Gao
Tingting Yu
DiffM
209
4
0
02 Aug 2025
Categorical Construction of Logically Verifiable Neural Architectures
Logan Nye
NAI
116
0
0
02 Aug 2025
Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report
Sajana Weerawardhena
Paul Kassianik
Blaine Nelson
Baturay Saglam
Anu Vellore
...
Dhruv Kedia
Kojin Oshiba
Zhouran Yang
Yaron Singer
Amin Karbasi
ALM, ELM
185
4
0
01 Aug 2025
R1-ACT: Efficient Reasoning Model Safety Alignment by Activating Safety Knowledge
Yeonjun In
Wonjoong Kim
S. Park
Chanyoung Park
LRM
136
0
0
01 Aug 2025
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models
Jinsong Li
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Jiaqi Wang
Dahua Lin
DiffM
171
13
0
01 Aug 2025
Oedipus and the Sphinx: Benchmarking and Improving Visual Language Models for Complex Graphic Reasoning
Jianyi Zhang
Xu Ji
Ziyin Zhou
Yuchen Zhou
Shubo Shi
Haoyu Wu
Zhen Li
Shizhao Liu
ReLM, CoGe, LRM, VLM
158
1
0
01 Aug 2025
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Yihong Dong
Xue Jiang
Yongding Tao
Huanyu Liu
Kechi Zhang
...
Binhua Li
Zhi Jin
Fei Huang
Y. Li
Ge Li
LRM
369
18
0
31 Jul 2025
AutoBridge: Automating Smart Device Integration with Centralized Platform
Siyuan Liu
Zhice Yang
Huangxun Chen
155
0
0
31 Jul 2025
DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System
Hui Yi Leong
Yuqing Wu
173
0
0
31 Jul 2025
Unveiling Super Experts in Mixture-of-Experts Large Language Models
Zunhai Su
Qingyuan Li
Hao Zhang
Weihao Ye
Qibo Xue
YuLei Qian
Yuchen Xie
Ngai Wong
Kehong Yuan
MoE
281
3
0
31 Jul 2025
SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity
Ishani Mondal
Meera Bharadwaj
Ayush Roy
Aparna Garimella
Jordan L. Boyd-Graber
KELM
247
0
0
30 Jul 2025
IFEvalCode: Controlled Code Generation
J. Yang
Wei Emma Zhang
Shukai Liu
Linzheng Chai
Y. Tan
...
Wangchunshu Zhou
Guanglin Niu
Zhoujun Li
Binyuan Hui
Junyang Lin
ALM
239
3
0
30 Jul 2025
GPT-4.1 Sets the Standard in Automated Experiment Design Using Novel Python Libraries
Nuno Fachada
Daniel Fernandes
Carlos M. Fernandes
Bruno D. Ferreira-Saraiva
J. Matos-Carvalho
ALM, LM&MA, ELM
183
3
0
30 Jul 2025
On LLM-Assisted Generation of Smart Contracts from Business Processes
Fabian Stiehle
Hans Weytjens
Ingo Weber
184
0
0
30 Jul 2025
League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
Q. Guo
Wei Xie
Xiaofang Cai
Enze Wang
Shuoyoucheng Ma
Kai Chen
Xiaofeng Wang
Baosheng Wang
ELM
191
0
0
30 Jul 2025
From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications
Cameron S. Movassaghi
Amanda Momenzadeh
Jesse G. Meyer
OffRL
66
1
0
30 Jul 2025
UnsafeChain: Enhancing Reasoning Model Safety via Hard Cases
Raj Vardhan Tomar
Preslav Nakov
Yuxia Wang
LRM
248
3
0
29 Jul 2025
ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge
Zihan Zhao
B. Chen
Ziping Wan
Lu Chen
Xuanze Lin
...
Huayang Wang
Zhongyang Dai
Liyang Wen
Xin Chen
Kai Yu
LRM, AI4CE
171
4
0
29 Jul 2025
Enhancing Project-Specific Code Completion by Inferring Internal API Information
IEEE Transactions on Software Engineering (TSE), 2025
Le Deng
Xiaoxue Ren
Chao Ni
Ming Liang
David Lo
Zhongxin Liu
178
6
0
28 Jul 2025
FMimic: Foundation Models are Fine-grained Action Learners from Human Videos
The International Journal of Robotics Research (IJRR), 2025
Guangyan Chen
Meiling Wang
Te Cui
Yao Mu
Haoyang Lu
...
Mengxiao Hu
Tianxing Zhou
M. Fu
Yi Yang
Yufeng Yue
LM&Ro, VLM
158
5
0
28 Jul 2025
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Meishan Zhang
Xin Zhang
X. Zhao
Shouzheng Huang
Baotian Hu
Min Zhang
267
3
0
28 Jul 2025
TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
Honghua Dong
Jiacheng Yang
Xun Deng
Yuhe Jiang
Gennady Pekhimenko
Fan Long
X. Si
209
2
0
28 Jul 2025
Kimi K2: Open Agentic Intelligence
Kimi Team
Yifan Bai
Yiping Bao
Guanduo Chen
Jiahao Chen
...
Qifeng Teng
Chensi Wang
Dinglu Wang
Feng Wang
Haiming Wang
MoE, VLM, LRM
182
84
0
28 Jul 2025
LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning
Yining Huang
Bin Li
Keke Tang
Meilian Chen
MoE, LRM
255
2
0
28 Jul 2025
When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions
Maya Larbi
Amal Akli
Mike Papadakis
Rihab Bouyousfi
Maxime Cordy
Federica Sarro
Yves Le Traon
208
2
0
27 Jul 2025
CrossPL: Evaluating Large Language Models on Cross Programming Language Code Generation
Zhanhang Xiong
Dongxia Wang
Yuekang Li
Xinyuan An
Wenhai Wang
143
0
0
26 Jul 2025
The Impact of Fine-tuning Large Language Models on Automated Program Repair
Roman Macháček
Anastasiia Grishina
Max Hort
Leon Moonen
155
1
0
26 Jul 2025
Flora: Effortless Context Construction to Arbitrary Length and Scale
Tianxiang Chen
Zhentao Tan
Xiaofan Bo
Yue Wu
Tao Gong
Qi Chu
Jieping Ye
Nenghai Yu
CLL, LRM
253
1
0
26 Jul 2025
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?
Muntasir Wahed
Xiaona Zhou
Kiet A. Nguyen
Tianjiao Yu
Nirav Diwan
Gang Wang
Dilek Hakkani-Tür
Ismini Lourentzou
AAML
171
1
0
25 Jul 2025
PennyCoder: Efficient Domain-Specific LLMs for PennyLane-Based Quantum Code Generation
Abdul Basit
Minghao Shao
Muhammad Haider Asif
Nouhaila Innan
Muhammad Kashif
Alberto Marchisio
Muhammad Shafique
MQ
160
2
0
25 Jul 2025
PurpCode: Reasoning for Safer Code Generation
Jiawei Liu
Nirav Diwan
Zhe Wang
Haoyu Zhai
Xiaona Zhou
...
Hadjer Benkraouda
Yuxiang Wei
Lingming Zhang
Ismini Lourentzou
Gang Wang
SILM, LRM, ELM
447
7
0
25 Jul 2025
Large Language Model Powered Automated Modeling and Optimization of Active Distribution Network Dispatch Problems
IEEE Transactions on Smart Grid (IEEE Trans. Smart Grid), 2025
Xu Yang
Chenhui Lin
Yue Yang
Qi Wang
Haotian Liu
Haizhou Hua
Wenchuan Wu
221
3
0
25 Jul 2025
CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback
Qiushi Sun
Jinyang Gong
Lei Li
Qipeng Guo
Fei Yuan
SyDa
154
2
0
25 Jul 2025
Learning neuro-symbolic convergent term rewriting systems
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
NAI
126
0
0
25 Jul 2025
MemoCoder: Automated Function Synthesis using LLM-Supported Agents
Yiping Jia
Zhen Ming Jiang
Shayan Noei
Ying Zou
LLMAG, KELM
220
0
0
24 Jul 2025
Technical Report of TeleChat2, TeleChat2.5 and T1
Zihan Wang
Xinzhang Liu
Yitong Yao
Chao Wang
Yu Zhao
...
Bingkai Yang
Shuangyong Song
Yongxiang Li
Zhongjiang He
Xuelong Li
AI4TS, LRM
428
6
0
24 Jul 2025
Hybrid and Unitary PEFT for Resource-Efficient Large Language Models
Haomin Qi
Zihan Dai
Chengbo Huang
167
1
0
24 Jul 2025
Automated Code Review Using Large Language Models with Symbolic Reasoning
International Service Availability Symposium (ISAS), 2025
Busra Icoz
Goksel Biricik
LRM
160
0
0
24 Jul 2025
Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
Hao Li
Lijun Li
Zhenghao Lu
Xianyi Wei
Rui Li
Jing Shao
Lei Sha
376
11
0
24 Jul 2025
AccessGuru: Leveraging LLMs to Detect and Correct Web Accessibility Violations in HTML Code
Nadeen Fathallah
Daniel Hernández
Steffen Staab
3DV, VLM
147
2
0
24 Jul 2025
Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs
Feng Hong
Geng Yu
Yushi Ye
Haicheng Huang
Huangjie Zheng
Ya Zhang
Yanfeng Wang
Jiangchao Yao
189
13
0
24 Jul 2025
Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation
Shiyuan Li
Yixin Liu
Qingsong Wen
Chengqi Zhang
Shirui Pan
341
16
0
24 Jul 2025
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning
Yu Li
Zhuoshi Pan
Honglin Lin
Mengyuan Sun
Conghui He
Lijun Wu
LRM
148
7
0
23 Jul 2025
R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
Zhuokun Chen
Zeren Chen
Jiahao He
Lu Sheng
Zhuliang Yu
Jianfei Cai
Bohan Zhuang
LRM
416
2
0
23 Jul 2025
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Changxin Tian
Jiapeng Wang
Qian Zhao
Kunlong Chen
Jia-Ling Liu
Ziqi Liu
Jiaxin Mao
Wayne Xin Zhao
Zhiqiang Zhang
Jun Zhou
MoMe, CLL
264
6
0
23 Jul 2025
Page 16 of 91