ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Evaluating Large Language Models Trained on Code
v1, v2 (latest)


7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
    ELM, ALM

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,461 papers shown
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
X. S. Hu
Zhanchao Zhou
Ruiqi Liang
Zehuan Li
Wei Wu
Jianguo Li
80
0
0
28 Nov 2025
TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation
Henrijs Princis
Arindam Sharma
Cristina David
8
0
0
27 Nov 2025
PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration
Junfei Zhan
Haoxun Shen
Zheng Lin
Tengjiao He
16
0
0
27 Nov 2025
Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs
Daniel Agyei Asante
Md Mokarram Chowdhury
Yang Li
16
0
0
27 Nov 2025
BRIDGE: Building Representations In Domain Guided Program Verification
Robert Joseph George
Carson Eisenach
Udaya Ghai
Dominique C. Perrault-Joncas
A. Anandkumar
Dean Phillips Foster
ALM, LRM
345
0
0
26 Nov 2025
From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models
Hengyu Fu
Baihe Huang
Virginia Adams
Charles Wang
Venkat Srinivasan
Jiantao Jiao
130
0
0
26 Nov 2025
Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios
Luohe Shi
Zuchao Li
Lefei Zhang
Baoyuan Qi
Guoming Liu
Hai Zhao
AI4TS
156
0
0
25 Nov 2025
R3A: Reliable RTL Repair Framework with Multi-Agent Fault Localization and Stochastic Tree-of-Thoughts Patch Generation
Zizhang Luo
Fan Cui
Kexing Zhou
Runlin Guo
Mile Xia
Hongyuan Hou
Yun Liang
3DV, KELM
230
0
0
25 Nov 2025
CLIMATEAGENT: Multi-Agent Orchestration for Complex Climate Data Science Workflows
Hyeonjae Kim
Chenyue Li
Wen Deng
Mengxi Jin
Wen Huang
Mengqian Lu
Binhang Yuan
AI4CE
263
0
0
25 Nov 2025
RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation
Yuanyuan Lin
Xiangyu Ouyang
Teng Zhang
Kaixin Sui
120
0
0
25 Nov 2025
NNGPT: Rethinking AutoML with Large Language Models
Roman Kochnev
Waleed Khalid
Tolgay Atinc Uzun
X. Zhang
Yashkumar Sanjaybhai Dhameliya
...
Chandini Vysyaraju
Raghuvir Duvvuri
Avi Goyal
D. Ignatov
Radu Timofte
LM&MA, LRM
163
4
0
25 Nov 2025
DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs
Yuanhao Li
Mingshan Liu
Hongbo Wang
Yiding Zhang
Yifei Ma
Wei Tan
AI4TS, KELM, LRM, AI4CE
366
0
0
25 Nov 2025
Can Vibe Coding Beat Graduate CS Students? An LLM vs. Human Coding Tournament on Market-driven Strategic Planning
Panayiotis Danassis
Naman Goel
29
0
0
25 Nov 2025
Supporting Students in Navigating LLM-Generated Insecure Code
Jaehwan Park
Kyungchan Lim
Seonhye Park
Doowon Kim
60
0
0
25 Nov 2025
Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models
Wentao Hu
Mingkuan Zhao
Shuangyong Song
Xiaoyan Zhu
Xin Lai
Jiayin Wang
87
1
0
25 Nov 2025
CDLM: Consistency Diffusion Language Models For Faster Sampling
Minseo Kim
Chenfeng Xu
Coleman Hooper
Harman Singh
Ben Athiwaratkun
Ce Zhang
Kurt Keutzer
Amir Gholami
132
0
0
24 Nov 2025
DUALGUAGE: Automated Joint Security-Functionality Benchmarking for Secure Code Generation
Abhijeet Pathak
Suvadra Barua
Dinesh Gudimetla
Rupam Patir
Jiawei Guo
Hongxin Hu
Haipeng Cai
ELM
84
0
0
24 Nov 2025
Agint: Agentic Graph Compilation for Software Engineering Agents
Abhi Chivukula
Jay Somasundaram
Vijay Somasundaram
AIFin
264
0
0
24 Nov 2025
CafeQ: Calibration-free Quantization via Learned Transformations and Adaptive Rounding
Ziteng Sun
Adrian Benton
Samuel Kushnir
Asher Trockman
Vikas Singh
Suhas Diggavi
A. Suresh
MQ
122
0
0
24 Nov 2025
Optimizing LLM Code Suggestions: Feedback-Driven Timing with Lightweight State Bounds
Mohammad Nour Al Awad
Sergey Ivanov
Olga Tikhonova
64
0
0
24 Nov 2025
Learning Robust Social Strategies with Large Language Models
Dereck Piche
Mohammed Muqeeth
Milad Aghajohari
Juan Agustin Duque
Michael Noukhovitch
Aaron Courville
120
0
0
24 Nov 2025
SLMFix: Leveraging Small Language Models for Error Fixing with Reinforcement Learning
David Jiahao Fu
Aryan Gupta
Aaron Councilman
David Grove
Yu-Xiong Wang
Vikram S. Adve
LRM
84
0
0
24 Nov 2025
A Multimodal Conversational Agent for Tabular Data Analysis
Mohammad Nour Al Awad
Sergey Ivanov
Olga Tikhonova
Ivan Khodnenko
56
0
0
23 Nov 2025
Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
Yang Liu
Xiaolong Zhong
Ling Jiang
LLMAG, MU, MoE, LRM
328
0
0
23 Nov 2025
Evaluating perturbation robustness of generative systems that use COBOL code inputs
Samuel Ackerman
Wesam Ibraheem
Orna Raz
Marcel Zalmanovici
AAML
113
0
0
23 Nov 2025
Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning
Kevin Lee
Russell Spiewak
James Walsh
LRM
45
0
0
23 Nov 2025
Zero-Reference Joint Low-Light Enhancement and Deblurring via Visual Autoregressive Modeling with VLM-Derived Modulation
Wei Dong
Han Zhou
J. Lin
Jun Chen
62
0
0
23 Nov 2025
A²Flow: Automating Agentic Workflow Generation via Self-Adaptive Abstraction Operators
Mingming Zhao
Xiaokang Wei
Yuanqi Shao
Kaiwen Zhou
Lin Yang
Siwei Rao
Junhui Zhan
Zhitang Chen
62
0
0
23 Nov 2025
Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost
Haojun Xia
Xiaoxia Wu
Jisen Li
Robert Wu
Junxiong Wang
...
Donglin Zhuang
Zhongzhu Zhou
Ben Athiwaratkun
Zhen Zheng
Shuaiwen Leon Song
MQ
112
0
0
23 Nov 2025
FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection
Jin Cui
Boran Zhao
Jiajun Xu
Jiaqi Guo
Shuo Guan
Pengju Ren
OOD
97
0
0
22 Nov 2025
Datacenters in the Desert: Feasibility and Sustainability of LLM Inference in the Middle East
Lara Hassan
Mohamed ElZeftawy
Abdulrahman Mahmoud
24
0
0
21 Nov 2025
E³-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models
Tao Yuan
Haoli Bai
Yinfei Pan
Xuyang Cao
Tianyu Zhang
Lu Hou
Ting Hu
Xianzhi Yu
VLM
163
0
0
21 Nov 2025
Asking LLMs to Verify First is Almost Free Lunch
Shiguang Wu
Quanming Yao
ReLM, LRM
112
0
0
21 Nov 2025
Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards
Zhen Wang
Zhifeng Gao
Guolin Ke
OffRL, LRM
233
0
0
21 Nov 2025
Pass@k Metric for RLVR: A Diagnostic Tool of Exploration, But Not an Objective
Yang Yu
84
1
0
20 Nov 2025
PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization
Huseein Jawad
Nicolas Brunel
AAML
120
0
0
20 Nov 2025
CARE: Turning LLMs Into Causal Reasoning Expert
Juncheng Dong
Yiling Liu
Ahmed Aloui
Vahid Tarokh
David Carlson
136
0
0
20 Nov 2025
InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution
Kefan Li
Mengfei Wang
Hengzhi Zhang
Zhichao Li
Yuan Yuan
Mu Li
X. Gao
Hailong Sun
Chunming Hu
Weifeng Lv
112
0
0
20 Nov 2025
Multi-Agent Code Verification via Information Theory
Shreshth Rajan
49
0
0
20 Nov 2025
NALA_MAINZ at BLP-2025 Task 2: A Multi-agent Approach for Bangla Instruction to Python Code Generation
Hossain Shaikh Saadi
Faria Alam
Mario Sanz-Guerrero
Minh Duc Bui
Manuel Mager
Katharina von der Wense
53
0
0
20 Nov 2025
Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization
Rahul Thomas
Arka Pal
88
0
0
19 Nov 2025
Effective Code Membership Inference for Code Completion Models via Adversarial Prompts
Yuan Jiang
Zehao Li
Shan Huang
Christoph Treude
Xiaohong Su
Tiantian Wang
AAML
221
0
0
19 Nov 2025
Parameter Importance-Driven Continual Learning for Foundation Models
LingXiang Wang
Hainan Zhang
Zhiming Zheng
KELM, CLL
414
0
0
19 Nov 2025
Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
Kexin Chu
Dawei Xiang
Zixu Shen
Yiwei Yang
Zecheng Liu
Wei Zhang
MoE, MQ
347
1
0
19 Nov 2025
MermaidSeqBench: An Evaluation Benchmark for LLM-to-Mermaid Sequence Diagram Generation
Basel Shbita
Farhan Ahmed
Chad DeLuca
33
0
0
18 Nov 2025
Beyond Surface-Level Similarity: Hierarchical Contamination Detection for Synthetic Training Data in Foundation Models
Sushant Mehta
126
0
0
18 Nov 2025
Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?
Chunqiu Steven Xia
Zhe Wang
Yan Yang
Yuxiang Wei
Lingming Zhang
LLMAG
315
2
0
17 Nov 2025
KForge: Program Synthesis for Diverse AI Hardware Accelerators
Taras Sereda
Tom St. John
Burak Bartan
Natalie Serrino
Sachin Katti
Zain Asgar
140
0
0
17 Nov 2025
Global Cross-Time Attention Fusion for Enhanced Solar Flare Prediction from Multivariate Time Series
Onur Vural
S. M. Hamdi
S. F. Boubrahimi
AI4TS
80
0
0
17 Nov 2025
Group-Aware Reinforcement Learning for Output Diversity in Large Language Models
Oron Anschel
Alon Shoshan
Adam Botach
Shunit Haviv Hakimi
Asaf Gendler
Emanuel Ben-Baruch
Nadav Bhonker
Igor Kviatkovsky
Manoj Aggarwal
Gérard Medioni
ALM
344
1
0
16 Nov 2025