Evaluating Large Language Models Trained on Code
arXiv:2107.03374, v2 (latest)

7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
    ELM, ALM

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,451 papers shown
Title
Steering Pretrained Drafters during Speculative Decoding
Frédéric Berdoz
Peer Rheinboldt
Roger Wattenhofer
LLMSV
348
0
0
13 Nov 2025
Reasoning: From Reflection to Solution
Zixi Li
LRM
129
0
1
12 Nov 2025
LLM-GROP: Visually Grounded Robot Task and Motion Planning with Large Language Models
The International Journal of Robotics Research (IJRR), 2025
Xiaohan Zhang
Yan Ding
Yohei Hayamizu
Zainab Altaweel
Yifeng Zhu
Yuke Zhu
Peter Stone
Chris Paxton
Shiqi Zhang
LM&Ro
192
1
0
11 Nov 2025
Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison
Yoonho Lee
Joseph Boen
Chelsea Finn
119
1
0
11 Nov 2025
AlphaResearch: Accelerating New Algorithm Discovery with Language Models
Zhaojian Yu
Kaiyue Feng
Yilun Zhao
Shilin He
Xiao-Ping Zhang
Arman Cohan
89
0
0
11 Nov 2025
The Online Patch Redundancy Eliminator (OPRE): A novel approach to online agnostic continual learning using dataset compression
Raphaël Bayle
Martial Mermillod
Robert M. French
CLL
157
0
0
11 Nov 2025
VideoChain: A Transformer-Based Framework for Multi-hop Video Question Generation
Arpan Phukan
Anupam Pandey
Deepjyoti Bodo
Asif Ekbal
LRM
113
0
0
11 Nov 2025
Analyzing Political Text at Scale with Online Tensor LDA
Sara Kangaslahti
Danny Ebanks
Jean Kossaifi
Anqi Liu
R. Alvarez
A. Anandkumar
76
0
0
11 Nov 2025
Procedural Knowledge Improves Agentic LLM Workflows
Vincent Hsiao
Mark Roberts
Leslie Smith
AIFin
347
0
0
10 Nov 2025
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Haofu Qian
Fangcheng Shi
Zijie Meng
...
Zheyong Xie
Zheyu Ye
Zhe Xu
Yao Hu
Shaosheng Cao
ALM
155
0
0
10 Nov 2025
MobileLLM-Pro Technical Report
Patrick Huber
Ernie Chang
Wei Wen
Igor Fedorov
Tarek Elgamal
...
Vikas Chandra
Ahmed Aly
Anuj Kumar
Raghuraman Krishnamoorthi
Adithya Sagar
76
0
0
10 Nov 2025
SemanticForge: Repository-Level Code Generation through Semantic Knowledge Graphs and Constraint Satisfaction
Wuyang Zhang
Chenkai Zhang
Zhen Luo
Jianming Ma
Wangming Yuan
Chuqiao Gu
Chenwei Feng
52
0
0
10 Nov 2025
LLM For Loop Invariant Generation and Fixing: How Far Are We?
Mostafijur Rahman Akhond
Saikat Chakraborty
Gias Uddin
141
1
0
09 Nov 2025
Better Datasets Start From RefineLab: Automatic Optimization for High-Quality Dataset Refinement
Xiaonan Luo
Yue Huang
Ping He
Xiangliang Zhang
68
0
0
09 Nov 2025
FLEX: Continuous Agent Evolution via Forward Learning from Experience
Zhicheng Cai
Xinyuan Guo
Yu Pei
Jiangtao Feng
Jiangjie Chen
Ya Zhang
Wei-Ying Ma
Mingxuan Wang
Hao Zhou
Hao Zhou
CLL, LLMAG, LRM
242
3
0
09 Nov 2025
Towards Resource-Efficient Multimodal Intelligence: Learned Routing among Specialized Expert Models
Mayank Saini
Arit Kumar Bishwas
MoE
86
0
0
09 Nov 2025
Route Experts by Sequence, not by Token
Tiansheng Wen
Y. Wang
Aosong Feng
Long Ma
Xinyang Liu
Y. Wang
Lixuan Guo
Bo Chen
Stefanie Jegelka
Chenyu You
MoE
130
0
0
09 Nov 2025
You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations
Amit Levi
Raz Lapid
Rom Himelstein
Yaniv Nemcovsky
Ravid Shwartz Ziv
A. Mendelson
MQ
85
0
0
09 Nov 2025
Catching Contamination Before Generation: Spectral Kill Switches for Agents
Valentin Noël
60
0
0
08 Nov 2025
SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
Jeffrey Ma
Milad Hashemi
Amir Yazdanbakhsh
Kevin Swersky
Ofir Press
Enhui Li
Vijay Janapa Reddi
Parthasarathy Ranganathan
77
2
0
08 Nov 2025
Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement
Hiroaki Hayashi
Bo Pang
Wenting Zhao
Ye Liu
Akash Gokul
Srijan Bansal
Caiming Xiong
Semih Yavuz
Yingbo Zhou
LLMAG, LM&Ro, LRM
264
0
0
08 Nov 2025
An Empirical Study of Reasoning Steps in Thinking Code LLMs
Haoran Xue
Gias Uddin
Song Wang
LRM
84
1
0
08 Nov 2025
SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
Jingxuan Xu
K. Deng
W. Li
Songwei Yu
Huaixi Tang
...
Zhaoxiang Zhang
Yuqun Zhang
H. Zhang
Bin Chen
Jiaheng Liu
ELM
284
1
0
07 Nov 2025
Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding
Hadi Reisizadeh
Jiajun Ruan
Yiwei Chen
Soumyadeep Pal
Sijia Liu
Mingyi Hong
MU
340
0
0
07 Nov 2025
KLASS: KL-Guided Fast Inference in Masked Diffusion Models
S. Kim
S. Hong
Hojung Jung
Youngrok Park
Se-Young Yun
DiffM, VLM
96
0
0
07 Nov 2025
A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?
Md. Abdul Awal
Mrigank Rochan
Chanchal K. Roy
120
0
0
07 Nov 2025
Motif 2 12.7B technical report
Junghwan Lim
S. W. Lee
Dongseok Kim
Taehyun Kim
Eunhwan Park
...
Kungyu Lee
Dongpin Oh
Yeongjae Park
Bokki Ryu
Dongjoo Weon
72
0
0
07 Nov 2025
REFLEX: Reference-Free Evaluation of Log Summarization via Large Language Model Judgment
Priyanka Mudgal
HILM
212
0
0
06 Nov 2025
Where Do LLMs Still Struggle? An In-Depth Analysis of Code Generation Benchmarks
Amir Molzam Sharifloo
Maedeh Heydari
Parsa Kazerooni
Daniel Maninger
Mira Mezini
ALM
196
0
0
06 Nov 2025
TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training
Michael Menezes
Barbara Su
Xinze Feng
Yehya Farhat
Hamza Shili
Anastasios Kyrillidis
140
1
0
06 Nov 2025
Exploring the Feasibility of End-to-End Large Language Model as a Compiler
H. Zhang
Shihao Gao
Yang Liu
Mingjie Xing
Yanjun Wu
Chen Zhao
100
0
0
06 Nov 2025
From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting
Cyril Vallez
Alexander Sternfeld
Andrei Kucharavy
Ljiljana Dolamic
ELM
157
0
0
06 Nov 2025
Understanding Robustness of Model Editing in Code LLMs: An Empirical Study
Vinaik Chhetri
A.B. Siddique
Umar Farooq
KELM
88
0
0
05 Nov 2025
Secure Code Generation at Scale with Reflexion
Arup Datta
Ahmed Aljohani
Hyunsook Do
ELM
88
0
0
05 Nov 2025
Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control
Rewida Ali
C. C. Beltran-Hernandez
Weiwei Wan
Kensuke Harada
OffRL
52
0
0
05 Nov 2025
LTD-Bench: Evaluating Large Language Models by Letting Them Draw
Liuhao Lin
Ke Li
Zihan Xu
Yuchen Shi
Yulei Qin
Y. Zhang
Xing Sun
Rongrong Ji
144
1
0
04 Nov 2025
FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
Jiedong Jiang
Wanyi He
Yuefeng Wang
Guoxiong Gao
Yongle Hu
...
Nailing Guan
Peihao Wu
Chunbo Dai
Liang Xiao
Bin Dong
AIMat, ELM, LRM
326
0
0
04 Nov 2025
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
Kevin Qinghong Lin
Y. Zheng
Hangyu Ran
Dantong Zhu
Dongxing Mao
Linjie Li
Philip Torr
Alex Jinpeng Wang
68
1
0
04 Nov 2025
PoCo: Agentic Proof-of-Concept Exploit Generation for Smart Contracts
Vivi Andersson
Sofia Bobadilla
Harald Hobbelhagen
Martin Monperrus
148
1
0
04 Nov 2025
Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models
Sanghyun Lee
Seungryong Kim
Jongho Park
D. Park
43
1
0
04 Nov 2025
Why Should the Server Do It All?: A Scalable, Versatile, and Model-Agnostic Framework for Server-Light DNN Inference over Massively Distributed Clients via Training-Free Intermediate Feature Compression
Mingyu Sung
Suhwan Im
Daeho Bang
Il-Min Kim
Sangseok Yun
Jae-Mo Kang
64
0
0
03 Nov 2025
EngChain: A Symbolic Benchmark for Verifiable Multi-Step Reasoning in Engineering
Ayesha Gull
Muhammad Usman Safder
Rania Elbadry
Preslav Nakov
Zhuohan Xie
ELM, LRM
192
0
0
03 Nov 2025
Detecting Vulnerabilities from Issue Reports for Internet-of-Things
Sogol Masoumzadeh
52
0
0
03 Nov 2025
TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding
Aditya Sridhar
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
72
0
0
03 Nov 2025
Context-Guided Decompilation: A Step Towards Re-executability
Xiaohan Wang
Yuxin Hu
Kevin Leach
76
0
0
03 Nov 2025
SmartMLOps Studio: Design of an LLM-Integrated IDE with Automated MLOps Pipelines for Model Development and Monitoring
Jiawei Jin
Yingxin Su
Xiaotong Zhu
VLM
68
0
0
03 Nov 2025
The Ouroboros of Benchmarking: Reasoning Evaluation in an Era of Saturation
İbrahim Ethem Deveci
Duygu Ataman
ReLM, ALM, ELM, LRM
187
0
0
03 Nov 2025
GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents
Jie JW Wu
Ayanda Patrick Herlihy
Ahmad Saleem Mirza
Ali Afoud
Fatemeh H. Fard
OffRL
52
0
0
02 Nov 2025
HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning
Yujian Liu
Jiabao Ji
Yang Zhang
Wenbo Guo
Tommi Jaakkola
Shiyu Chang
100
0
0
02 Nov 2025
IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation
Bosi Wen
Y. Niu
C. Wang
Pei Ke
Xiaoying Ling
Y. Zhang
A. Zeng
Hongning Wang
Shiyu Huang
ALM
136
0
0
02 Nov 2025