Evaluating Large Language Models Trained on Code

7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 4,509 papers shown
DINGO: Constrained Inference for Diffusion LLMs
Tarun Suresh
Debangshu Banerjee
Shubham Ugare
Sasa Misailovic
Gagandeep Singh
DiffM
198
3
0
29 May 2025
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du
Luu Anh Tuan
Yue Liu
Yuhao Qing
Dong Huang
Xinyi He
Qian Liu
Zejun Ma
See-Kiong Ng
330
6
0
29 May 2025
Self-Correcting Code Generation Using Small Language Models
Jeonghun Cho
Deokhyung Kang
Hyounghun Kim
Gary Lee
KELM, 3DV, LRM
274
0
0
29 May 2025
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL
Yichen Feng
Zhangchen Xu
Fengqing Jiang
Yuetai Li
Bhaskar Ramasubramanian
Luyao Niu
Bill Yuchen Lin
Radha Poovendran
ReLM, LRM
158
8
0
29 May 2025
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving
Wendong Xu
Jing Xiong
Chenyang Zhao
Qiujiang Chen
Haoran Wang
...
Hongxia Yang
Bei Yu
Lingpeng Kong
Q. Gu
Ngai Wong
LRM
193
2
0
29 May 2025
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
William Merrill
Shane Arora
Dirk Groeneveld
Hannaneh Hajishirzi
462
5
0
29 May 2025
PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics
Atharva Naik
Darsh Agrawal
Yash Mathur
Manav Kapadnis
Yuwei An
Clayton Marr
Carolyn Rose
David R. Mortensen
LRM, ELM
261
0
0
29 May 2025
From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents
Tobias Lindenbauer
Georg Groh
Hinrich Schütze
211
1
0
29 May 2025
GenCAD-Self-Repairing: Feasibility Enhancement for 3D CAD Generation
Chikaha Tsuji
Enrique Flores Medina
Harshit Gupta
Md Ferdous Alam
163
0
0
29 May 2025
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
Guangtao Zeng
Maohao Shen
Delin Chen
Zhenting Qi
Subhro Das
...
David D. Cox
G. Wornell
Wei Lu
Zhang-Wei Hong
Chuang Gan
283
6
0
29 May 2025
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Chenyu Yang
Shiqian Su
Shi-Qi Liu
Xuan Dong
Yue Yu
...
Hao Li
Wenhai Wang
Yu Qiao
Xizhou Zhu
Jifeng Dai
OffRL
349
13
0
29 May 2025
Reverse Preference Optimization for Complex Instruction Following
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xiang Huang
Ting-En Lin
Feiteng Fang
Yuchuan Wu
Hangyu Li
Yuzhong Qu
Fei Huang
Yongbin Li
207
2
0
28 May 2025
Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling
Fanzeng Xia
Yidong Luo
Tinko Sebastian Bartels
Yaqi Xu
Tongxin Li
ReLM, LRM
266
0
0
28 May 2025
HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding
Emmanuel Anaya Gonzalez
Raven Rothkopf
Sorin Lerner
Nadia Polikarpova
297
1
0
28 May 2025
First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay
Andrew Zhu
Evan Osgood
Chris Callison-Burch
LLMAG
266
0
0
28 May 2025
Advancing Expert Specialization for Better MoE
Hongcan Guo
Haolang Lu
Guoshun Nan
Bolun Chu
Jialin Zhuang
...
Wenhao Che
Sicong Leng
Qimei Cui
Xudong Jiang
MoE, MoMe
390
9
0
28 May 2025
Text2Grad: Reinforcement Learning from Natural Language Feedback
Hanyang Wang
Lu Wang
Chaoyun Zhang
Tianjun Mao
Si Qin
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
235
1
0
28 May 2025
Large Language Models for Depression Recognition in Spoken Language Integrating Psychological Knowledge
Yupei Li
Shuaijie Shao
M. Milling
Björn Schuller
AI4MH
207
3
0
28 May 2025
Scaling Reasoning without Attention
Xueliang Zhao
Wei Wu
Lingpeng Kong
OffRL, ReLM, LRM, VLM
178
3
0
28 May 2025
ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
Zhendong Mi
Zhenglun Kong
Geng Yuan
Shaoyi Huang
247
2
0
28 May 2025
EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse
European Conference on Parallel Processing (Euro-Par), 2025
Tianyu Guo
Hande Dong
Yichong Leng
Feng Liu
Cheater Lin
Nong Xiao
X. Zhang
RALM
230
1
0
28 May 2025
GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qingchen Yu
Zifan Zheng
Simin Niu
Shichao Song
Bo Tang
Feiyu Xiong
Zhiyu Li
ELM, LRM
206
3
0
28 May 2025
LASER: Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
Paramita Mirza
Lucas Weber
Fabian Küch
287
0
0
28 May 2025
SimuGen: Multi-modal Agentic Framework for Constructing Block Diagram-Based Simulation Models
Xinxing Ren
Qianbo Zang
Zekun Guo
LLMAG
216
5
0
28 May 2025
Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities
Junyan Zhang
Yubo Gao
Yibo Yan
Jia-Chen Gu
Zhaorui Hou
...
Qi Zheng
Song Dai
Yonghua Hei
Junzhuo Li
Xuming Hu
228
3
0
27 May 2025
Explaining Large Language Models with gSMILE
Zeinab Dehghani
Mohammed Naveed Akram
Adil Khan
Y. Papadopoulos
MILM, LRM
570
0
0
27 May 2025
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Yana Veitsman
Mayank Jobanputra
Yash Sarrof
Aleksandra Bakalova
Vera Demberg
Ellie Pavlick
Michael Hahn
476
2
0
27 May 2025
Can LLMs Learn to Map the World from Local Descriptions?
Sirui Xia
Aili Chen
Xintao Wang
Tinghui Zhu
Yikai Zhang
Jiangjie Chen
Yanghua Xiao
232
2
0
27 May 2025
Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits
Yeshwanth Venkatesha
Souvik Kundu
Priyadarshini Panda
166
7
0
27 May 2025
Test-Time Learning for Large Language Models
Jinwu Hu
Zhitian Zhang
Guohao Chen
Xutao Wen
Chao Shuai
Wei Luo
Bin Xiao
Yuanqing Li
Zhuliang Yu
440
13
0
27 May 2025
LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions
Hadi Askari
Shivanshu Gupta
Fei Wang
Anshuman Chhabra
Muhao Chen
TDI
426
4
0
27 May 2025
Deconstructing Obfuscation: A four-dimensional framework for evaluating Large Language Models assembly code deobfuscation capabilities
Anton Tkachenko
Dmitrij Suskevic
Benjamin Adolphi
305
1
0
26 May 2025
Two Causally Related Needles in a Video Haystack
Miaoyu Li
Qin Chao
Boyang Albert Li
CML
311
0
0
26 May 2025
Token-Importance Guided Direct Preference Optimization
Yang Ning
Lin Hai
Liu Yibo
Tian Baoliang
Liu Guoqing
Zhang Haijun
273
0
0
26 May 2025
The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants
Yiqun Zhang
Hao Li
Chenxu Wang
L. Chen
Qiaosheng Zhang
...
Xinrun Wang
Jia Xu
Mengwei He
Xuming He
Shuyue Hu
412
16
0
26 May 2025
Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap
C. Gomes
Shaukat Ali
Paolo Arcaini
Andrea Arcuri
232
0
0
26 May 2025
PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
Zhaowei Zhang
Minghua Yi
Mengmeng Wang
Fengshuo Bai
Zilong Zheng
Yipeng Kang
Yaodong Yang
306
1
0
26 May 2025
Lifelong Safety Alignment for Language Models
Haoyu Wang
Zeyu Qin
Yifei Zhao
C. Du
Min Lin
Xueqian Wang
Tianyu Pang
KELM, CLL
296
6
0
26 May 2025
CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation
Guang Yang
Yu Zhou
Xiang Chen
Wei-Shi Zheng
Xing Hu
Xin Zhou
David Lo
Taolue Chen
ALM, LRM
239
5
0
26 May 2025
Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries
Sahana Ramnath
Anurag Mudgil
Brihi Joshi
Skyler Hallinan
Xiang Ren
171
0
0
26 May 2025
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
Junnan Liu
Hongwei Liu
Linchen Xiao
Shudong Liu
Taolin Zhang
Zihan Ma
Songyang Zhang
Kai Chen
LRM
375
3
0
26 May 2025
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement
Bingguang Hao
Xinjian Zhao
Zengzhuang Xu
Y. Wen
Yicheng Chen
...
D. Wang
Xiangyu Zhao
Jinjie Gu
Chenyi Zhuang
Ji Zhang
ReLM, LRM
284
5
0
26 May 2025
Temporal Sampling for Forgotten Reasoning in LLMs
Yuetai Li
Zhangchen Xu
Fengqing Jiang
Bhaskar Ramasubramanian
Luyao Niu
Bill Yuchen Lin
Xiang Yue
Radha Poovendran
CLL, KELM, LRM
307
10
0
26 May 2025
CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward
Yandong Guan
Xilin Wang
Xingxi Ming
Jing Zhang
Dong Xu
Qian Yu
3DV, LRM
220
0
0
26 May 2025
Conversational Lexicography: Querying Lexicographic Data on Knowledge Graphs with SPARQL through Natural Language
Kilian Sennrich
Sina Ahmadi
129
0
0
26 May 2025
ReChisel: Effective Automatic Chisel Code Generation by LLM with Reflection
Design Automation Conference (DAC), 2025
Juxin Niu
Xiangfeng Liu
Dan Niu
Xi Wang
Zhe Jiang
Nan Guan
223
3
0
26 May 2025
Large Language Models for Planning: A Comprehensive and Systematic Survey
Pengfei Cao
Tianyi Men
Wencan Liu
Jingwen Zhang
Xuzhao Li
Xixun Lin
Dianbo Sui
Yanan Cao
Kang Liu
Jun Zhao
LLMAG, LM&Ro, OffRL, ELM, LRM
458
19
0
26 May 2025
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks
Debargha Ganguly
Vikash Singh
Sreehari Sankar
Biyao Zhang
Xuecen Zhang
Srinivasan Iyengar
Xiaotian Han
Amit Sharma
Shivkumar Kalyanaraman
Vipin Chaudhary
315
3
0
26 May 2025
Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models
Lachlan McGinness
Peter Baumgartner
ReLM, LRM, ELM
508
1
0
26 May 2025
AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy
Sebastian Antony Joseph
Syed Murtaza Husain
Stella S. R. Offner
Stéphanie Juneau
Paul Torrey
Adam S. Bolton
Juan P. Farias
Niall Gaffney
Greg Durrett
Junyi Jessy Li
474
2
0
26 May 2025