ResearchTrend.AI
Evaluating Large Language Models Trained on Code

7 July 2021
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
Jared Kaplan
Harrison Edwards
Yura Burda
Nicholas Joseph
Greg Brockman
Alex Ray
Raul Puri
Gretchen Krueger
Michael Petrov
Heidy Khlaaf
Girish Sastry
Pamela Mishkin
Brooke Chan
Scott Gray
Nick Ryder
Mikhail Pavlov
Alethea Power
Lukasz Kaiser
Mohammad Bavarian
Clemens Winter
Philippe Tillet
F. Such
D. Cummings
Matthias Plappert
Fotios Chantzis
Elizabeth Barnes
Ariel Herbert-Voss
William H. Guss
Alex Nichol
Alex Paino
Nikolas Tezak
Jie Tang
Igor Babuschkin
S. Balaji
Shantanu Jain
William Saunders
Christopher Hesse
A. Carr
Jan Leike
Joshua Achiam
Vedant Misra
Evan Morikawa
Alec Radford
Matthew Knight
Miles Brundage
Mira Murati
Katie Mayer
Peter Welinder
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
    ELM
    ALM

Papers citing "Evaluating Large Language Models Trained on Code"

50 / 857 papers shown
LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems
Nan Xu
Xuezhe Ma
LRM
51
3
0
18 Oct 2024
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models
Qiaoyu Tang
Le Yu
Bowen Yu
Hongyu Lin
K. Lu
Y. Lu
Xianpei Han
Le Sun
MoMe
32
1
0
17 Oct 2024
Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay
Yuyang Chen
Kaiyan Zhao
Yiming Wang
Ming Yang
Jian Zhang
Xiaoguang Niu
30
1
0
16 Oct 2024
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
Jaehong Yoon
Shoubin Yu
Vaidehi Patil
Huaxiu Yao
Mohit Bansal
76
15
0
16 Oct 2024
In-Context Learning Enables Robot Action Prediction in LLMs
Yida Yin
Zekai Wang
Yuvan Sharma
Dantong Niu
Trevor Darrell
Roei Herzig
LM&Ro
112
1
0
16 Oct 2024
Mastering the Craft of Data Synthesis for CodeLLMs
Meng Chen
Philip Arthur
Qianyu Feng
Cong Duy Vu Hoang
Yu-Heng Hong
...
Mark Johnson
K. K.
Don Dharmasiri
Long Duong
Yuan-Fang Li
SyDa
58
1
0
16 Oct 2024
Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks
Rudra Murthy
Prince Kumar
Praveen Venkateswaran
Danish Contractor
KELM
ALM
ELM
26
1
0
16 Oct 2024
TestAgent: A Framework for Domain-Adaptive Evaluation of LLMs via Dynamic Benchmark Construction and Exploratory Interaction
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
45
1
0
15 Oct 2024
QSpec: Speculative Decoding with Complementary Quantization Schemes
Juntao Zhao
Wenhao Lu
Sheng Wang
Lingpeng Kong
Chuan Wu
MQ
66
5
0
15 Oct 2024
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Syeda Nahida Akter
Shrimai Prabhumoye
John Kamalu
S. Satheesh
Eric Nyberg
M. Patwary
M. Shoeybi
Bryan Catanzaro
LRM
SyDa
ReLM
98
1
0
15 Oct 2024
FLARE: Faithful Logic-Aided Reasoning and Exploration
Erik Arakelyan
Pasquale Minervini
Pat Verga
Patrick Lewis
Isabelle Augenstein
ReLM
LRM
61
2
0
14 Oct 2024
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
Fangru Lin
Shaoguang Mao
Emanuele La Malfa
Valentin Hofmann
Adrian de Wynter
Jing Yao
Si-Qing Chen
Michael Wooldridge
Furu Wei
51
2
0
14 Oct 2024
Denial-of-Service Poisoning Attacks against Large Language Models
Kuofeng Gao
Tianyu Pang
Chao Du
Yong Yang
Shu-Tao Xia
Min-Bin Lin
SILM
AAML
56
4
0
14 Oct 2024
Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs
Ishan Jindal
Chandana Badrinath
Pranjal Bharti
Lakkidi Vinay
Sachin Dev Sharma
CLL
ALM
26
2
0
14 Oct 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia
Siwei Han
Shi Qiu
Yiyang Zhou
Zhaoyang Wang
...
Chenhang Cui
Mingyu Ding
Linjie Li
Lijuan Wang
Huaxiu Yao
52
10
0
14 Oct 2024
P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains
Simeng Han
Aaron Yu
Rui Shen
Zhenting Qi
Martin Riddell
...
Yingbo Zhou
Caiming Xiong
Dragomir R. Radev
Rex Ying
Arman Cohan
LRM
43
2
0
11 Oct 2024
Scaling Laws for Predicting Downstream Performance in LLMs
Yangyi Chen
Binxuan Huang
Yifan Gao
Zhengyang Wang
Jingfeng Yang
Heng Ji
LRM
43
8
0
11 Oct 2024
Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
Yuqing Nie
Chong Wang
K. Wang
Guoai Xu
Guosheng Xu
Haoyu Wang
OffRL
130
1
0
11 Oct 2024
Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
Rushang Karia
Daniel Bramblett
D. Dobhal
Siddharth Srivastava
ELM
LRM
30
0
0
11 Oct 2024
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories
Yifan Song
Weimin Xiong
Xiutian Zhao
Dawei Zhu
Wenhao Wu
Ke Wang
Cheng Li
Wei Peng
Sujian Li
LLMAG
28
9
0
10 Oct 2024
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act
Philipp Guldimann
Alexander Spiridonov
Robin Staab
Nikola Jovanović
Mark Vero
...
Mislav Balunović
Nikola Konstantinov
Pavol Bielik
Petar Tsankov
Martin Vechev
ELM
45
4
0
10 Oct 2024
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
Zhipeng Chen
Liang Song
K. Zhou
Wayne Xin Zhao
B. Wang
Weipeng Chen
Ji-Rong Wen
63
0
0
10 Oct 2024
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning
Hyun Ryu
Gyeongman Kim
Hyemin S. Lee
Eunho Yang
LRM
40
3
0
10 Oct 2024
Do Current Language Models Support Code Intelligence for R Programming Language?
Zixiao Zhao
Fatemeh H. Fard
ELM
42
0
0
10 Oct 2024
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
Weibin Liao
Xu Chu
Yasha Wang
LRM
40
6
0
10 Oct 2024
Dissecting Fine-Tuning Unlearning in Large Language Models
Yihuai Hong
Yuelin Zou
Lijie Hu
Ziqian Zeng
Di Wang
Haiqin Yang
AAML
MU
39
2
0
09 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
48
5
0
09 Oct 2024
CursorCore: Assist Programming through Aligning Anything
Hao Jiang
Qi Liu
Rui Li
Shengyu Ye
Shijin Wang
48
1
0
09 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
39
3
0
08 Oct 2024
Synthesizing Interpretable Control Policies through Large Language Model Guided Search
Carlo Bosio
Mark W. Mueller
26
0
0
07 Oct 2024
FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering
Siqiao Xue
Tingting Chen
Fan Zhou
Qingyang Dai
Zhixuan Chu
Hongyuan Mei
36
4
0
06 Oct 2024
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
Zhenwen Liang
Ye Liu
Tong Niu
Xiangliang Zhang
Yingbo Zhou
Semih Yavuz
LRM
32
17
0
05 Oct 2024
ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure
Ippei Fujisawa
Sensho Nobe
Hiroki Seto
Rina Onda
Yoshiaki Uchida
Hiroki Ikoma
Pei-Chun Chien
Ryota Kanai
LRM
42
3
0
04 Oct 2024
Residual Policy Learning for Perceptive Quadruped Control Using Differentiable Simulation
Jing Yuan Luo
Yunlong Song
Victor Klemm
Fan Shi
Davide Scaramuzza
Marco Hutter
33
1
0
04 Oct 2024
GraphRouter: A Graph-based Router for LLM Selections
Tao Feng
Yanzhen Shen
Jiaxuan You
82
10
0
04 Oct 2024
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer
Milan Gritta
Gerasimos Lampouras
Haitham Bou Ammar
Jun Wang
76
4
0
04 Oct 2024
GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning
Jiale Fu
Yaqing Wang
Simeng Han
Jiaming Fan
Chen Si
28
1
0
03 Oct 2024
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Ulyana Piterbarg
Lerrel Pinto
Rob Fergus
SyDa
37
2
0
03 Oct 2024
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Xiangyu Peng
Congying Xia
Xinyi Yang
Caiming Xiong
Chien-Sheng Wu
Chen Xing
LRM
43
2
0
03 Oct 2024
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
Siru Ouyang
W. Yu
Kaixin Ma
Zilin Xiao
Z. Zhang
Mengzhao Jia
J. Han
H. Zhang
Dong Yu
49
12
0
03 Oct 2024
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua-Hong Wu
144
1
0
03 Oct 2024
Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging
Yiming Ju
Ziyi Ni
Xingrun Xing
Zhixiong Zeng
Hanyu Zhao
Siqi Fan
Zheng Zhang
MoMe
29
2
0
01 Oct 2024
From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems
Ali Mohammadjafari
Anthony Maida
Raju N. Gottumukkala
AI4TS
36
5
0
01 Oct 2024
Semantic Parsing with Candidate Expressions for Knowledge Base Question Answering
Daehwan Nam
Gary Geunbae Lee
38
0
0
01 Oct 2024
Federated Instruction Tuning of LLMs with Domain Coverage Augmentation
Zezhou Wang
Yaxin Du
Zhuzhong Qian
Yugang Jiang
Zhuzhong Qian
Siheng Chen
FedML
128
0
0
30 Sep 2024
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
Yejin Lee
Anna Y. Sun
Basil Hosmer
Bilge Acun
Can Balioglu
...
Ram Pasunuru
Scott Yih
Sravya Popuri
Xing Liu
Carole-Jean Wu
52
2
0
30 Sep 2024
Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models
Xin Sky Li
Weize Chen
Qizhi Chu
Haopeng Li
Zhaojun Sun
...
Yiwei Wei
Zhiyuan Liu
Chuan Shi
Maosong Sun
Cheng Yang
35
5
0
29 Sep 2024
Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective
Yotam Wolf
Binyamin Rothberg
Dorin Shteyman
Amnon Shashua
20
0
0
26 Sep 2024
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji
Zihao Li
Indraneil Paul
Jaakko Paavola
Peiqin Lin
...
Dayyán O'Brien
Hengyu Luo
Hinrich Schütze
Jörg Tiedemann
Barry Haddow
CLL
35
3
0
26 Sep 2024
Code Generation and Algorithmic Problem Solving Using Llama 3.1 405B
Aniket Deroy
Subhankar Maity
33
4
0
26 Sep 2024