2403.07974
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
International Conference on Learning Representations (ICLR), 2024
12 March 2024
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
Papers citing
"LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
50 / 559 papers shown
The End of Manual Decoding: Towards Truly End-to-End Language Models
Z. Wang
Dongyang Ma
X. Y. Huang
Deng Cai
Tian Lan
J. Xu
Haitao Mi
Xiaoying Tang
Yan Wang
SyDa
OffRL
417
0
0
30 Oct 2025
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
Qiushi Sun
Jingyang Gong
Yang Liu
Qiaosheng Chen
Lei Li
Kai Chen
Qipeng Guo
B. Kao
Fei Yuan
137
1
0
27 Oct 2025
The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation
Farid Bagirov
Mikhail Arkhipov
Ksenia Sycheva
Evgeniy Glukhov
Egor Bogomolov
109
0
0
27 Oct 2025
A Survey on LLM Mid-Training
Chengying Tu
Xuemiao Zhang
Rongxiang Weng
Rumei Li
Chen Zhang
Yang Bai
Hongfei Yan
Jingang Wang
Xunliang Cai
OffRL
LRM
239
2
0
27 Oct 2025
DynaSolidGeo: A Dynamic Benchmark for Genuine Spatial Mathematical Reasoning of VLMs in Solid Geometry
Changti Wu
Shijie Lian
Zihao Liu
Lei Zhang
Laurence Tianruo Yang
Kai Chen
AIMat
439
0
0
25 Oct 2025
Wisdom and Delusion of LLM Ensembles for Code Generation and Repair
Fernando Vallecillos Ruiz
Max Hort
Leon Moonen
162
1
0
24 Oct 2025
The Virtues of Brevity: Avoid Overthinking in Parallel Test-Time Reasoning
Raul Cavalcante Dinardi
Bruno Yamamoto
A. H. R. Costa
Artur Jordao
LRM
88
0
0
24 Oct 2025
Chain of Execution Supervision Promotes General Reasoning in Large Language Models
Nuo Chen
Zehua Li
Keqin Bao
Junyang Lin
Dayiheng Liu
LLMAG
LRM
118
0
0
24 Oct 2025
Data-Centric Lessons To Improve Speech-Language Pretraining
Vishaal Udandarao
Zhiyun Lu
Xuankai Chang
Yongqiang Wang
Violet Z. Yao
Albin Madapally Jose
Fartash Faghri
Josh Gardner
Chung-Cheng Chiu
140
0
0
22 Oct 2025
SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration
Xichen Zhang
Sitong Wu
Haoru Tan
Shaozuo Yu
Yinghao Zhu
Ziyi He
Jiaya Jia
LRM
139
0
0
22 Oct 2025
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
Ling Team
Bin Han
Caizhi Tang
Chen Liang
Donghao Zhang
...
Yue Zhang
Yuchen Fang
Zibin Lin
Zixuan Cheng
Jun Zhou
LRM
220
1
0
22 Oct 2025
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
Ling Team
Anqi Shen
B. Li
Bin Hu
Bin Jing
...
Z. Pan
Longxiang Zhang
Zhenzhong Lan
Zhiqiang Ding
Zhiqiang Zhang
ALM
ReLM
LRM
263
4
0
21 Oct 2025
CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment
Xue Jiang
Yihong Dong
Mengyang Liu
Hongyi Deng
Tian Wang
...
Zhi Jin
Wenpin Jiao
Fei Huang
Yongbin Li
Ge Li
121
2
0
21 Oct 2025
MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training
Wenxuan Li
Chengruidong Zhang
Huiqiang Jiang
Yucheng Li
Yuqing Yang
Lili Qiu
140
0
0
21 Oct 2025
RESCUE: Retrieval Augmented Secure Code Generation
Jiahao Shi
Tianyi Zhang
SILM
220
0
0
21 Oct 2025
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
S. Bian
Tao Yu
Shivaram Venkataraman
Youngsuk Park
119
0
0
21 Oct 2025
TREAT: A Code LLMs Trustworthiness / Reliability Evaluation and Testing Framework
Shuzheng Gao
E. Li
Man Ho Lam
Jingyu Xiao
Yuxuan Wan
Chaozheng Wang
Ng Man Tik
Michael R. Lyu
148
0
0
20 Oct 2025
EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning
He Du
B. Li
Aijun Yang
Siyang He
Qipeng Guo
Dacheng Tao
OffRL
157
0
0
20 Oct 2025
Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model
Yihong Dong
Zhaoyu Ma
Xue Jiang
Zhiyuan Fan
Jiaru Qian
...
Rongyu Cao
B. Li
Fei Huang
Yongbin Li
Ge Li
125
4
0
20 Oct 2025
STARK: Strategic Team of Agents for Refining Kernels
Juncheng Dong
Yang Yang
Tao Liu
Y. Wang
Feng Qi
Vahid Tarokh
Kaushik Rangadurai
Shuang Yang
LLMAG
94
1
0
19 Oct 2025
MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
Yu Ying Chiu
Michael S. Lee
Rachel Calcott
Brandon Handoko
Paul de Font-Reaulx
...
Mantas Mazeika
Bing Liu
Yejin Choi
Mitchell L. Gordon
Sydney Levine
ELM
LRM
129
0
0
18 Oct 2025
Structure-R1: Dynamically Leveraging Structural Knowledge in LLM Reasoning through Reinforcement Learning
Junlin Wu
Xianrui Zhong
Jiashuo Sun
Bolian Li
Bowen Jin
Jiawei Han
Qingkai Zeng
OffRL
AI4TS
LRM
111
0
0
16 Oct 2025
Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models
Mehrzad Samadi
Aleksander Ficek
Sean Narenthiran
Siddhartha Jain
Wasi Uddin Ahmad
Somshubra Majumdar
Vahid Noroozi
Boris Ginsburg
LRM
107
0
0
16 Oct 2025
Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models
Kedi Chen
Zhikai Lei
Xu Guo
Xuecheng Wu
Siyuan Zeng
...
J. Zhou
Liang He
Qipeng Guo
Kai Chen
Wei-na Zhang
AIMat
AI4TS
LRM
325
0
0
16 Oct 2025
Training LLM Agents to Empower Humans
Evan Ellis
Vivek Myers
Jens Tuyls
Sergey Levine
Anca Dragan
Benjamin Eysenbach
183
0
0
15 Oct 2025
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
Mike Lasby
Ivan Lazarevich
Nish Sinnadurai
Sean Lie
Yani Andrew Ioannou
Vithursan Thangarasa
121
1
0
15 Oct 2025
From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Beining Wang
Weihang Su
Hongtao Tian
Tao Yang
Yujia Zhou
Ting Yao
Qingyao Ai
Yiqun Liu
LRM
103
0
0
13 Oct 2025
Information-Preserving Reformulation of Reasoning Traces for Antidistillation
Jiayu Ding
Lei Cui
Li Dong
Nanning Zheng
Furu Wei
LRM
120
0
0
13 Oct 2025
Demystifying Reinforcement Learning in Agentic Reasoning
Zhaochen Yu
Ling Yang
Jiaru Zou
Shuicheng Yan
Mengdi Wang
AI4TS
LRM
262
5
0
13 Oct 2025
Are Large Reasoning Models Interruptible?
Tsung-Han Wu
Mihran Miroyan
David M. Chan
Trevor Darrell
Narges Norouzi
Joseph E. Gonzalez
KELM
LRM
233
0
0
13 Oct 2025
Enhancing LLM Reasoning via Non-Human-Like Reasoning Path Preference Optimization
Junjie Lu
Yuliang Liu
Chaofeng Qu
Wei Shen
Zhouhan Lin
Min Xu
LRM
149
0
0
13 Oct 2025
Cog-Rethinker: Hierarchical Metacognitive Reinforcement Learning for LLM Reasoning
Zexu Sun
Yongcheng Zeng
Erxue Min
Heyang Gao
Bokai Ji
Xu Chen
OffRL
ReLM
LRM
203
0
0
13 Oct 2025
DND: Boosting Large Language Models with Dynamic Nested Depth
Tieyuan Chen
Xiaodong Chen
Haoxing Chen
Zhenzhong Lan
W. Lin
Jianguo Li
MoE
230
0
0
13 Oct 2025
ELAIPBench: A Benchmark for Expert-Level Artificial Intelligence Paper Understanding
Xinbang Dai
Huikang Hu
Yongrui Chen
Jiaqi Li
Rihui Jin
Yuyang Zhang
Xiaoguang Li
Lifeng Shang
Guilin Qi
RALM
ELM
147
0
0
12 Oct 2025
MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning
Hongwei Chen
Yishu Lei
Dan Zhang
Bo Ke
Danxiang Zhu
...
Shikun Feng
Jingzhou He
Yu Sun
Hua Wu
Haifeng Wang
ReLM
LRM
132
0
0
11 Oct 2025
TripScore: Benchmarking and rewarding real-world travel planning with fine-grained evaluation
Yincen Qu
Huan Xiao
Feng Li
Gregory Li
Hui Zhou
Xiangying Dai
Xiaoru Dai
AI4TS
263
3
0
10 Oct 2025
InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation
Qiaosheng Chen
Y. Liu
Lei Li
Kai Chen
Q. Guo
Gong Cheng
Fei Yuan
ELM
153
1
0
10 Oct 2025
Logit Arithmetic Elicits Long Reasoning Capabilities Without Training
Y. Zhang
Muhammad Khalifa
Lechen Zhang
Xin Liu
Ayoung Lee
Xinliang Frederick Zhang
Farima Fatahi Bayat
L. Wang
RALM
LRM
102
4
0
10 Oct 2025
RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems
Hyundong Jin
Joonghyuk Hahn
Yo-Sub Han
LRM
85
0
0
10 Oct 2025
LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads?
Kaijian Zou
Aaron Xiong
Yunxiang Zhang
Frederick Zhang
Yueqi Ren
Jirong Yang
Ayoung Lee
Shitanshu Bhushan
Lu Wang
ReLM
ALM
ELM
LRM
476
1
0
10 Oct 2025
Do LLMs Really Need 10+ Thoughts for "Find the Time 1000 Days Later"? Towards Structural Understanding of LLM Overthinking
Xinliang Frederick Zhang
Anhad Mohananey
Alexandra Chronopoulou
Pinelopi Papalampidi
Somit Gupta
Tsendsuren Munkhdalai
Lu Wang
Shyam Upadhyay
LRM
175
0
0
09 Oct 2025
dInfer: An Efficient Inference Framework for Diffusion Language Models
Yuxin Ma
Lun Du
Lanning Wei
Kun Chen
Qian Xu
...
Jiaqi Hu
Zhenzhong Lan
Junbo Zhao
Jianguo Li
Da Zheng
MoE
AI4CE
214
10
0
09 Oct 2025
Learning What's Missing: Attention Dispersion and EMA Stabilization in Length Generalization
Pál Zsámboki
Benjamin Levi
David Ansel Josef Smith
Mitansh Kagalwala
Arlington Kell
Samuel Liechty
Cong Wang
111
0
0
09 Oct 2025
Multilingual Knowledge Graph Completion via Efficient Multilingual Knowledge Sharing
Cunli Mao
Xiaofei Gao
Ran Song
Shizhu He
Shengxiang Gao
Kang Liu
Zhengtao Yu
92
0
0
09 Oct 2025
ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation
Qin Liu
Jacob Dineen
Y. Huang
Sheng Zhang
Hoifung Poon
Ben Zhou
Muhao Chen
ELM
134
0
0
09 Oct 2025
How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
Xianzhen Luo
Jinyang Huang
Wenzhen Zheng
Qingfu Zhu
Mingzheng Xu
Yiheng Xu
YuanTao Fan
L. Qin
Wanxiang Che
96
3
0
09 Oct 2025
Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices
Mallika Mainali
Harsha Sureshbabu
Anik Sen
Christopher B. Rauch
Noah Reifsnyder
John Meyer
J. T. Turner
Michael W. Floyd
M. Molineaux
Rosina O. Weber
94
0
0
07 Oct 2025
Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding
Nikita Pavlichenko
Iurii Nazarov
Ivan Dolgov
Ekaterina Garanina
Dmitry Ustalov
...
Kirill Chekmenev
Joseph Shtok
Yaroslav Golubev
Anton Semenkin
Uladzislau Sazanovich
116
0
0
07 Oct 2025
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models
Zhangyue Yin
Qiushi Sun
Zhiyuan Zeng
Zhiyuan Yu
Zengfeng Huang
Xuanjing Huang
Xipeng Qiu
LRM
109
0
0
07 Oct 2025
VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
Lingfei Zeng
Fengdi Che
Xuhan Huang
Fei Ye
X. Xu
Hang Zhao
Jie Fu
98
1
0
07 Oct 2025