2403.07974
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
International Conference on Learning Representations (ICLR), 2024
12 March 2024
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
Papers citing
"LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
50 / 560 papers shown
Automated Research Article Classification and Recommendation Using NLP and ML
Shadikur Rahman
Hasibul Karim Shanto
Umme Ayman Koana
Syed Muhammad Danish
99
0
0
07 Oct 2025
MixReasoning: Switching Modes to Think
Haiquan Lu
Gongfan Fang
Xinyin Ma
Qi Li
Xinchao Wang
LRM
120
4
0
07 Oct 2025
VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
Lingfei Zeng
Fengdi Che
Xuhan Huang
Fei Ye
X. Xu
Hang Zhao
Jie Fu
103
1
0
07 Oct 2025
Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices
Mallika Mainali
Harsha Sureshbabu
Anik Sen
Christopher B. Rauch
Noah Reifsnyder
John Meyer
J. T. Turner
Michael W. Floyd
M. Molineaux
Rosina O. Weber
100
0
0
07 Oct 2025
Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning
Jonas Hübotter
Leander Diaz-Bone
Ido Hakimi
Andreas Krause
Moritz Hardt
161
1
0
06 Oct 2025
Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches
Yicheng Tao
Yao Qin
Yepang Liu
3DV
181
5
0
06 Oct 2025
Toward a unified framework for data-efficient evaluation of large language models
Lele Liao
Qile Zhang
Ruofan Wu
Guanhua Fang
98
1
0
05 Oct 2025
PLSemanticsBench: Large Language Models As Programming Language Interpreters
Aditya Thimmaiah
Jiyang Zhang
Jayanth Srinivasa
Junyi Jessy Li
Miloš Gligorić
ReLM
LRM
213
0
0
03 Oct 2025
On the Role of Temperature Sampling in Test-Time Scaling
Yuheng Wu
Azalia Mirhoseini
Thierry Tambe
ALM
LRM
102
1
1
02 Oct 2025
AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning
Zhenyu Pan
Y. Zhang
Zhuo Liu
Y. Tang
Zeliang Zhang
...
Haoyang Fang
Manling Li
Chenliang Xu
Philip S. Yu
Han Liu
AAML
200
0
0
02 Oct 2025
InvThink: Towards AI Safety via Inverse Reasoning
Yubin Kim
Taehan Kim
Lizhou Fan
Chunjong Park
C. Breazeal
Daniel J. McDuff
Hae Won Park
AI4CE
ReLM
SILM
MU
LRM
282
1
0
02 Oct 2025
RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training
Tao Ren
Jinyang Jiang
Hui Yang
Wan Tian
Minhao Zou
...
Shentao Qin
Yanjun Zhao
Rui Tao
Hui Shao
Yijie Peng
124
1
0
01 Oct 2025
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
Akshat Ramachandran
Marina Neseem
Charbel Sakr
Rangharajan Venkatesan
Brucek Khailany
Tushar Krishna
MQ
LRM
VLM
150
1
1
01 Oct 2025
RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning
Gang Li
Yulei Qin
Xiaoyu Tan
Dingkang Yang
Yuchen Shi
Zihan Xu
Xiang Li
Xing Sun
Ke Li
OffRL
ReLM
LRM
271
0
0
30 Sep 2025
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
Xin Xu
Cliveb AI
Kai Yang
Tianhao Chen
Yang Wang
Saiyong Yang
Can Yang
OffRL
ReLM
LRM
305
2
0
30 Sep 2025
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
Shenao Zhang
Donghan Yu
Yihao Feng
Bowen Jin
Zhaoran Wang
John Peebles
Zirui Wang
OffRL
ReLM
LRM
302
0
0
30 Sep 2025
Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs
H. Dai
Maoquan Wang
Mengnan Qi
Yikai Zhang
Zijian Jin
Yongqiang Yao
Yufan Huang
Shengyu Fu
Elsie Nallipogu
LLMAG
107
0
0
30 Sep 2025
Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
S. Venkatraman
Vineet Jain
Sarthak Mittal
Vedant Shah
J. Obando-Ceron
...
B. Kailkhura
Guillaume Lajoie
Glen Berseth
Nikolay Malkin
Moksh Jain
ReLM
AIFin
LRM
218
3
0
30 Sep 2025
Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing
Yang Tang
Ruijie Liu
Yifan Wang
Shiyu Li
Xi Chen
114
0
0
30 Sep 2025
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
Minhui Zhu
Minyang Tian
Xiaocheng Yang
Tianci Zhou
Lifan Yuan
...
Ruixing Zhang
X. Wang
Ofir Press
Nicolas Chia
Eliu A. Huerta
LRM
ELM
144
2
0
30 Sep 2025
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
Zihao Zhu
Xinyu Wu
Gehan Hu
Siwei Lyu
Ke Xu
Baoyuan Wu
LRM
99
1
0
29 Sep 2025
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Guibin Zhang
Muxin Fu
Shuicheng Yan
LLMAG
398
9
0
29 Sep 2025
Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
Yida Chen
Yuning Mao
Xianjun Yang
Suyu Ge
Shengjie Bi
Lijuan Liu
Saghar Hosseini
L Tan
Yixin Nie
Shaoliang Nie
LRM
165
0
0
29 Sep 2025
PIPer: On-Device Environment Setup via Online Reinforcement Learning
Alexander Kovrigin
Aleksandra V. Eliseeva
Konstantin Grotov
Egor Bogomolov
Yaroslav Zharov
OffRL
111
0
0
29 Sep 2025
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Yichi Zhang
Yue Ding
Jingwen Yang
Tianwei Luo
Dongbai Li
Ranjie Duan
Qiang Liu
Hang Su
Yinpeng Dong
Jun Zhu
LRM
140
1
0
29 Sep 2025
UI-UG: A Unified MLLM for UI Understanding and Generation
Hao Yang
Weijie Qiu
Ru Zhang
Zhou Fang
Ruichao Mao
...
Maji Huang
Longxiang Zhang
Teng Guo
Shuoyang Liu
Hai Rao
MLLM
194
1
0
29 Sep 2025
AutoCode: LLMs as Problem Setters for Competitive Programming
Shang Zhou
Zihan Zheng
Kaiyuan Liu
Zeyu Shen
Zerui Cheng
...
Peter Henderson
Natasha Jaques
Pramod Viswanath
Saining Xie
Jingbo Shang
99
1
0
29 Sep 2025
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
J. Liu
Sijun He
Jingjing Wu
X. Wang
Yang Chen
Zhaoqi Kuang
Siqi Bao
Yuan Yao
ELM
LRM
194
0
0
29 Sep 2025
ByteSized32Refactored: Towards an Extensible Interactive Text Games Corpus for LLM World Modeling and Evaluation
Haonan Wang
Junfeng Sun
Xingdi Yuan
Ruoyao Wang
Ziang Xiao
95
0
0
28 Sep 2025
Evaluating Program Semantics Reasoning with Type Inference in System F
Yifeng He
Luning Yang
Christopher Castro Gaw Gonzalo
Hao Chen
ReLM
LRM
587
1
0
28 Sep 2025
HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs
K. Deng
Zizheng Zhan
Wen Xiang
Wenqiang Zhu
Tianhao Peng
...
Jie Liu
Zhaoxiang Zhang
Haotian Zhang
Bin Chen
Jiaheng Liu
LRM
164
2
0
28 Sep 2025
Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
Yixuan Han
Fan Ma
Ruijie Quan
Yi Yang
MoE
LRM
99
0
0
26 Sep 2025
SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification
Kanghoon Yoon
Minsub Kim
Sungjae Lee
Joonhyung Lee
Sunghyeon Woo
Yeonjun In
S. Kwon
Chanyoung Park
Dongsoo Lee
125
1
0
26 Sep 2025
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Yoonjeon Kim
Doohyuk Jang
Eunho Yang
ReLM
AIFin
LRM
206
1
0
26 Sep 2025
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Syeda Nahida Akter
Shrimai Prabhumoye
Eric Nyberg
M. Patwary
Mohammad Shoeybi
Yejin Choi
Bryan Catanzaro
AIFin
LRM
AI4CE
120
6
0
26 Sep 2025
Variational Reasoning for Language Models
Xiangxin Zhou
Zichen Liu
Haonan Wang
Chao Du
Min Lin
Chongxuan Li
Liang Wang
Tianyu Pang
OffRL
LRM
213
0
0
26 Sep 2025
Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards
Aaron Tu
Weihao Xuan
Heli Qi
X. Y. Huang
Qingcheng Zeng
...
Amin Saberi
Naoto Yokoya
Jure Leskovec
Yejin Choi
Fang Wu
OffRL
161
3
0
26 Sep 2025
Who's Laughing Now? An Overview of Computational Humour Generation and Explanation
Tyler Loakman
William Thorne
Chenghua Lin
LRM
141
2
0
25 Sep 2025
Verification Limits Code LLM Training
Srishti Gureja
Elena Tommasone
Jingyi He
Sara Hooker
Matthias Gallé
Marzieh Fadaee
ALM
OffRL
129
1
0
25 Sep 2025
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning
Xueliang Zhao
Wei Wu
Jian Guan
Zhuocheng Gong
Lingpeng Kong
ReLM
OffRL
LRM
AI4TS
176
1
0
24 Sep 2025
Thinking Augmented Pre-training
Liang Wang
Nan Yang
Shaohan Huang
Li Dong
Furu Wei
LRM
322
2
0
24 Sep 2025
Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding
Pei-Shuo Wang
Jian-Jia Chen
Chun-Che Yang
Chi-chih Chang
N. Huang
Mohamed S. Abdelfattah
Kai-Chiang Wu
MQ
216
0
0
22 Sep 2025
MapCoder-Lite: Squeezing Multi-Agent Coding into a Single Small LLM
W. Lee
Junhee Cho
Jungwook Choi
LLMAG
ALM
97
0
0
22 Sep 2025
FlowRL: Matching Reward Distributions for LLM Reasoning
Xuekai Zhu
Daixuan Cheng
D. Zhang
Hengli Li
Kaiyan Zhang
...
J. Gao
Xiaodong Liu
Bowen Zhou
Hongyuan Mei
Zhouhan Lin
LRM
259
7
0
18 Sep 2025
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Qikai Chang
Zhenrong Zhang
Pengfei Hu
Jiefeng Ma
Yicheng Pan
Jianshu Zhang
Jun Du
Quan Liu
J. Gao
OffRL
LRM
159
3
0
17 Sep 2025
FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction
Yuxuan Cai
Xiaozhuan Liang
X. Wang
Jin Ma
Haijin Liang
Jinwen Luo
Xinyu Zuo
Lisheng Duan
Yuyang Yin
Xi Chen
171
1
0
16 Sep 2025
SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems
Xifeng Yao
Dongyu Lang
Wu Zhang
Xintong Guo
Huarui Xie
...
Ping Liu
Guang Shen
Yi Bai
Dandan Tu
Changzheng Zhang
108
0
0
16 Sep 2025
Do Code Semantics Help? A Comprehensive Study on Execution Trace-Based Information for Code Large Language Models
Jian-Xun Wang
Xiaofei Xie
Q. Hu
Shangqing Liu
Yi Li
LRM
189
2
0
15 Sep 2025
Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction
Yijun Liu
Yixuan Wang
Yuzhuang Xu
Shiyu Ji
Yang Xu
Qingfu Zhu
Wanxiang Che
153
0
0
13 Sep 2025
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Akshit Sinha
Arvindh Arun
Shashwat Goel
Steffen Staab
Jonas Geiping
ALM
LRM
301
9
0
11 Sep 2025