2404.14662
NExT: Teaching Large Language Models to Reason about Code Execution
23 April 2024
Ansong Ni
Miltiadis Allamanis
Arman Cohan
Yinlin Deng
Kensen Shi
Charles Sutton
Pengcheng Yin
ReLM
LRM
Papers citing "NExT: Teaching Large Language Models to Reason about Code Execution"
19 / 19 papers shown
CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations
Man Ho Adrian Lam
Chaozheng Wang
Jen-tse Huang
M. Lyu
LRM
34
0
0
19 Apr 2025
The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs
Haonan Li
Hang Zhang
Kexin Pei
Zhiyun Qian
53
1
0
16 Apr 2025
Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning
Rem Yang
Julian Dai
N. Vasilakis
Martin Rinard
ELM
LRM
27
0
0
07 Apr 2025
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
Anjiang Wei
Tarun Suresh
Jiannan Cao
Naveen Kannan
Yuheng Wu
Kai Yan
Thiago S. F. X. Teixeira
Ke Wang
Alex Aiken
ELM
LRM
41
0
0
29 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
80
0
0
28 Mar 2025
Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors
Zhiyu Yang
Shuo Wang
Yukun Yan
Yang Deng
24
0
0
28 Mar 2025
HoarePrompt: Structural Reasoning About Program Correctness in Natural Language
Dimitrios Stamatios Bouras
Yihan Dai
Tairan Wang
Yingfei Xiong
Sergey Mechtaev
LRM
46
0
0
25 Mar 2025
The KoLMogorov Test: Compression by Code Generation
Ori Yoran
Kunhao Zheng
Fabian Gloeckle
Jonas Gehring
Gabriel Synnaeve
Taco Cohen
62
1
0
18 Mar 2025
Investigating Execution-Aware Language Models for Code Optimization
Federico Di Menna
Luca Traini
Gabriele Bavota
Vittorio Cortellessa
61
0
0
11 Mar 2025
RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing
Yiqing Xie
Alex Xie
Divyanshu Sheth
Pengfei Liu
Daniel Fried
Carolyn Rose
LRM
62
0
0
10 Mar 2025
Multi-Turn Code Generation Through Single-Step Rewards
A. Jain
Gonzalo Gonzalez-Pumariega
Wayne Chen
Alexander M. Rush
Wenting Zhao
Sanjiban Choudhury
LRM
47
1
0
27 Feb 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang
Tianyang Liu
Daoan Zhang
Antoine Simoulin
Xiaoyi Liu
...
Zhaopu Teng
Xin Qian
Grey Yang
Jiebo Luo
Julian McAuley
ReLM
OffRL
LRM
81
3
0
26 Feb 2025
EquiBench: Benchmarking Code Reasoning Capabilities of Large Language Models via Equivalence Checking
Anjiang Wei
Jiannan Cao
Ran Li
H. Chen
Y. Zhang
...
Yuan Liu
Thiago S. F. X. Teixeira
D. Yang
Ke Wang
Alex Aiken
LRM
47
1
0
18 Feb 2025
Learning to Generate Unit Tests for Automated Debugging
Archiki Prasad
Elias Stengel-Eskin
Justin Chih-Yao Chen
Zaid Khan
Mohit Bansal
ELM
76
1
0
03 Feb 2025
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement
Yingwei Ma
Rongyu Cao
Yongchang Cao
Y. Zhang
J. Chen
Yibo Liu
Yuchen Liu
Binhua Li
Fei Huang
Yongbin Li
49
5
0
01 Nov 2024
CodeNav: Beyond tool-use to using real-world codebases with LLM agents
Tanmay Gupta
Luca Weihs
Aniruddha Kembhavi
LLMAG
ELM
56
1
0
18 Jun 2024
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
178
780
0
02 May 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,470
0
06 Oct 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022