2306.03091
RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
5 June 2023
Tianyang Liu
Canwen Xu
Julian McAuley
ALM
Papers citing
"RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems"
50 / 108 papers shown
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
Kai Xu
YiWei Mao
XinYi Guan
ZiLong Feng
21
0
0
12 May 2025
YABLoCo: Yet Another Benchmark for Long Context Code Generation
Aidar Valeev
Roman Garaev
Vadim Lomshakov
Irina Piontkovskaya
Vladimir Ivanov
Israel Adewuyi
38
0
0
07 May 2025
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
Zimu Lu
Y. Yang
Houxing Ren
Haotian Hou
Han Xiao
Ke Wang
Weikang Shi
Aojun Zhou
Mingjie Zhan
H. Li
LLMAG
41
0
0
06 May 2025
SecRepoBench: Benchmarking LLMs for Secure Code Generation in Real-World Repositories
Connor Dilgren
Purva Chiniya
Luke Griffith
Yu Ding
Yizheng Chen
38
0
0
29 Apr 2025
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Minju Seo
Jinheon Baek
Seongyun Lee
S. Hwang
AI4CE
35
0
0
24 Apr 2025
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
Anirudh Khatry
Robert Zhang
Jia Pan
Ziteng Wang
Qiaochu Chen
Greg Durrett
Isil Dillig
32
0
0
21 Apr 2025
RTLRepoCoder: Repository-Level RTL Code Completion through the Combination of Fine-Tuning and Retrieval Augmentation
Peiyang Wu
Nan Guo
Junliang Lv
Xiao Xiao
Xiaochun Ye
29
1
0
11 Apr 2025
Safe Screening Rules for Group OWL Models
Runxue Bao
Quanchao Lu
Yanfu Zhang
34
0
0
04 Apr 2025
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
Daoguang Zan
Zhirong Huang
Wei Liu
Hanwu Chen
L. Zhang
...
Jing Su
Tianyu Liu
Rui Long
Kai Shen
Liang Xiang
36
1
0
03 Apr 2025
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
Anjiang Wei
Tarun Suresh
Jiannan Cao
Naveen Kannan
Yuheng Wu
Kai Yan
Thiago S. F. X. Teixeira
Ke Wang
Alex Aiken
ELM
LRM
34
0
0
29 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
47
0
0
28 Mar 2025
GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments
M. Vu
Gerald Ebmer
Alexander Watcher
Marc-Philip Ecker
Giang Nguyen
Tobias Glueck
59
2
0
18 Mar 2025
Landscape Complexity for the Empirical Risk of Generalized Linear Models: Discrimination between Structured Data
Theodoros G. Tsironis
Aris L. Moustakas
52
1
0
18 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
64
0
0
17 Mar 2025
Compute Optimal Scaling of Skills: Knowledge vs Reasoning
Nicholas Roberts
Niladri S. Chatterji
Sharan Narang
Mike Lewis
Dieuwke Hupkes
46
2
0
13 Mar 2025
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
Dhruv Gautam
Spandan Garg
Jinu Jang
Neel Sundaresan
Roshanak Zilouchian Moghaddam
LLMAG
LRM
62
2
0
10 Mar 2025
DependEval: Benchmarking LLMs for Repository Dependency Understanding
Junjia Du
Yadi Liu
Hongcheng Guo
Jiawei Wang
Haojian Huang
Yunyi Ni
Z. Li
46
1
0
09 Mar 2025
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
Wei Li
Xin Zhang
Zhongxin Guo
Shaoguang Mao
Wen Luo
Guangyue Peng
Yangyu Huang
Houfeng Wang
Scarlett Li
53
0
0
09 Mar 2025
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Roham Koohestani
Philippe de Bekker
M. Izadi
VLM
45
0
0
07 Mar 2025
Transferable Foundation Models for Geometric Tasks on Point Cloud Representations: Geometric Neural Operators
Blaine Quackenbush
P. Atzberger
3DPC
AI4CE
55
1
0
06 Mar 2025
SolBench: A Dataset and Benchmark for Evaluating Functional Correctness in Solidity Code Completion and Repair
Zaoyu Chen
Haoran Qin
Nuo Chen
Xiangyu Zhao
Lei Xue
Xiapu Luo
Xiao-Ming Wu
41
0
0
03 Mar 2025
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation
K. Yan
Hongcheng Guo
Xuanqing Shi
J. Xu
Yaonan Gu
Z. Li
ALM
87
0
0
26 Feb 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang
Tianyang Liu
Daoan Zhang
Antoine Simoulin
Xiaoyi Liu
...
Zhaopu Teng
Xin Qian
Grey Yang
Jiebo Luo
Julian McAuley
ReLM
OffRL
LRM
81
3
0
26 Feb 2025
CodeSwift: Accelerating LLM Inference for Efficient Code Generation
Qianhui Zhao
L. Zhang
Fang Liu
Xiaoli Lian
Qiaoyuanhe Meng
Ziqian Jiao
Zetong Zhou
Borui Zhang
Runlin Guo
Jia Li
41
0
0
24 Feb 2025
LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification
Penghui Yang
Cunxiao Du
Fengzhuo Zhang
Haonan Wang
Tianyu Pang
Chao Du
Bo An
RALM
45
0
0
24 Feb 2025
Code Summarization Beyond Function Level
Vladimir Makharev
Vladimir Ivanov
31
0
0
23 Feb 2025
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
Bohan Lyu
Siqiao Huang
Zichen Liang
Qi-An Sun
Jiaming Zhang
ELM
LRM
47
0
0
16 Feb 2025
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories
Yijia Xiao
Runhui Wang
Luyang Kong
Davor Golac
Wei Wang
LLMAG
61
0
0
10 Feb 2025
Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference
WeiZhi Fei
Xueyan Niu
Guoqing Xie
Yingqing Liu
Bo Bai
Wei Han
28
1
0
22 Jan 2025
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao
Yuk-Kit Chan
Zixuan Ling
Wenxuan Wang
Shuqing Li
...
Pinjia He
Shuai Wang
Zibin Zheng
Michael R. Lyu
S. Cheung
ALM
69
2
0
18 Jan 2025
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
Zhaojian Yu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
36
2
0
03 Jan 2025
LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System
Hyucksung Kwon
Kyungmo Koo
Janghyeon Kim
W. Lee
Minjae Lee
...
Yongkee Kwon
Ilkon Kim
Euicheol Lim
John Kim
Jungwook Choi
51
4
0
28 Dec 2024
Modality-Projection Universal Model for Comprehensive Full-Body Medical Imaging Segmentation
Yixin Chen
Lin Gao
Yajuan Gao
Rui Wang
Jingge Lian
...
Y. Duan
Leiying Chai
Hongbin Han
Zhaoping Cheng
Zhaoheng Xie
32
4
0
26 Dec 2024
CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval
Y. Liu
Rui Meng
Shafiq R. Joty
Silvio Savarese
Caiming Xiong
Yingbo Zhou
Semih Yavuz
90
3
0
19 Nov 2024
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee
Jiwoong Park
Jinseok Kim
Yongjik Kim
Jungju Oh
Jinwook Oh
Jungwook Choi
39
2
0
15 Nov 2024
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang
Tianhao Cheng
J.K. Liu
Jiaran Hao
L. Song
...
Ge Zhang
Zili Wang
Yuan Qi
Yinghui Xu
Wei Chu
ALM
68
16
0
07 Nov 2024
Crystal: Illuminating LLM Abilities on Language and Code
Tianhua Tao
Junbo Li
Bowen Tan
Hongyi Wang
William Marshall
...
Joel Hestness
Natalia Vassilieva
Zhiqiang Shen
Eric P. Xing
Zhengzhong Liu
40
4
0
06 Nov 2024
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
Nizar Islah
Justine Gehring
Diganta Misra
Eilif B. Muller
Irina Rish
Terry Yue Zhuo
Massimo Caccia
SyDa
36
1
0
05 Nov 2024
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
Krishna Teja Chitty-Venkata
Siddhisanket Raskar
B. Kale
Farah Ferdaus
Aditya Tanikanti
Ken Raffenetti
Valerie Taylor
M. Emani
V. Vishwanath
39
4
0
31 Oct 2024
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'
Shanchao Liang
Yiran Hu
Nan Jiang
Lin Tan
ALM
ELM
24
2
0
29 Oct 2024
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
J. Liu
Ken Deng
Congnan Liu
Jian Yang
Shukai Liu
...
Zekun Wang
Guoan Zhang
Bangyu Xiang
Wenbo Su
Bo Zheng
58
4
0
28 Oct 2024
LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems
Nan Xu
Xuezhe Ma
LRM
29
3
0
18 Oct 2024
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
...
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
ELM
57
32
0
14 Oct 2024
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang
Carlos E. Jimenez
Alex Zhang
K. Lieret
Joyce Yang
...
Gabriel Synnaeve
Karthik Narasimhan
Diyi Yang
Sida I. Wang
Ofir Press
24
17
0
04 Oct 2024
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?
Zhenyu Pan
Rongyu Cao
Yongchang Cao
Yingwei Ma
Binhua Li
Fei Huang
Han Liu
Yongbin Li
37
4
0
02 Oct 2024
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
Ben Bogin
Kejuan Yang
Shashank Gupta
Kyle Richardson
Erin Bransom
Peter Clark
Ashish Sabharwal
Tushar Khot
ELM
LRM
40
9
0
11 Sep 2024
Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks
Zi Yang
28
0
0
10 Sep 2024
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Huy N. Phan
Phong X. Nguyen
Nghi D. Q. Bui
LLMAG
28
10
0
09 Sep 2024
DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning
Keer Lu
Xiaonan Nie
Zheng Liang
Da Pan
Shusen Zhang
...
Weipeng Chen
Zenan Zhou
Guosheng Dong
Bin Cui
Wentao Zhang
27
0
0
02 Sep 2024
Statically Contextualizing Large Language Models with Typed Holes
Andrew Blinn
Xiang Li
June Hyung Kim
Cyrus Omar
25
1
0
02 Sep 2024