arXiv: 2406.15877
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

22 June 2024
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
Ratnadira Widyasari
Imam Nur Bani Yusuf
Haolan Zhan
Junda He
Indraneil Paul
Simon Brunner
Chen Gong
Thong Hoang
A. Zebaze
Xiaoheng Hong
Wen-Ding Li
Jean Kaddour
Ming Xu
Zhihan Zhang
Prateek Yadav
Naman Jain
Alex Gu
Zhoujun Cheng
Jiawei Liu
Qian Liu
Zijian Wang
Binyuan Hui
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra

Papers citing "BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions"

50 / 102 papers shown
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
Zimu Lu
Y. Yang
Houxing Ren
Haotian Hou
Han Xiao
Ke Wang
Weikang Shi
Aojun Zhou
Mingjie Zhan
H. Li
LLMAG
25
0
0
06 May 2025
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Kazuki Fujii
Yukito Tajima
Sakae Mizuki
Hinari Shimada
Taihei Shiotani
...
Kakeru Hattori
Youmi Ma
Hiroya Takamura
Rio Yokota
Naoaki Okazaki
SyDa
35
0
0
05 May 2025
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
Y. Li
Qizhi Pei
Mengyuan Sun
Honglin Lin
Chenlin Ming
Xin Gao
Jiang Wu
C. He
Lijun Wu
ELM
LRM
35
0
0
27 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
Dynamic Early Exit in Reasoning Models
Chenxu Yang
Qingyi Si
Yongjie Duan
Zheliang Zhu
Chenyu Zhu
Zheng-Shen Lin
Li Cao
Weiping Wang
ReLM
LRM
19
0
0
22 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq R. Joty
ELM
ALM
LRM
33
1
0
21 Apr 2025
FlowReasoner: Reinforcing Query-Level Meta-Agents
Hongcheng Gao
Yue Liu
Yufei He
Longxu Dou
C. Du
Zhijie Deng
Bryan Hooi
Min Lin
Tianyu Pang
AIFin
LRM
17
1
0
21 Apr 2025
RepliBench: Evaluating the Autonomous Replication Capabilities of Language Model Agents
Sid Black
Asa Cooper Stickland
Jake Pencharz
Oliver Sourbut
Michael Schmatz
Jay Bailey
Ollie Matthews
Ben Millwood
Alex Remedios
Alan Cooney
ELM
49
0
0
21 Apr 2025
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs
Minh V.T. Pham
Huy N. Phan
Hoang N. Phan
Cuong Le Chi
T. Nguyen
Nghi D. Q. Bui
SyDa
17
0
0
20 Apr 2025
CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations
Man Ho Adrian Lam
Chaozheng Wang
Jen-tse Huang
M. Lyu
LRM
24
0
0
19 Apr 2025
Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation
Mingwei Liu
Juntao Li
Ying Wang
Xueying Du
Zuoyu Ou
...
Zhao Wei
Y. Xu
Fangming Zou
Xin Peng
Yiling Lou
30
0
0
17 Apr 2025
Teaching Large Language Models to Reason through Learning and Forgetting
Tianwei Ni
Allen Nie
Sapana Chaudhary
Yao Liu
Huzefa Rangwala
Rasool Fakoor
ReLM
CLL
LRM
26
0
0
15 Apr 2025
Towards an Understanding of Context Utilization in Code Intelligence
Yanlin Wang
Kefeng Duan
Dewu Zheng
Ensheng Shi
F. Zhang
...
Xilin Liu
Yuchi Ma
Hongyu Zhang
Qianxiang Wang
Zibin Zheng
19
0
0
11 Apr 2025
R2Vul: Learning to Reason about Software Vulnerabilities with Reinforcement Learning and Structured Reasoning Distillation
M. Weyssow
Chengran Yang
Junkai Chen
Yikun Li
Huihui Huang
...
Han Wei Ang
Frank Liauw
Eng Lieh Ouh
Lwin Khin Shar
David Lo
LRM
28
0
0
07 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
R. Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Janet Liu
Y. Wu
OffRL
LRM
36
9
0
03 Apr 2025
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Wasi Uddin Ahmad
Sean Narenthiran
Somshubra Majumdar
Aleksander Ficek
Siddhartha Jain
Jocelyn Huang
Vahid Noroozi
Boris Ginsburg
LRM
42
2
0
02 Apr 2025
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
Anjiang Wei
Tarun Suresh
Jiannan Cao
Naveen Kannan
Yuheng Wu
Kai Yan
Thiago S. F. X. Teixeira
Ke Wang
Alex Aiken
ELM
LRM
29
0
0
29 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
32
0
0
28 Mar 2025
ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding
Indraneil Paul
Haoyi Yang
Goran Glavas
Kristian Kersting
Iryna Gurevych
AAML
SyDa
34
0
0
27 Mar 2025
Verbal Process Supervision Elicits Better Coding Agents
Hao-Yuan Chen
Cheng-Pong Huang
Jui-Ming Yao
ELM
LRM
41
1
0
24 Mar 2025
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
Codefuse
Ling Team
Wenting Cai
Yuchen Cao
C. Chen
...
Wei Zhang
Z. Zhang
Hailin Zhao
Xunjin Zheng
Jun Zhou
ALM
MoE
41
0
0
22 Mar 2025
RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation
Linxi Liang
Jing Gong
Mingwei Liu
Chong Wang
Guangsheng Ou
Yanlin Wang
Xin Peng
Zibin Zheng
ALM
54
0
0
21 Mar 2025
LLMs Love Python: A Study of LLMs' Bias for Programming Languages and Libraries
Lukas Twist
Jie M. Zhang
Mark Harman
Don Syme
Joost Noppen
Detlef Nauck
34
0
0
21 Mar 2025
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models
Hong Yi Lin
Chunhua Liu
Haoyu Gao
Patanamon Thongtanunam
Christoph Treude
ELM
38
0
0
20 Mar 2025
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu
Qian Liu
Haonan Wang
Shiqi Chen
Xiangming Gu
Tianyu Pang
Min-Yen Kan
31
0
0
19 Mar 2025
A Comprehensive Study of LLM Secure Code Generation
Shih-Chieh Dai
Jun Xu
Guanhong Tao
ELM
35
0
0
18 Mar 2025
CoDet-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings
Daniil Orel
Dilshod Azizov
Preslav Nakov
DeLMO
45
0
0
17 Mar 2025
ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning
Xinyi Wang
Jiashui Wang
Peng Chen
Jinbo Su
Yanming Liu
Long Liu
Yangdong Wang
Qiyuan Chen
Kai Yun
Chunfu Jia
40
0
0
14 Mar 2025
Large language model-powered AI systems achieve self-replication with no human intervention
Xudong Pan
Jiarun Dai
Yihe Fan
Minyuan Luo
Changyi Li
Min Yang
GNN
LRM
44
0
0
14 Mar 2025
Compute Optimal Scaling of Skills: Knowledge vs Reasoning
Nicholas Roberts
Niladri S. Chatterji
Sharan Narang
Mike Lewis
Dieuwke Hupkes
38
2
0
13 Mar 2025
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Roham Koohestani
Philippe de Bekker
M. Izadi
VLM
40
0
0
07 Mar 2025
Transferable Foundation Models for Geometric Tasks on Point Cloud Representations: Geometric Neural Operators
Blaine Quackenbush
P. Atzberger
3DPC
AI4CE
51
0
0
06 Mar 2025
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions
Julian Aron Prenner
Romain Robbes
54
0
0
06 Mar 2025
Trim My View: An LLM-Based Code Query System for Module Retrieval in Robotic Firmware
Sima Arasteh
Pegah Jandaghi
Nicolaas Weideman
Dennis Perepech
Mukund Raghothaman
Christophe Hauser
Luis Garcia
52
0
0
05 Mar 2025
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Zhangchen Xu
Yang Liu
Yueqin Yin
Mingyuan Zhou
Radha Poovendran
ALM
OffRL
68
5
0
04 Mar 2025
IterPref: Focal Preference Learning for Code Generation via Iterative Debugging
Jie Wu
Haoling Li
Xin Zhang
Jianwen Luo
Yangyu Huang
Ruihang Chu
Y. Yang
Scarlett Li
62
0
0
04 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Y. Zhang
Xiren Zhou
MoE
SyDa
55
18
0
03 Mar 2025
Evaluating Polish linguistic and cultural competency in large language models
Sławomir Dadas
Małgorzata Grębowiec
Michał Perełkiewicz
Rafał Poświata
ELM
26
1
0
02 Mar 2025
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology
Ludovico Mitchener
Jon M. Laurent
Benjamin Tenmann
Siddharth Narayanan
Geemi P Wellawatte
A. White
Lorenzo Sani
Samuel G. Rodriques
LLMAG
LM&MA
ELM
51
2
0
28 Feb 2025
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Hojae Han
Seung-won Hwang
Rajhans Samdani
Yuxiong He
ALM
55
2
0
27 Feb 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He
Shilong Li
J. Liu
Weixun Wang
Xingyuan Bu
...
Zhongyuan Peng
Z. Zhang
Zhicheng Zheng
Wenbo Su
Bo Zheng
ELM
LRM
57
6
0
26 Feb 2025
Deep-Bench: Deep Learning Benchmark Dataset for Code Generation
Alireza Daghighfarsoodeh
Chung-Yu Wang
Hamed Taherkhani
Melika Sepidband
Mohammad Abdollahi
Hadi Hemmati
Hung Viet Pham
ALM
ELM
83
0
0
26 Feb 2025
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation
K. Yan
Hongcheng Guo
Xuanqing Shi
J. Xu
Yaonan Gu
Z. Li
ALM
74
0
0
26 Feb 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang
Tianyang Liu
Daoan Zhang
Antoine Simoulin
Xiaoyi Liu
...
Zhaopu Teng
Xin Qian
Grey Yang
Jiebo Luo
Julian McAuley
ReLM
OffRL
LRM
73
3
0
26 Feb 2025
StatLLM: A Dataset for Evaluating the Performance of Large Language Models in Statistical Analysis
Xinyi Song
Lina Lee
Kexin Xie
Xueying Liu
Xinwei Deng
Yili Hong
ALM
ELM
40
0
0
24 Feb 2025
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
73
3
0
24 Feb 2025
DataSciBench: An LLM Agent Benchmark for Data Science
Dan Zhang
Sining Zhoubian
Min Cai
Fengzu Li
L. Yang
Wei Wang
Tianjiao Dong
Ziniu Hu
J. Tang
Yisong Yue
ALM
ELM
32
2
0
20 Feb 2025
Pragmatic Reasoning improves LLM Code Generation
Zhuchen Cao
Sven Apel
Adish Singla
Vera Demberg
LRM
34
0
0
20 Feb 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
B. Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Z. Wu
LM&MA
ELM
AI4MH
29
3
0
18 Feb 2025
LLM Agents Making Agent Tools
Georg Wolflein
Dyke Ferber
Daniel Truhn
Ognjen Arandjelovic
Jakob Nikolas Kather
LLMAG
39
4
0
17 Feb 2025