arXiv:2406.15877
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

22 June 2024
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
Ratnadira Widyasari
Imam Nur Bani Yusuf
Haolan Zhan
Junda He
Indraneil Paul
Simon Brunner
Chen Gong
Thong Hoang
A. Zebaze
Xiaoheng Hong
Wen-Ding Li
Jean Kaddour
Ming Xu
Zhihan Zhang
Prateek Yadav
Naman Jain
Alex Gu
Zhoujun Cheng
Jiawei Liu
Qian Liu
Zijian Wang
Binyuan Hui
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra

Papers citing "BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions"

50 / 102 papers shown
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou
Zengzhi Wang
Qian Liu
Junlong Li
Pengfei Liu
ALM
70
14
0
17 Feb 2025
LLM Agents Making Agent Tools
Georg Wolflein
Dyke Ferber
Daniel Truhn
Ognjen Arandjelovic
Jakob Nikolas Kather
LLMAG
41
4
0
17 Feb 2025
The Philosophical Foundations of Growing AI Like A Child
Dezhi Luo
Yijiang Li
Hokin Deng
ReLM
LRM
34
1
0
15 Feb 2025
RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation
C. Zhou
Xinyu Zhang
Dandan Song
Xiancai Chen
Wanli Gu
Huipeng Ma
Yuhang Tian
M. Zhang
Linmei Hu
55
1
0
13 Feb 2025
LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks
Xin Zhou
M. Weyssow
Ratnadira Widyasari
Ting Zhang
Junda He
Yunbo Lyu
Jianming Chang
Beiqi Zhang
Dan Huang
David Lo
PILM
121
0
0
10 Feb 2025
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen
Guangtao Zeng
Zhenting Qi
Zhang-Wei Hong
Zhenfang Chen
Wei Lu
G. Wornell
Subhro Das
David D. Cox
Chuang Gan
LLMAG
LRM
57
5
0
04 Feb 2025
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao
Yuk-Kit Chan
Zixuan Ling
Wenxuan Wang
Shuqing Li
...
Pinjia He
Shuai Wang
Zibin Zheng
Michael R. Lyu
S. Cheung
ALM
56
1
0
18 Jan 2025
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
Zhaojian Yu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
25
2
0
03 Jan 2025
WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
Huawen Feng
Pu Zhao
Qingfeng Sun
Can Xu
Fangkai Yang
...
Qianli Ma
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
AAML
ALM
55
0
0
23 Dec 2024
Does Few-Shot Learning Help LLM Performance in Code Synthesis?
Derek Xu
Tong Xie
Botao Xia
Haoyu Li
Yunsheng Bai
Yizhou Sun
Wei Wang
66
0
0
03 Dec 2024
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
Cameron Tice
Philipp Alexander Kreer
Nathan Helm-Burger
Prithviraj Singh Shahani
Fedor Ryzhenkov
Jacob Haimes
Felix Hofstätter
Teun van der Weij
61
1
0
02 Dec 2024
TruncFormer: Private LLM Inference Using Only Truncations
Patrick Yubeaton
Jianqiao Mo
Karthik Garimella
N. Jha
Brandon Reagen
Chinmay Hegde
Siddharth Garg
66
0
0
02 Dec 2024
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
Zhihan Liu
Shenao Zhang
Yongfei Liu
Boyi Liu
Yingxiang Yang
Zhaoran Wang
105
2
0
20 Nov 2024
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang
Tianhao Cheng
J.K. Liu
Jiaran Hao
L. Song
...
Ge Zhang
Zili Wang
Yuan Qi
Yinghui Xu
Wei Chu
ALM
48
16
0
07 Nov 2024
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
Nizar Islah
Justine Gehring
Diganta Misra
Eilif B. Muller
Irina Rish
Terry Yue Zhuo
Massimo Caccia
SyDa
21
1
0
05 Nov 2024
An LLM Agent for Automatic Geospatial Data Analysis
Yuxing Chen
Weijie Wang
Sylvain Lobry
Camille Kurtz
LLMAG
20
3
0
24 Oct 2024
MorphAgent: Empowering Agents through Self-Evolving Profiles and Decentralized Collaboration
Siyuan Lu
Jiaqi Shao
B. Luo
Tao Lin
LM&Ro
LLMAG
AI4CE
11
1
0
19 Oct 2024
Agent Skill Acquisition for Large Language Models via CycleQD
So Kuroki
Taishi Nakamura
Takuya Akiba
Yujin Tang
MoMe
19
0
0
16 Oct 2024
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
...
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
ELM
42
32
0
14 Oct 2024
CursorCore: Assist Programming through Aligning Anything
Hao Jiang
Qi Liu
Rui Li
Shengyu Ye
Shijin Wang
26
0
0
09 Oct 2024
Large Language Models as Code Executors: An Exploratory Study
Chenyang Lyu
Lecheng Yan
Rui Xing
Wenxi Li
Younes Samih
Tianbo Ji
Longyue Wang
ELM
ALM
LRM
21
2
0
09 Oct 2024
An evaluation of LLM code generation capabilities through graded exercises
Álvaro Barbero Jiménez
ELM
15
0
0
06 Oct 2024
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang
Carlos E. Jimenez
Alex Zhang
K. Lieret
Joyce Yang
...
Gabriel Synnaeve
Karthik Narasimhan
Diyi Yang
Sida I. Wang
Ofir Press
11
17
0
04 Oct 2024
Learning Code Preference via Synthetic Evolution
Jiawei Liu
Thanh Nguyen
Mingyue Shang
Hantian Ding
Xiaopeng Li
Yu Yu
Varun Kumar
Zijian Wang
SyDa
ALM
AAML
18
3
0
04 Oct 2024
Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores
Robert E Blackwell
Jon Barry
Anthony G Cohn
UQCV
18
0
0
04 Oct 2024
ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure
Ippei Fujisawa
Sensho Nobe
Hiroki Seto
Rina Onda
Yoshiaki Uchida
Hiroki Ikoma
Pei-Chun Chien
Ryota Kanai
LRM
26
1
0
04 Oct 2024
Approximately Aligned Decoding
Daniel Melcer
Sujan Kumar Gonugondla
Pramuditha Perera
Haifeng Qian
Wen-Hao Chiang
Yanjun Wang
Nihal Jain
Pranav Garg
Xiaofei Ma
Anoop Deoras
26
0
0
01 Oct 2024
Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective
Yotam Wolf
Binyamin Rothberg
Dorin Shteyman
Amnon Shashua
13
0
0
26 Sep 2024
A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models
Yixi Wu
Pengfei He
Zehao Wang
Shaowei Wang
Yuan Tian
Tse-Hsun Chen
ALM
22
0
0
23 Sep 2024
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
57
195
0
18 Sep 2024
Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning
Essa Jan
Nouar Aldahoul
Moiz Ali
Faizan Ahmad
Fareed Zaffar
Yasir Zaki
13
1
0
18 Sep 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Chaofan Tao
Qian Liu
Longxu Dou
Niklas Muennighoff
Zhongwei Wan
Ping Luo
Min-Bin Lin
Ngai Wong
PILM
40
40
0
18 Jul 2024
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study
Shihan Dou
Haoxiang Jia
Shenxi Wu
Huiyuan Zheng
Weikang Zhou
...
Xunliang Cai
Tao Gui
Xipeng Qiu
Qi Zhang
Xuanjing Huang
14
22
0
08 Jul 2024
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Yun-Nung Chen
VLM
ALM
18
1
0
01 Jul 2024
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
Zuxin Liu
Thai Hoang
Jianguo Zhang
Ming Zhu
Tian Lan
...
Silvio Savarese
Juan Carlos Niebles
Huan Wang
Shelby Heinecke
Caiming Xiong
32
32
0
26 Jun 2024
Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Henry Wu
Jing Yu Koh
Ruslan Salakhutdinov
Aditi Raghunathan
AAML
VLM
27
26
0
18 Jun 2024
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
Tianle Li
Wei-Lin Chiang
Evan Frick
Lisa Dunlap
Tianhao Wu
Banghua Zhu
Joseph E. Gonzalez
Ion Stoica
ALM
22
101
0
17 Jun 2024
VersiCode: Towards Version-controllable Code Generation
Tongtong Wu
Weigang Wu
Xingyu Wang
Kang Xu
Suyu Ma
Bo Jiang
Ping Yang
Zhenchang Xing
Yuan-Fang Li
Gholamreza Haffari
16
4
0
11 Jun 2024
MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation
Jianbo Dai
Jianqiao Lu
Yunlong Feng
Rongju Ruan
Ming Cheng
Haochen Tan
Zhijiang Guo
ELM
LRM
23
11
0
19 May 2024
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
John Yang
Carlos E. Jimenez
Alexander Wettig
K. Lieret
Shunyu Yao
Karthik Narasimhan
Ofir Press
LLMAG
91
36
0
06 May 2024
What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang
Zhoujun Cheng
Hao Zhu
Daniel Fried
Graham Neubig
40
8
0
18 Mar 2024
Repetition Improves Language Model Embeddings
Jacob Mitchell Springer
Suhas Kotha
Daniel Fried
Graham Neubig
Aditi Raghunathan
34
9
0
23 Feb 2024
Can LLMs Patch Security Issues?
Kamel Alrashedy
Abdullah Aljasser
Pradyumna Tambwekar
Matthew Gombolay
AAML
6
5
0
13 Nov 2023
Data Augmentation for Code Translation with Comparable Corpora and Multiple References
Yiqing Xie
Atharva Naik
Daniel Fried
Carolyn Rose
26
4
0
01 Nov 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
158
388
0
02 May 2023
When Language Model Meets Private Library
Daoguang Zan
Bei Chen
Zeqi Lin
Bei Guan
Yongji Wang
Jian-Guang Lou
ALM
61
68
0
31 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole
Varun Gangal
Sebastian Gehrmann
Aadesh Gupta
Zhenhao Li
...
Tianbao Xie
Usama Yaseen
Michael A. Yee
Jing Zhang
Yue Zhang
147
86
0
06 Dec 2021
Reference-Centric Models for Grounded Collaborative Dialogue
Daniel Fried
Justin T. Chiu
Dan Klein
21
19
0
10 Sep 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
183
614
0
20 May 2021