ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2211.11501
  4. Cited By
DS-1000: A Natural and Reliable Benchmark for Data Science Code
  Generation

DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

18 November 2022
Yuhang Lai
Chengxi Li
Yiming Wang
Tianyi Zhang
Ruiqi Zhong
Luke Zettlemoyer
Scott Yih
Daniel Fried
Si-yi Wang
Tao Yu
    ELM
    ALM
ArXivPDFHTML

Papers citing "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation"

48 / 48 papers shown
Title
CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
Sizhe Wang
Z. Wang
Dongsheng Ma
Yongan Yu
Rui Ling
Z. Li
Feiyu Xiong
W. Zhang
LRM
47
0
0
30 Apr 2025
Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges
Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges
Yunseo Lee
John Youngeun Song
Dongsun Kim
Jindae Kim
Mijung Kim
Jaechang Nam
HILM
LRM
33
0
0
29 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
Tengjun Jin
Yuxuan Zhu
Daniel Kang
LMTD
ELM
40
0
0
07 Apr 2025
ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness
ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness
Ce Guo
Tong Zhao
51
1
0
11 Mar 2025
Transferable Foundation Models for Geometric Tasks on Point Cloud Representations: Geometric Neural Operators
Transferable Foundation Models for Geometric Tasks on Point Cloud Representations: Geometric Neural Operators
Blaine Quackenbush
P. Atzberger
3DPC
AI4CE
53
0
0
06 Mar 2025
Selective Prompt Anchoring for Code Generation
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
75
3
0
24 Feb 2025
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science
Qiuhai Zeng
Claire Jin
Xinyue Wang
Yuhan Zheng
Qunhua Li
36
0
0
23 Feb 2025
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark
Ruizhong Qiu
Weiliang Will Zeng
Hanghang Tong
James Ezick
Christopher Lott
82
15
0
20 Feb 2025
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
Bohan Lyu
Siqiao Huang
Zichen Liang
Qi-An Sun
Jiaming Zhang
ELM
LRM
43
0
0
16 Feb 2025
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories
Yijia Xiao
Runhui Wang
Luyang Kong
Davor Golac
Wei Wang
LLMAG
54
0
0
10 Feb 2025
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Ziyao Zhang
Yanlin Wang
Chong Wang
Jiachi Chen
Zibin Zheng
108
11
0
20 Jan 2025
WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
Huawen Feng
Pu Zhao
Qingfeng Sun
Can Xu
Fangkai Yang
...
Qianli Ma
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
AAML
ALM
60
0
0
23 Dec 2024
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Fangyu Lei
Jixuan Chen
Yuxiao Ye
Ruisheng Cao
Dongchan Shin
...
Caiming Xiong
Ruoxi Sun
Qian Liu
Sida I. Wang
Tao Yu
LMTD
69
20
0
12 Nov 2024
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang
Tianhao Cheng
J.K. Liu
Jiaran Hao
L. Song
...
Ge Zhang
Zili Wang
Yuan Qi
Yinghui Xu
Wei Chu
ALM
59
16
0
07 Nov 2024
GitChameleon: Unmasking the Version-Switching Capabilities of Code
  Generation Models
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
Nizar Islah
Justine Gehring
Diganta Misra
Eilif B. Muller
Irina Rish
Terry Yue Zhuo
Massimo Caccia
SyDa
31
1
0
05 Nov 2024
Mastering the Craft of Data Synthesis for CodeLLMs
Mastering the Craft of Data Synthesis for CodeLLMs
Meng Chen
Philip Arthur
Qianyu Feng
Cong Duy Vu Hoang
Yu-Heng Hong
...
Mark Johnson
K. K.
Don Dharmasiri
Long Duong
Yuan-Fang Li
SyDa
40
1
0
16 Oct 2024
CursorCore: Assist Programming through Aligning Anything
CursorCore: Assist Programming through Aligning Anything
Hao Jiang
Qi Liu
Rui Li
Shengyu Ye
Shijin Wang
34
1
0
09 Oct 2024
ScriptSmith: A Unified LLM Framework for Enhancing IT Operations via
  Automated Bash Script Generation, Assessment, and Refinement
ScriptSmith: A Unified LLM Framework for Enhancing IT Operations via Automated Bash Script Generation, Assessment, and Refinement
Oishik Chatterjee
Pooja Aggarwal
Suranjana Samanta
Ting Dai
P. Mohapatra
...
Ruchi Mahindru
Steve Barbieri
Eugen Postea
Brad Blancett
Arthur De Magalhaes
11
1
0
12 Sep 2024
What can Large Language Models Capture about Code Functional Equivalence?
What can Large Language Models Capture about Code Functional Equivalence?
Nickil Maveli
Antonio Vergari
Shay B. Cohen
21
2
0
20 Aug 2024
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Mengkang Hu
Yixiao Wang
Can Xu
Lingfeng Sun
Chensheng Peng
T. Hannagan
Nicola Poerio
Saravan Rajmohan
LM&Ro
LLMAG
52
14
0
01 Aug 2024
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Gaurav Sahu
Abhay Puri
Juan A. Rodriguez
Alexandre Drouin
Perouz Taslakian
...
Christopher Pal
Nicolas Chapados
I. Laradji
Sai Rajeswar Mudumba
Issam Hadj Laradji
ELM
37
4
0
08 Jul 2024
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Zeyu Leo Liu
Shrey Pandit
Xi Ye
Eunsol Choi
Greg Durrett
KELM
ALM
46
4
0
08 Jul 2024
ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages
ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages
Mehant Kammakomati
Sameer Pimparkhede
Srikanth G. Tamilselvam
Prince Kumar
Pushpak Bhattacharyya
ALM
27
0
0
03 Jul 2024
Agentless: Demystifying LLM-based Software Engineering Agents
Agentless: Demystifying LLM-based Software Engineering Agents
Chunqiu Steven Xia
Yinlin Deng
Soren Dunn
Lingming Zhang
LLMAG
26
78
0
01 Jul 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
65
125
0
22 Jun 2024
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative
  Models
Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models
Sanjay Vishwakarma
Francis Harkins
Siddharth Golecha
Vishal Sharathchandra Bajpe
Nicolas Dupuis
Luca Buratti
David Kremer
Ismael Faro
Ruchir Puri
Juan Cruz-Benito
ELM
20
3
0
20 Jun 2024
CodeRAG-Bench: Can Retrieval Augment Code Generation?
CodeRAG-Bench: Can Retrieval Augment Code Generation?
Zora Zhiruo Wang
Akari Asai
Xinyan Velocity Yu
Frank F. Xu
Yiqing Xie
Graham Neubig
Daniel Fried
RALM
54
29
0
20 Jun 2024
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Cheng Yang
Chufan Shi
Yaxin Liu
Bo Shui
Junjie Wang
...
Yuxiang Zhang
Gongye Liu
Xiaomei Nie
Deng Cai
Yujiu Yang
MLLM
LRM
38
22
0
14 Jun 2024
DCA-Bench: A Benchmark for Dataset Curation Agents
DCA-Bench: A Benchmark for Dataset Curation Agents
Benhao Huang
Yingzhuo Yu
Jin Huang
Xingjian Zhang
Jiaqi Ma
21
1
0
11 Jun 2024
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
Zichao Hu
Junyi Jessy Li
Arjun Guha
Joydeep Biswas
SyDa
ALM
25
1
0
30 May 2024
Kotlin ML Pack: Technical Report
Kotlin ML Pack: Technical Report
Sergey Titov
Mikhail Evtikhiev
Anton Shapkin
Oleg Smirnov
Sergei Boytsov
...
Dariia Karaeva
Maksim Sheptyakov
Mikhail Arkhipov
T. Bryksin
Egor Bogomolov
18
0
0
29 May 2024
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off
  Code Generation
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
Houxing Ren
Mingjie Zhan
Zhongyuan Wu
Aojun Zhou
Junting Pan
Hongsheng Li
SyDa
20
7
0
27 May 2024
PECC: Problem Extraction and Coding Challenges
PECC: Problem Extraction and Coding Challenges
Patrick Haller
Jonas Golde
Alan Akbik
ReLM
14
5
0
29 Apr 2024
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation
Fang Liu
Yang Liu
Lin Shi
Houkun Huang
Ruifeng Wang
Zhen Yang
Li Zhang
Zhongqi Li
Yuchi Ma
27
103
0
01 Apr 2024
Are LLMs Ready for Real-World Materials Discovery?
Are LLMs Ready for Real-World Materials Discovery?
Santiago Miret
N. M. A. Krishnan
22
25
0
07 Feb 2024
CodeChain: Towards Modular Code Generation Through Chain of
  Self-revisions with Representative Sub-modules
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
Hung Le
Hailin Chen
Amrita Saha
Akash Gokul
Doyen Sahoo
Shafiq R. Joty
LRM
12
41
0
13 Oct 2023
Bias Testing and Mitigation in LLM-based Code Generation
Bias Testing and Mitigation in LLM-based Code Generation
Dong Huang
Qingwen Bu
Jie M. Zhang
Xiaofei Xie
Junjie Chen
Heming Cui
26
20
0
03 Sep 2023
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes
Simran Arora
Brandon Yang
Sabri Eyuboglu
A. Narayan
Andrew Hojel
Immanuel Trummer
Christopher Ré
SyDa
38
69
0
19 Apr 2023
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Shuyan Zhou
Uri Alon
Sumit Agarwal
Graham Neubig
ELM
ALM
11
98
0
10 Feb 2023
A Survey on Natural Language Processing for Programming
A Survey on Natural Language Processing for Programming
Qingfu Zhu
Xianzhen Luo
Fang Liu
Cuiyun Gao
Wanxiang Che
11
1
0
12 Dec 2022
Coder Reviewer Reranking for Code Generation
Coder Reviewer Reranking for Code Generation
Tianyi Zhang
Tao Yu
Tatsunori B. Hashimoto
M. Lewis
Wen-tau Yih
Daniel Fried
Sida I. Wang
12
92
0
29 Nov 2022
A Systematic Evaluation of Large Language Models of Code
A Systematic Evaluation of Large Language Models of Code
Frank F. Xu
Uri Alon
Graham Neubig
Vincent J. Hellendoorn
ELM
ALM
188
624
0
26 Feb 2022
Training and Evaluating a Jupyter Notebook Data Science Assistant
Training and Evaluating a Jupyter Notebook Data Science Assistant
Shubham Chandel
Colin B. Clement
Guillermo Serrato
Neel Sundaresan
32
43
0
30 Jan 2022
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding
  from Language Models
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
Torsten Scholak
Nathan Schucher
Dzmitry Bahdanau
141
373
0
10 Sep 2021
Measuring Coding Challenge Competence With APPS
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
189
614
0
20 May 2021
Memorization vs. Generalization: Quantifying Data Leakage in NLP
  Performance Evaluation
Memorization vs. Generalization: Quantifying Data Leakage in NLP Performance Evaluation
Aparna Elangovan
Jiayuan He
Karin Verspoor
TDI
FedML
150
89
0
03 Feb 2021
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
264
1,798
0
14 Dec 2020
1