Training and Evaluating a Jupyter Notebook Data Science Assistant

30 January 2022

Papers citing "Training and Evaluating a Jupyter Notebook Data Science Assistant"

Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents

Irene Testini

José Hernández-Orallo

Lorenzo Pacchiardi

229

10 Jun 2025

Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Unified Approach for Elevating Benchmark Quality

645

07 Mar 2025

Rigor, Reliability, and Reproducibility Matter: A Decade-Scale Survey of 572 Code Benchmarks

...

708

18 Jan 2025

GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models

225

05 Nov 2024

CodeInsight: A Curated Dataset of Practical Coding Solutions from Stack Overflow

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Nathanael Beau

Benoît Crabbé

305

25 Sep 2024

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

...

Nicolas Chapados

Sai Rajeswar Mudumba

Issam Hadj Laradji

ELM

453

08 Jul 2024

A Survey on Large Language Models for Code Generation

Fan Wang

673

801

01 Jun 2024

MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

623

19 May 2024

Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines

PeerJ Computer Science (PeerJ Comput. Sci.), 2024

Ekaterina Trofimova

Emil Sataev

Andrey E. Ustyuzhanin

391

18 Mar 2024

Capture the Flag: Uncovering Data Insights with Large Language Models

I. Laradji

Perouz Taslakian

Sai Rajeswar

Valentina Zantedeschi

David Vazquez

304

21 Dec 2023

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Weixiang Yan

Haitian Liu

Yunkun Wang

Yunzhe Li

Qian Chen

...

471

14 Nov 2023

Safurai-Csharp: Harnessing Synthetic Data to improve language-specific Code LLM

200

06 Nov 2023

LLM for SoC Security: A Paradigm Shift

IEEE Access, 2023

440

09 Oct 2023

Safurai 001: New Qualitative Approach for Code LLM Evaluation

211

20 Sep 2023

How Do Analysts Understand and Verify AI-Assisted Data Analyses?

International Conference on Human Factors in Computing Systems (CHI), 2023

Chenglong Wang

423

19 Sep 2023

How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study

International Conference on Human Factors in Computing Systems (CHI), 2023

Ken Gu

Madeleine Grunde-McLaughlin

Andrew M. McNutt

Jeffrey Heer

Tim Althoff

264

18 Sep 2023

PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

Daoguang Zan

...

267

102

27 Jul 2023

SelfEvolve: A Code Evolution Framework via Large Language Models

Shuyang Jiang

Yuhao Wang

Yu Wang

347

05 Jun 2023

xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Mohammad Abdullah Matin Khan

560

06 Mar 2023

Execution-Based Evaluation for Open-Domain Code Generation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Zhiruo Wang

Shuyan Zhou

Daniel Fried

Graham Neubig

ELM

377

106

20 Dec 2022

Large Language Models Meet NL2Code: A Survey

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Daoguang Zan

368

251

19 Dec 2022

Natural Language to Code Generation in Interactive Data Science Notebooks

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

...

Henryk Michalewski

264

107

19 Dec 2022

DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

International Conference on Machine Learning (ICML), 2022

Ruiqi Zhong

Luke Zettlemoyer

Daniel Fried

376

488

18 Nov 2022

Execution-based Evaluation for Data Science Code Generation Models

Chenglong Wang

303

17 Nov 2022

Fault-Aware Neural Code Rankers

Neural Information Processing Systems (NeurIPS), 2022

Chenglong Wang

300

04 Jun 2022