arXiv:2308.10335
Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation
20 August 2023
Li Zhong
Zilong Wang
ELM
SILM
Papers citing "Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation"
32 / 32 papers shown
Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering
Joshua Owotogbe
LLMAG
57
0
0
06 May 2025
Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges
Yunseo Lee
John Youngeun Song
Dongsun Kim
Jindae Kim
Mijung Kim
Jaechang Nam
HILM
LRM
35
0
0
29 Apr 2025
A Survey of AI Agent Protocols
Y. Yang
Huacan Chai
Y. Song
S. Qi
Muning Wen
...
Gaowei Chang
W. Liu
Ying Wen
Yong Yu
W. Zhang
LLMAG
66
1
0
23 Apr 2025
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain
Miracle Master
Rainy Sun
Anya Reese
Joey Ouyang
Alex Chen
...
James Yi
Garry Zhao
Tony Ling
Hobert Wong
Lowes Yang
ALM
ELM
72
0
0
18 Apr 2025
A Convex formulation for linear discriminant analysis
Sai Vijay Kumar Surineela
Prathyusha Kanakamalla
Harigovind Harikumar
Tomojit Ghosh
54
0
0
17 Mar 2025
Interacting with AI Reasoning Models: Harnessing "Thoughts" for AI-Driven Software Engineering
Christoph Treude
Raula Gaikovina Kula
LRM
31
0
0
01 Mar 2025
Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs
Reham Omar
Omij Mangukiya
Essam Mansour
37
0
0
20 Jan 2025
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Ziyao Zhang
Yanlin Wang
Chong Wang
Jiachi Chen
Zibin Zheng
114
14
0
20 Jan 2025
Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code
Nan Jiang
Qi Li
Lin Tan
Tianyi Zhang
HILM
29
1
0
13 Oct 2024
APILOT: Navigating Large Language Models to Generate Secure Code by Sidestepping Outdated API Pitfalls
Weiheng Bai
Keyang Xuan
Pengxiang Huang
Qiushi Wu
Jianing Wen
Jingjing Wu
Kangjie Lu
LLMAG
KELM
27
1
0
25 Sep 2024
A Disguised Wolf Is More Harmful Than a Toothless Tiger: Adaptive Malicious Code Injection Backdoor Attack Leveraging User Behavior as Triggers
Shangxi Wu
Jitao Sang
SILM
AAML
21
1
0
19 Aug 2024
Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks
Jiawei Zhao
Kejiang Chen
Xiaojian Yuan
Weiming Zhang
AAML
31
2
0
15 Aug 2024
OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation
Zilong Wang
Yuedong Cui
Li Zhong
Zimin Zhang
Da Yin
Bill Yuchen Lin
Jingbo Shang
51
4
0
26 Jul 2024
Benchmarks as Microscopes: A Call for Model Metrology
Michael Stephen Saxon
Ari Holtzman
Peter West
William Yang Wang
Naomi Saphra
31
10
0
22 Jul 2024
Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing
Rabimba Karanjai
Aftab Hussain
Md Rafiqul Islam Rabin
Lei Xu
Weidong Shi
Mohammad Amin Alipour
62
2
0
06 Jul 2024
NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations
Junkai Chen
Zhenhao Li
Xing Hu
Xin Xia
AAML
42
7
0
28 Jun 2024
Measure-Observe-Remeasure: An Interactive Paradigm for Differentially-Private Exploratory Analysis
Priyanka Nanayakkara
Hyeok Kim
Yifan Wu
Ali Sarvghad
Narges Mahyar
G. Miklau
Jessica Hullman
31
17
0
04 Jun 2024
Automated Multi-Language to English Machine Translation Using Generative Pre-Trained Transformers
Elijah Pelofske
Vincent Urias
L. Liebrock
32
0
0
23 Apr 2024
Language Models Still Struggle to Zero-shot Reason about Time Series
Mike A. Merrill
Mingtian Tan
Vinayak Gupta
Tom Hartvigsen
Tim Althoff
AI4TS
LRM
40
27
0
17 Apr 2024
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Mureja Argaw
Seunghyun Yoon
Fabian Caba Heilbron
Hanieh Deilamsalehy
Trung Bui
Zhaowen Wang
Franck Dernoncourt
Joon Son Chung
41
9
0
04 Apr 2024
VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search
David Brandfonbrener
Simon Henniger
Sibi Raja
Tarun Prasad
Chloe Loughridge
...
Sabrina Ruixin Hu
Jianang Yang
William E. Byrd
Robert Zinkov
Nada Amin
LRM
48
5
0
13 Feb 2024
Ocassionally Secure: A Comparative Analysis of Code Generation Assistants
Ran Elgedawy
John Sadik
Senjuti Dutta
Anuj Gautam
Konstantinos Georgiou
Farzin Gholamrezae
Fujiao Ji
Kyungchan Lim
Qian Liu
Scott Ruoti
19
7
0
01 Feb 2024
QACP: An Annotated Question Answering Dataset for Assisting Chinese Python Programming Learners
Rui Xiao
Lu Han
Xiaoying Zhou
Jiong Wang
Na Zong
Pengyu Zhang
AI4Ed
26
1
0
30 Jan 2024
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models
Shuai Wang
Liang Ding
Li Shen
Yong Luo
Bo Du
Dacheng Tao
ELM
ALM
43
2
0
12 Jan 2024
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
Manish P Bhatt
Sahana Chennabasappa
Cyrus Nikolaidis
Shengye Wan
Ivan Evtimov
...
Aleksandar Straumann
Gabriel Synnaeve
Varun Vontimitta
Spencer Whitman
Joshua Saxe
ELM
18
66
0
07 Dec 2023
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Joey Hong
Sergey Levine
Anca Dragan
OffRL
LLMAG
39
23
0
09 Nov 2023
From Chatbots to PhishBots? -- Preventing Phishing scams created using ChatGPT, Google Bard and Claude
S. Roy
Poojitha Thota
Krishna Vamsi Naragam
Shirin Nilizadeh
SILM
46
16
0
29 Oct 2023
Resolving the Imbalance Issue in Hierarchical Disciplinary Topic Inference via LLM-based Data Augmentation
Xunxin Cai
Meng Xiao
Zhiyuan Ning
Yuanchun Zhou
33
12
0
09 Oct 2023
LLM4VV: Developing LLM-Driven Testsuite for Compiler Validation
Christian Munley
Aaron Jarmusch
Sunita Chandrasekaran
27
16
0
08 Oct 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
178
787
0
02 May 2023
A Systematic Evaluation of Large Language Models of Code
Frank F. Xu
Uri Alon
Graham Neubig
Vincent J. Hellendoorn
ELM
ALM
202
631
0
26 Feb 2022
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
194
624
0
20 May 2021