
Code Roulette: How Prompt Variability Affects LLM Code Generation

Main: 7 pages
6 figures
Bibliography: 3 pages
Appendix: 1 page
Abstract

Code generation is one of the most active application areas of Large Language Models (LLMs). While LLMs lower barriers to writing code and accelerate the development process, the overall quality of generated programs depends on the quality of the given prompts. In particular, the functionality and quality of generated code can be sensitive to a user's background and familiarity with software development. It is therefore important to quantify an LLM's sensitivity to variations in the input. To this end, we propose an evaluation pipeline for LLM code generation that focuses on measuring sensitivity to prompt augmentations. The pipeline is agnostic to any specific programming task or LLM, and is thus widely applicable. We provide extensive experimental evidence illustrating the utility of our method and share our code for the benefit of the community.
