2312.14856
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
28 January 2025
Shahin Honarvar
Mark van der Wilk
Alastair Donaldson
Papers citing
"Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code"
15 / 15 papers shown
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study
Aryan Agrawal
Lisa Alazraki
Shahin Honarvar
Marek Rei
44
0
0
03 Apr 2025
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Roham Koohestani
Philippe de Bekker
M. Izadi
VLM
45
0
0
07 Mar 2025
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence
Norbert Tihanyi
Tamás Bisztray
Richard A. Dubniczky
Rebeka Tóth
B. Borsos
...
Ryan Marinelli
Lucas C. Cordeiro
Merouane Debbah
Vasileios Mavroeidis
Audun Josang
11
4
0
20 Oct 2024
Knowledge-based Consistency Testing of Large Language Models
Sai Sathiesh Rajan
E. Soremekun
Sudipta Chattopadhyay
14
1
0
03 Jul 2024
Test Code Generation for Telecom Software Systems using Two-Stage Generative Model
Mohamad Nabeel
Doumitrou Daniil Nimara
Tahar Zanouda
16
2
0
14 Apr 2024
Bugs in Large Language Models Generated Code: An Empirical Study
Florian Tambon
Arghavan Moradi Dakhel
Amin Nikanjam
Foutse Khomh
Michel C. Desmarais
G. Antoniol
ELM
14
17
0
13 Mar 2024
Position: AI Evaluation Should Learn from How We Test Humans
Yan Zhuang
Q. Liu
Yuting Ning
Wei Huang
Rui Lv
Zhenya Huang
Guanhao Zhao
Zheng-Wei Zhang
ELM
ALM
62
21
0
18 Jun 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
161
388
0
02 May 2023
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
160
320
0
06 Oct 2022
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
116
270
0
03 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
A Systematic Evaluation of Large Language Models of Code
Frank F. Xu
Uri Alon
Graham Neubig
Vincent J. Hellendoorn
ELM
ALM
188
624
0
26 Feb 2022
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
189
614
0
20 May 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
183
853
0
09 Feb 2021
Language GANs Falling Short
Massimo Caccia
Lucas Page-Caccia
W. Fedus
Hugo Larochelle
Joelle Pineau
Laurent Charlin
112
214
0
06 Nov 2018