Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.18486
Cited By
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
29 May 2023
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq R. Joty
J. Huang
LM&MA
ELM
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets"
41 / 41 papers shown
Title
A linguistically-motivated evaluation methodology for unraveling model's abilities in reading comprehension tasks
Elie Antoine
Frédéric Béchet
Géraldine Damnati
Philippe Langlais
47
1
0
29 Jan 2025
STAYKATE: Hybrid In-Context Example Selection Combining Representativeness Sampling and Retrieval-based Approach -- A Case Study on Science Domains
Chencheng Zhu
Kazutaka Shimada
Tomoki Taniguchi
Tomoko Ohkuma
28
0
0
31 Dec 2024
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
34
1
0
26 Oct 2024
Dialogue Ontology Relation Extraction via Constrained Chain-of-Thought Decoding
Renato Vukovic
David Arps
Carel van Niekerk
Benjamin Matthias Ruppik
Hsien-chin Lin
Michael Heck
Milica Gašić
31
1
0
05 Aug 2024
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks
Marco AF Pimentel
Clément Christophe
Tathagata Raha
Prateek Munjal
Praveen K Kanithi
Shadab Khan
ELM
16
2
0
29 Jul 2024
Are We Done with MMLU?
Aryo Pradipta Gema
Joshua Ong Jun Leang
Giwon Hong
Alessio Devoto
Alberto Carlo Maria Mancino
...
R. McHardy
Joshua Harris
Jean Kaddour
Emile van Krieken
Pasquale Minervini
ELM
32
29
0
06 Jun 2024
Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model
Zita Lifelo
Huansheng Ning
Sahraoui Dhelim
AI4MH
27
0
0
13 Apr 2024
What is different between these datasets?
Varun Babbar
Zhicheng Guo
Cynthia Rudin
25
1
0
08 Mar 2024
A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods
Hanlei Jin
Yang Zhang
Dan Meng
Jun Wang
Jinghua Tan
54
76
0
05 Mar 2024
Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy
P. Schoenegger
Indre Tuminauskaite
Peter S. Park
Rafael Valdece Sousa Bastos
P. Tetlock
16
24
0
29 Feb 2024
(Chat)GPT v BERT: Dawn of Justice for Semantic Change Detection
Francesco Periti
Haim Dubossarsky
Nina Tahmasebi
AI4MH
8
13
0
25 Jan 2024
Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences
Hongyi Liu
Qingyun Wang
Payam Karisani
Heng Ji
8
1
0
19 Jan 2024
The Earth is Flat? Unveiling Factual Errors in Large Language Models
Wenxuan Wang
Juluan Shi
Zhaopeng Tu
Youliang Yuan
Jen-tse Huang
Wenxiang Jiao
Michael R. Lyu
KELM
HILM
SyDa
17
1
0
01 Jan 2024
The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages
Chiyu Zhang
Khai Duy Doan
Qisheng Liao
Muhammad Abdul-Mageed
16
6
0
23 Oct 2023
Empirical Study of Zero-Shot NER with ChatGPT
Tingyu Xie
Qi Li
Jian Zhang
Yan Zhang
Zuozhu Liu
Hongwei Wang
LRM
ReLM
6
62
0
16 Oct 2023
ChatGPT & Mechanical Engineering: Examining performance on the FE Mechanical Engineering and Undergraduate Exams
Matthew Frenkel
Hebah Emara
8
2
0
26 Sep 2023
The Potential and Pitfalls of using a Large Language Model such as ChatGPT or GPT-4 as a Clinical Assistant
Jingqing Zhang
Kai Sun
A. Jagadeesh
Mahta Ghahfarokhi
Deepa Gupta
Ashok Gupta
Vibhor Gupta
Yike Guo
LM&MA
AI4MH
ELM
6
11
0
16 Jul 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
163
388
0
02 May 2023
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Hamza Kheddar
Yassine Himeur
S. Al-Maadeed
Abbes Amira
F. Bensaali
20
75
0
27 Apr 2023
Industrial Engineering with Large Language Models: A case study of ChatGPT's performance on Oil & Gas problems
O. Ogundare
S. Madasu
N. Wiggins
LLMAG
AI4CE
30
12
0
27 Apr 2023
ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time
Shangqing Tu
Chunyang Li
Jifan Yu
Xiaozhi Wang
Lei Hou
Juanzi Li
LLMAG
AI4MH
72
11
0
27 Apr 2023
ChatGPT-Crawler: Find out if ChatGPT really knows what it's talking about
Aman Rangapur
Haoran Wang
AI4MH
23
3
0
06 Apr 2023
Towards Making the Most of ChatGPT for Machine Translation
Keqin Peng
Liang Ding
Qihuang Zhong
Li Shen
Xuebo Liu
Min Zhang
Y. Ouyang
Dacheng Tao
LRM
79
132
0
24 Mar 2023
Can we trust the evaluation on ChatGPT?
Rachith Aiyappa
Jisun An
Haewoon Kwak
Yong-Yeol Ahn
ELM
ALM
LLMAG
AI4MH
LRM
101
76
0
22 Mar 2023
A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability
Aiwei Liu
Xuming Hu
Lijie Wen
Philip S. Yu
LMTD
AI4MH
56
143
0
12 Mar 2023
Do large language models resemble humans in language use?
Zhenguang G. Cai
Xufeng Duan
David A. Haslett
Shuqi Wang
M. Pickering
ALM
67
37
0
10 Mar 2023
Ask and You Shall Receive (a Graph Drawing): Testing ChatGPT's Potential to Apply Graph Layout Algorithms
Sara Di Bartolomeo
Giorgio Severi
V. Schetinger
Cody Dunne
28
8
0
03 Mar 2023
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
162
320
0
06 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
237
840
0
05 Oct 2022
An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts
Xue-Yong Fu
Cheng Chen
Md Tahmid Rahman Laskar
TN ShashiBhushan
Simon Corston-Oliver
12
6
0
27 Sep 2022
ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking
Tom Ayoola
Shubhi Tyagi
Joseph Fisher
Christos Christodoulopoulos
Andrea Pierleoni
37
63
0
08 Jul 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Stephen H. Bach
Victor Sanh
Zheng-Xin Yong
Albert Webson
Colin Raffel
...
Khalid Almubarak
Xiangru Tang
Dragomir R. Radev
Mike Tian-Jian Jiang
Alexander M. Rush
VLM
212
335
0
02 Feb 2022
TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning
Yixuan Su
Fangyu Liu
Zaiqiao Meng
Tian Lan
Lei Shu
Ehsan Shareghi
Nigel Collier
120
50
0
07 Nov 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
275
3,784
0
18 Apr 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
183
1,098
0
09 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios
Michael A. Hedderich
Lukas Lange
Heike Adel
Jannik Strötgen
Dietrich Klakow
191
283
0
23 Oct 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
167
3,504
0
10 Jun 2015
1