2009.03300
Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 4,486 papers shown
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
437
153
0
25 Jul 2023
ARB: Advanced Reasoning Benchmark for Large Language Models
Tomohiro Sawada
Daniel Paleka
Alexander Havrilla
Pranav Tadepalli
Paula Vidas
Alexander Kranias
John J. Nay
Kshitij Gupta
Aran Komatsuzaki
ELM
LRM
237
52
0
25 Jul 2023
Evaluating Large Language Models for Radiology Natural Language Processing
Zheng Liu
Tianyang Zhong
Yiwei Li
Yutong Zhang
Yirong Pan
...
Shijie Zhao
Hongtu Zhu
Dinggang Shen
Tianming Liu
LM&MA
ELM
577
6
0
25 Jul 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
International Conference on Learning Representations (ICLR), 2023
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro
LLMAG
581
319
0
24 Jul 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM
ALM
470
205
0
20 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
International Conference on Learning Representations (ICLR), 2023
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
593
150
0
20 Jul 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
International Conference on Machine Learning (ICML), 2023
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Luke Huan
Wei Wang
ELM
LRM
422
176
0
20 Jul 2023
Instruction-following Evaluation through Verbalizer Manipulation
Shiyang Li
Jun Yan
Hai Wang
Zheng Tang
Xiang Ren
Vijay Srinivasan
Hongxia Jin
320
34
0
20 Jul 2023
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
Findings (Findings), 2023
Jianguo Zhang
Kun Qian
Zhiwei Liu
Shelby Heinecke
Rui Meng
Ye Liu
Zhou Yu
Huan Wang
Silvio Savarese
Caiming Xiong
309
29
0
19 Jul 2023
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
...
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALM
ELM
272
98
0
19 Jul 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum
Matthew Rahtz
János Kramár
Neel Nanda
G. Irving
Rohin Shah
Vladimir Mikulik
323
141
0
18 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
8.7K
15,551
0
18 Jul 2023
AlpaGasus: Training A Better Alpaca with Fewer Data
Lichang Chen
Shiyang Li
Jun Yan
Hai Wang
Kalpa Gunaratna
...
Zheng Tang
Vijay Srinivasan
Wanrong Zhu
Heng-Chiao Huang
Hongxia Jin
ALM
424
261
0
17 Jul 2023
COLLIE: Systematic Construction of Constrained Text Generation Tasks
International Conference on Learning Representations (ICLR), 2023
Shunyu Yao
Howard Chen
Austin W. Hanjie
Runzhe Yang
Karthik Narasimhan
289
56
0
17 Jul 2023
Measuring Faithfulness in Chain-of-Thought Reasoning
Tamera Lanham
Anna Chen
Ansh Radhakrishnan
Benoit Steiner
Carson E. Denison
...
Zac Hatfield-Dodds
Jared Kaplan
J. Brauner
Sam Bowman
Ethan Perez
ReLM
LRM
248
313
0
17 Jul 2023
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods
European Conference on Technology Enhanced Learning (EC-TEL), 2023
Steven Moore
H. A. Nguyen
Tianying Chen
John C. Stamper
ELM
241
43
0
16 Jul 2023
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study
International Conference on Language Resources and Evaluation (LREC), 2023
Peiyu Liu
Zikang Liu
Ze-Feng Gao
Dawei Gao
Wayne Xin Zhao
Yaliang Li
Bolin Ding
Ji-Rong Wen
MQ
LRM
274
45
0
16 Jul 2023
Large Language Models as Superpositions of Cultural Perspectives
Grgur Kovač
Masataka Sawayama
Rémy Portelas
Cédric Colas
Peter Ford Dominey
Pierre-Yves Oudeyer
LLMAG
300
51
0
15 Jul 2023
Effective Prompt Extraction from Language Models
Yiming Zhang
Nicholas Carlini
Daphne Ippolito
MIACV
SILM
360
76
0
13 Jul 2023
A Comprehensive Overview of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Lin Wang
OffRL
898
1,229
0
12 Jul 2023
Instruction Mining: When Data Mining Meets Large Language Model Finetuning
Yihan Cao
Yanbin Kang
Chi Wang
Lichao Sun
ALM
127
0
0
12 Jul 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ester Hlavnova
Sebastian Ruder
241
8
0
11 Jul 2023
OntoChatGPT Information System: Ontology-Driven Structured Prompts for ChatGPT Meta-Learning
O. Palagin
Vladislav Kaverinskiy
Anna Litvin
Kyrylo S. Malakhov
KELM
90
33
0
11 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
432
156
0
06 Jul 2023
A Survey on Evaluation of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Yu-Chu Chang
Xu Wang
Yongfeng Zhang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
706
2,839
0
06 Jul 2023
Style Over Substance: Evaluation Biases for Large Language Models
International Conference on Computational Linguistics (COLING), 2023
Minghao Wu
Alham Fikri Aji
ALM
ELM
637
63
0
06 Jul 2023
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
Waseem Alshikh
Manhal Daaboul
K. Goddard
Brock Imel
Kiran Kamble
Parikshit Kulkarni
M. Russak
ALM
46
15
0
05 Jul 2023
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Deepanway Ghosal
Yew Ken Chia
Navonil Majumder
Soujanya Poria
ALM
LRM
159
21
0
05 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Conference on Robot Learning (CoRL), 2023
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
504
310
0
04 Jul 2023
CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care
Neural Information Processing Systems (NeurIPS), 2023
Tong Xiang
Liangzhi Li
Wangyue Li
Min‐Jun Bai
Lu Wei
Bowen Wang
Noa Garcia
322
8
0
04 Jul 2023
SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions
Sameera Horawalavithana
Sai Munikoti
Ian Stewart
Henry Kvinge
MLLM
163
26
0
03 Jul 2023
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
740
181
0
01 Jul 2023
CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?
Tianwen Wei
Jian Luan
Wen Liu
Shuang Dong
Bin Wang
ELM
185
62
0
29 Jun 2023
On the Exploitability of Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Manli Shu
Zhenghao Hu
Chen Zhu
Jonas Geiping
Chaowei Xiao
Tom Goldstein
SILM
388
128
0
28 Jun 2023
SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes
IEEE International Conference on Robotics and Automation (ICRA), 2023
Ninad Khargonkar
Sai Haneesh Allu
Ya Lu
Jishnu Jaykumar
Balakrishnan Prabhakaran
Yu Xiang
196
2
0
27 Jun 2023
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning
Xiao Ma
Swaroop Mishra
Ahmad Beirami
Alex Beutel
Jilin Chen
ELM
ReLM
LRM
180
17
0
25 Jun 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
Neel Jain
Khalid Saifullah
Yuxin Wen
John Kirchenbauer
Manli Shu
Aniruddha Saha
Micah Goldblum
Jonas Geiping
Tom Goldstein
ALM
ELM
267
26
0
23 Jun 2023
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
International Conference on Learning Representations (ICLR), 2023
Miao Xiong
Zhiyuan Hu
Xinyang Lu
Yifei Li
Jie Fu
Junxian He
Bryan Hooi
503
705
0
22 Jun 2023
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Shizhe Diao
Boyao Wang
Hanze Dong
Kashun Shum
Jipeng Zhang
Wei Xiong
Tong Zhang
ALM
304
76
0
21 Jun 2023
A Simple and Effective Pruning Approach for Large Language Models
International Conference on Learning Representations (ICLR), 2023
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
506
665
0
20 Jun 2023
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset
Neural Information Processing Systems (NeurIPS), 2023
S. Naeini
Raeid Saqur
M. Saeidi
John Giorgi
Babak Taati
359
19
0
19 Jun 2023
Toward the Cure of Privacy Policy Reading Phobia: Automated Generation of Privacy Nutrition Labels From Privacy Policies
Shidong Pan
Thong Hoang
Dawen Zhang
Zhenchang Xing
Xiwei Xu
Qinghua Lu
Mark Staples
266
22
0
19 Jun 2023
Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses
International Computing Education Research Workshop (ICER), 2023
Jaromír Šavelka
Arav Agarwal
Marshall An
Chris Bogart
M. Sakr
ELM
270
132
0
15 Jun 2023
Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie
Alexander Lyzhov
Michael Pieler
Alicia Parrish
Aaron Mueller
...
Yuhui Zhang
Zhengping Zhou
Najoung Kim
Sam Bowman
Ethan Perez
287
183
0
15 Jun 2023
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
International Conference on Learning Representations (ICLR), 2023
Jifan Yu
Xiaozhi Wang
Shangqing Tu
S. Cao
Daniel Zhang-Li
...
Lei Hou
Zhiyuan Liu
Bin Xu
Jie Tang
Juanzi Li
ELM
ALM
336
86
0
15 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Peng Xu
Wenqi Shao
Kaipeng Zhang
Shiyang Feng
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
312
232
0
15 Jun 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jinyan Su
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM
ELM
447
420
0
15 Jun 2023
Domain-specific ChatBots for Science using Embeddings
Digital Discovery (DD), 2023
Kevin G. Yager
175
16
0
15 Jun 2023
Revealing the structure of language model capabilities
Ryan Burnell
Hank Hao
Andrew R. A. Conway
José Hernández-Orallo
ELM
177
29
0
14 Jun 2023
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Arnav Chavan
Zhuang Liu
D. K. Gupta
Eric P. Xing
Zhiqiang Shen
320
110
0
13 Jun 2023