Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,486 papers shown
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
437
153
0
25 Jul 2023
ARB: Advanced Reasoning Benchmark for Large Language Models
Tomohiro Sawada
Daniel Paleka
Alexander Havrilla
Pranav Tadepalli
Paula Vidas
Alexander Kranias
John J. Nay
Kshitij Gupta
Aran Komatsuzaki
ELM, LRM
237
52
0
25 Jul 2023
Evaluating Large Language Models for Radiology Natural Language Processing
Zheng Liu
Tianyang Zhong
Yiwei Li
Yutong Zhang
Yirong Pan
...
Shijie Zhao
Hongtu Zhu
Dinggang Shen
Tianming Liu
LM&MA, ELM
577
6
0
25 Jul 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
International Conference on Learning Representations (ICLR), 2023
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro, LLMAG
581
319
0
24 Jul 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM, ALM
470
205
0
20 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
International Conference on Learning Representations (ICLR), 2023
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
593
150
0
20 Jul 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
International Conference on Machine Learning (ICML), 2023
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Luke Huan
Wei Wang
ELM, LRM
422
176
0
20 Jul 2023
Instruction-following Evaluation through Verbalizer Manipulation
Shiyang Li
Jun Yan
Hai Wang
Zheng Tang
Xiang Ren
Vijay Srinivasan
Hongxia Jin
320
34
0
20 Jul 2023
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
Findings (Findings), 2023
Jianguo Zhang
Kun Qian
Zhiwei Liu
Shelby Heinecke
Rui Meng
Ye Liu
Zhou Yu
Huan Wang
Silvio Savarese
Caiming Xiong
309
29
0
19 Jul 2023
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
...
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALM, ELM
272
98
0
19 Jul 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum
Matthew Rahtz
János Kramár
Neel Nanda
G. Irving
Rohin Shah
Vladimir Mikulik
323
141
0
18 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
8.7K
15,551
0
18 Jul 2023
AlpaGasus: Training A Better Alpaca with Fewer Data
Lichang Chen
Shiyang Li
Jun Yan
Hai Wang
Kalpa Gunaratna
...
Zheng Tang
Vijay Srinivasan
Wanrong Zhu
Heng-Chiao Huang
Hongxia Jin
ALM
424
261
0
17 Jul 2023
COLLIE: Systematic Construction of Constrained Text Generation Tasks
International Conference on Learning Representations (ICLR), 2023
Shunyu Yao
Howard Chen
Austin W. Hanjie
Runzhe Yang
Karthik Narasimhan
289
56
0
17 Jul 2023
Measuring Faithfulness in Chain-of-Thought Reasoning
Tamera Lanham
Anna Chen
Ansh Radhakrishnan
Benoit Steiner
Carson E. Denison
...
Zac Hatfield-Dodds
Jared Kaplan
J. Brauner
Sam Bowman
Ethan Perez
ReLM, LRM
248
313
0
17 Jul 2023
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods
European Conference on Technology Enhanced Learning (EC-TEL), 2023
Steven Moore
H. A. Nguyen
Tianying Chen
John C. Stamper
ELM
241
43
0
16 Jul 2023
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study
International Conference on Language Resources and Evaluation (LREC), 2023
Peiyu Liu
Zikang Liu
Ze-Feng Gao
Dawei Gao
Wayne Xin Zhao
Yaliang Li
Bolin Ding
Ji-Rong Wen
MQ, LRM
274
45
0
16 Jul 2023
Large Language Models as Superpositions of Cultural Perspectives
Grgur Kovač
Masataka Sawayama
Rémy Portelas
Cédric Colas
Peter Ford Dominey
Pierre-Yves Oudeyer
LLMAG
300
51
0
15 Jul 2023
Effective Prompt Extraction from Language Models
Yiming Zhang
Nicholas Carlini
Daphne Ippolito
MIACV, SILM
360
76
0
13 Jul 2023
A Comprehensive Overview of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Lin Wang
OffRL
898
1,229
0
12 Jul 2023
Instruction Mining: When Data Mining Meets Large Language Model Finetuning
Yihan Cao
Yanbin Kang
Chi Wang
Lichao Sun
ALM
127
0
0
12 Jul 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ester Hlavnova
Sebastian Ruder
241
8
0
11 Jul 2023
OntoChatGPT Information System: Ontology-Driven Structured Prompts for ChatGPT Meta-Learning
O. Palagin
Vladislav Kaverinskiy
Anna Litvin
Kyrylo S. Malakhov
KELM
90
33
0
11 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
432
156
0
06 Jul 2023
A Survey on Evaluation of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Yu-Chu Chang
Xu Wang
Yongfeng Zhang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM, LM&MA, ALM
706
2,839
0
06 Jul 2023
Style Over Substance: Evaluation Biases for Large Language Models
International Conference on Computational Linguistics (COLING), 2023
Minghao Wu
Alham Fikri Aji
ALM, ELM
637
63
0
06 Jul 2023
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
Waseem Alshikh
Manhal Daaboul
K. Goddard
Brock Imel
Kiran Kamble
Parikshit Kulkarni
M. Russak
ALM
46
15
0
05 Jul 2023
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Deepanway Ghosal
Yew Ken Chia
Navonil Majumder
Soujanya Poria
ALM, LRM
159
21
0
05 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Conference on Robot Learning (CoRL), 2023
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
504
310
0
04 Jul 2023
CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care
Neural Information Processing Systems (NeurIPS), 2023
Tong Xiang
Liangzhi Li
Wangyue Li
Min‐Jun Bai
Lu Wei
Bowen Wang
Noa Garcia
322
8
0
04 Jul 2023
SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions
Sameera Horawalavithana
Sai Munikoti
Ian Stewart
Henry Kvinge
MLLM
163
26
0
03 Jul 2023
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA, LLMAG
740
181
0
01 Jul 2023
CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?
Tianwen Wei
Jian Luan
Wen Liu
Shuang Dong
Bin Wang
ELM
185
62
0
29 Jun 2023
On the Exploitability of Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Manli Shu
Zhenghao Hu
Chen Zhu
Jonas Geiping
Chaowei Xiao
Tom Goldstein
SILM
388
128
0
28 Jun 2023
SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes
IEEE International Conference on Robotics and Automation (ICRA), 2023
Ninad Khargonkar
Sai Haneesh Allu
Ya Lu
Jishnu Jaykumar
Balakrishnan Prabhakaran
Yu Xiang
196
2
0
27 Jun 2023
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning
Xiao Ma
Swaroop Mishra
Ahmad Beirami
Alex Beutel
Jilin Chen
ELM, ReLM, LRM
180
17
0
25 Jun 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
Neel Jain
Khalid Saifullah
Yuxin Wen
John Kirchenbauer
Manli Shu
Aniruddha Saha
Micah Goldblum
Jonas Geiping
Tom Goldstein
ALM, ELM
267
26
0
23 Jun 2023
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
International Conference on Learning Representations (ICLR), 2023
Miao Xiong
Zhiyuan Hu
Xinyang Lu
Yifei Li
Jie Fu
Junxian He
Bryan Hooi
503
705
0
22 Jun 2023
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Shizhe Diao
Boyao Wang
Hanze Dong
Kashun Shum
Jipeng Zhang
Wei Xiong
Tong Zhang
ALM
304
76
0
21 Jun 2023
A Simple and Effective Pruning Approach for Large Language Models
International Conference on Learning Representations (ICLR), 2023
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
506
665
0
20 Jun 2023
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset
Neural Information Processing Systems (NeurIPS), 2023
S. Naeini
Raeid Saqur
M. Saeidi
John Giorgi
Babak Taati
359
19
0
19 Jun 2023
Toward the Cure of Privacy Policy Reading Phobia: Automated Generation of Privacy Nutrition Labels From Privacy Policies
Shidong Pan
Thong Hoang
Dawen Zhang
Zhenchang Xing
Xiwei Xu
Qinghua Lu
Mark Staples
266
22
0
19 Jun 2023
Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses
International Computing Education Research Workshop (ICER), 2023
Jaromír Šavelka
Arav Agarwal
Marshall An
Chris Bogart
M. Sakr
ELM
270
132
0
15 Jun 2023
Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie
Alexander Lyzhov
Michael Pieler
Alicia Parrish
Aaron Mueller
...
Yuhui Zhang
Zhengping Zhou
Najoung Kim
Sam Bowman
Ethan Perez
287
183
0
15 Jun 2023
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
International Conference on Learning Representations (ICLR), 2023
Jifan Yu
Xiaozhi Wang
Shangqing Tu
S. Cao
Daniel Zhang-Li
...
Lei Hou
Zhiyuan Liu
Bin Xu
Jie Tang
Juanzi Li
ELM, ALM
336
86
0
15 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Peng Xu
Wenqi Shao
Kaipeng Zhang
Shiyang Feng
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM, MLLM
312
232
0
15 Jun 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jinyan Su
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM, ELM
447
420
0
15 Jun 2023
Domain-specific ChatBots for Science using Embeddings
Digital Discovery (DD), 2023
Kevin G. Yager
175
16
0
15 Jun 2023
Revealing the structure of language model capabilities
Ryan Burnell
Hank Hao
Andrew R. A. Conway
José Hernández-Orallo
ELM
177
29
0
14 Jun 2023
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Arnav Chavan
Zhuang Liu
D. K. Gupta
Eric P. Xing
Zhiqiang Shen
320
110
0
13 Jun 2023