Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,479 papers shown
AgentCDM: Enhancing Multi-Agent Collaborative Decision-Making via ACH-Inspired Structured Reasoning
Xuyang Zhao
Shiwan Zhao
Hualong Yu
Liting Zhang
Qicheng Li
LRM, AI4CE
82
2
0
16 Aug 2025
QuarkMed Medical Foundation Model Technical Report
A. Li
Bin Yan
Bingfeng Cai
Chenxi Li
Cunzhong Zhao
...
Xin Shang
Yao Wu
Yu Cao
Zhenxin Ma
Zhuang Jia
MedIm, LM&MA
171
0
0
16 Aug 2025
Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation
Jinyi Han
Tingyun Li
Shisong Chen
Jie Shi
X. Wang
...
Jiaqing Liang
Xin Lin
Liqian Wen
Zulong Chen
Yanghua Xiao
108
2
0
16 Aug 2025
Personalized Distractor Generation via MCTS-Guided Reasoning Reconstruction
Tao Wu
Jingyuan Chen
Wang Lin
Jian Zhan
Mengze Li
Kun Kuang
Fei Wu
AI4Ed, LRM
307
1
0
15 Aug 2025
Every 28 Days the AI Dreams of Soft Skin and Burning Stars: Scaffolding AI Agents with Hormones and Emotions
Leigh Levinson
Christopher J. Agostino
52
0
0
15 Aug 2025
Feedback Indicators: The Alignment between Llama and a Teacher in Language Learning
Sylvio Rüdian
Yassin Elsir
Marvin Kretschmer
Sabine Cayrou
Niels Pinkwart
84
0
0
15 Aug 2025
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
Mikhail Seleznyov
Mikhail Chaichuk
Gleb Ershov
Alexander Panchenko
Elena Tutubalina
Oleg Somov
119
5
0
15 Aug 2025
Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
Rui Bao
Nan Xue
Yaping Sun
Zhiyong Chen
74
1
0
15 Aug 2025
Speciesism in AI: Evaluating Discrimination Against Animals in Large Language Models
Monika Jotautaitė
Lucius Caviola
David A. Brewster
Thilo Hagendorff
148
0
0
15 Aug 2025
Inclusion Arena: An Open Platform for Evaluating Large Foundation Models with Real-World Apps
Kangyu Wang
Hongliang He
Lin Liu
Ruiqi Liang
Zhenzhong Lan
Jianguo Li
ALM, ELM
126
0
0
15 Aug 2025
MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models
Xinyan Jiang
L. Zhang
Jiayi Zhang
Qingsong Yang
Guimin Hu
Di Wang
Lijie Hu
LLMSV
371
2
0
14 Aug 2025
Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs
Xiangqi Jin
Y. Wang
Yifeng Gao
Zichen Wen
Biqing Qi
Dongrui Liu
Linfeng Zhang
LRM
164
7
0
14 Aug 2025
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining
Pratyush Maini
Vineeth Dorna
Aldo Carranza
Fan Pan
...
Spandan Das
Zhengping Wang
Bogdan Gaza
Ari S. Morcos
Matthew L. Leavitt
SyDa
104
0
0
14 Aug 2025
Robot Policy Evaluation for Sim-to-Real Transfer: A Benchmarking Perspective
Xuning Yang
Clemens Eppner
Jonathan Tremblay
Dieter Fox
Stan Birchfield
Fabio Ramos
124
2
0
14 Aug 2025
Methodological Framework for Quantifying Semantic Test Coverage in RAG Systems
Noah Broestl
Adel Nasser Abdalla
Rajprakash Bale
Hersh Gupta
Max Struever
27
0
0
13 Aug 2025
EffiEval: Efficient and Generalizable Model Evaluation via Capability Coverage Maximization
Yaoning Wang
Jiahao Ying
Yixin Cao
Yubo Ma
Yugang Jiang
ELM
32
2
0
13 Aug 2025
NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs
Birong Pan
Mayi Xu
Qiankun Pi
Jianhao Chen
Yuanyuan Zhu
Ming Zhong
T. Qian
132
0
0
13 Aug 2025
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
Bin Hong
Jiayu Liu
Zhenya Huang
Kai Zhang
Mengdi Zhang
LRM
191
0
0
13 Aug 2025
Benchmarking the Medical Understanding and Reasoning of Large Language Models in Arabic Healthcare Tasks
Nouar Aldahoul
Yasir Zaki
LM&MA, AI4MH, ELM
142
0
0
13 Aug 2025
Amazon Nova AI Challenge -- Trusted AI: Advancing secure, AI-assisted software development
Sattvik Sahai
Prasoon Goyal
Michael Johnston
Anna Gottardi
Yao Lu
...
Lavina Vaz
Leslie Ball
Maureen Murray
Rahul Gupta
Shankar Ananthakrishna
101
1
0
13 Aug 2025
Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference
Zhifan Luo
Shuo Shao
Su Zhang
Lijing Zhou
Yuke Hu
Chenxu Zhao
Zhihao Liu
Zhan Qin
212
4
0
13 Aug 2025
mSCoRe: a Multilingual and Scalable Benchmark for Skill-based Commonsense Reasoning
Nghia Trung Ngo
Franck Dernoncourt
T. Nguyen
LRM
170
0
0
13 Aug 2025
IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization
Yuzhuo Bai
Shitong Duan
Muhua Huang
Jing Yao
Zhenghao Liu
Peng Zhang
Tun Lu
Xiaoyuan Yi
Maosong Sun
Xing Xie
140
1
0
12 Aug 2025
AgriGPT: a Large Language Model Ecosystem for Agriculture
Bo Yang
Yu Zhang
Lanfei Feng
Yunkui Chen
J. Zhang
...
Yuxuan Chen
Guijun Yang
Yong He
Runhe Huang
Shijian Li
LLMAG, KELM
216
4
0
12 Aug 2025
BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them
Sekh Mainul Islam
Nadav Borenstein
Siddhesh Pawar
Haeun Yu
Arnav Arora
Isabelle Augenstein
214
0
0
12 Aug 2025
Scaling Up Active Testing to Large Language Models
Gabrielle Berrada
Jannik Kossen
Muhammed Razzak
Freddie Bickford-Smith
Y. Gal
Tom Rainforth
ALM
148
1
0
12 Aug 2025
Classifier Language Models: Unifying Sparse Finetuning and Adaptive Tokenization for Specialized Classification Tasks
Adit Krishnan
Chu Wang
Chris Kong
77
0
0
12 Aug 2025
Resurrecting the Salmon: Rethinking Mechanistic Interpretability with Domain-Specific Sparse Autoencoders
Charles O'Neill
Mudith Jayasekara
Max Kirkby
97
1
0
12 Aug 2025
SinLlama -- A Large Language Model for Sinhala
Moratuwa Engineering Research Conference (MERCon), 2025
H.W.K.Aravinda
Rashad Sirajudeen
Samith Karunathilake
Nisansa de Silva
Surangika Ranathunga
Rishemjit Kaur
LRM
248
1
0
12 Aug 2025
InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling
Peiji Li
Jiasheng Ye
Yongkang Chen
Yichuan Ma
Zijie Yu
...
Linyang Li
Qipeng Guo
Dahua Lin
Bowen Zhou
Kai Chen
LLMAG, ALM, LRM
124
10
0
12 Aug 2025
A Survey on Training-free Alignment of Large Language Models
Birong Pan
Yongqi Li
Jiasheng Si
Sibo Wei
Mayi Xu
Shen Zhou
Yuanyuan Zhu
Ming Zhong
T. Qian
3DV, LM&MA
395
0
0
12 Aug 2025
TiMoE: Time-Aware Mixture of Language Experts
Robin Faro
Dongyang Fan
Tamar Alphaidze
Martin Jaggi
MoE
121
1
0
12 Aug 2025
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
Junjie Ye
C. Jiang
Zhengyin Du
Yufei Xu
Xuesong Yao
...
Xiaoran Fan
Qi Zhang
Tao Gui
Xuanjing Huang
Jiecao Chen
KELM, OffRL
174
4
0
12 Aug 2025
VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
Mansi Phute
Ravikumar Balakrishnan
LLMSV
88
0
0
11 Aug 2025
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
Haoyuan Wu
Haoxing Chen
Xiaodong Chen
Zhanchao Zhou
Tieyuan Chen
...
Junbo Zhao
Lin Liu
Zhenzhong Lan
Bei Yu
Jianguo Li
MoE
132
4
0
11 Aug 2025
Evaluating Large Language Models as Expert Annotators
Yu-Min Tseng
Wei-Lin Chen
Chung-Chi Chen
Hsin-Hsi Chen
136
2
0
11 Aug 2025
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev
Thaddäus Wiedemer
Christian Schroeder de Witt
Matthias Bethge
Wieland Brendel
A. Sophia Koepke
AuLLM
219
4
0
11 Aug 2025
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Zihe Liu
Jiashun Liu
Yancheng He
Weixun Wang
Jiaheng Liu
...
Siran Yang
Jiamang Wang
Yuchi Xu
Bo Zheng
B. Zheng
OffRL
112
23
0
11 Aug 2025
OverFill: Two-Stage Models for Efficient Language Model Decoding
Woojeong Kim
Junxiong Wang
Jing Nathan Yan
Mohamed S. Abdelfattah
Alexander M Rush
100
0
0
11 Aug 2025
Can You Trick the Grader? Adversarial Persuasion of LLM Judges
Yerin Hwang
Dongryeol Lee
Taegwan Kang
Yongil Kim
Kyomin Jung
AAML, ELM
116
1
0
11 Aug 2025
Capabilities of GPT-5 on Multimodal Medical Reasoning
Shansong Wang
Mingzhe Hu
Qiang Li
Mojtaba Safari
Xiaofeng Yang
ELM, LM&MA, AI4MH, LRM
139
30
0
11 Aug 2025
HealthBranches: Synthesizing Clinically-Grounded Question Answering Datasets via Decision Pathways
Cristian Cosentino
Annamaria Defilippo
Marco Dossena
Christopher Irwin
Sara Joubbi
Pietro Liò
LM&MA, AI4MH
132
0
0
10 Aug 2025
Benchmarking for Domain-Specific LLMs: A Case Study on Academia and Beyond
Rubing Chen
Jiaxin Wu
Jian Wang
Xulu Zhang
Wenqi Fan
Chenghua Lin
Xiao-Yong Wei
Qing Li
ALM
232
0
0
10 Aug 2025
Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach
Naseem Machlovi
Maryam Saleki
Innocent Ababio
Ruhul Amin
80
4
0
09 Aug 2025
gpt-oss-120b & gpt-oss-20b Model Card
OpenAI
Sandhini Agarwal
Lama Ahmad
Jason Ai
Sam Altman
...
D. Sculley
Harshit Sikchi
Kendal Simon
K. Singhal
Yang Song
LRM, VLM
121
244
0
08 Aug 2025
LLM Unlearning Without an Expert Curated Dataset
Xiaoyuan Zhu
Muru Zhang
Ollie Liu
Robin Jia
Willie Neiswanger
MU
251
1
0
08 Aug 2025
Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models
Tomohiro Sawada
Kartik Goyal
MoMe
92
0
0
08 Aug 2025
MyCulture: Exploring Malaysia's Diverse Culture under Low-Resource Language Constraints
Zhong Ken Hew
Jia Xin Low
Sze Jue Yang
Chee Seng Chan
77
1
0
07 Aug 2025
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
Xiaodong Chen
Mingming Ha
Zhenzhong Lan
Jing Zhang
Jianguo Li
MoE
109
0
0
07 Aug 2025
Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance?
Matteo Prandi
Vincenzo Suriani
Federico Pierucci
Marcello Galisai
Daniele Nardi
Piercosma Bisconti
ELM
102
0
0
07 Aug 2025