Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2021
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,486 papers shown
SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning
Mexican International Conference on Artificial Intelligence (MICAI), 2025
Xuchen Li
Ruitao Wu
Xuanbo Liu
Xukai Wang
Jinbo Hu
...
K. Huang
J. Xu
Haitao Mi
Wentao Zhang
Bin Dong
LLMAG, LM&Ro, LRM, AI4CE
754
1
0
11 Nov 2025
Alignment-Aware Quantization for LLM Safety
Sunghyun Wee
Suyoung Kim
Hyeonjin Kim
Kyomin Hwang
Nojun Kwak
112
0
0
11 Nov 2025
Training Language Models to Explain Their Own Computations
Belinda Z. Li
Zifan Carl Guo
Vincent Huang
Jacob Steinhardt
Jacob Andreas
LRM
236
3
0
11 Nov 2025
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
Zhongyang Li
Ziyue Li
Tianyi Zhou
MoE, MoMe
625
0
0
10 Nov 2025
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Haofu Qian
Fangcheng Shi
Zijie Meng
...
Zheyong Xie
Zheyu Ye
Zhe Xu
Yao Hu
Shaosheng Cao
ALM
205
0
0
10 Nov 2025
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Sean McLeish
Ang Li
John Kirchenbauer
Dayal Singh Kalra
Brian Bartoldson
B. Kailkhura
Avi Schwarzschild
Jonas Geiping
Tom Goldstein
Micah Goldblum
279
2
0
10 Nov 2025
Selecting Auxiliary Data via Neural Tangent Kernels for Low-Resource Domains
P. Wang
Hongcheng Liu
Yusheng Liao
Ziqing Fan
Yaxin Du
Shuo Tang
Y. Wang
Y Samuel Wang
133
1
0
10 Nov 2025
P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats
Yuzong Chen
Chao Fang
Xilai Dai
Yuheng Wu
Thierry Tambe
Marian Verhelst
Mohamed S. Abdelfattah
228
1
0
10 Nov 2025
MobileLLM-Pro Technical Report
Patrick Huber
Ernie Chang
Wei Wen
Igor Fedorov
Tarek Elgamal
...
Vikas Chandra
Ahmed Aly
Anuj Kumar
Raghuraman Krishnamoorthi
Adithya Sagar
143
0
0
10 Nov 2025
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Tianhao Peng
Haochen Wang
Yuanxing Zhang
Zekun Wang
Zili Wang
...
Wei Ji
Pengfei Wan
Wenhao Huang
Zhaoxiang Zhang
Jiaheng Liu
ELM
379
2
0
10 Nov 2025
Importance-Aware Data Selection for Efficient LLM Instruction Tuning
Tingyu Jiang
Shen Li
Yiyao Song
Lan Zhang
Hualei Zhu
Yuan Zhao
Xiaohang Xu
Kenjiro Taura
Hao Henry Wang
386
3
0
10 Nov 2025
Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights
Hyunjae Kim
Jiwoong Sohn
Aidan Gilson
Nicholas Cochran-Caggiano
Serina S Applebaum
...
James Zou
Andrew Taylor
Arman Cohan
Hua Xu
Qingyu Chen
RALM, LM&MA
359
3
0
10 Nov 2025
Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering
Sai Shridhar Balamurali
Lu Cheng
124
1
0
10 Nov 2025
More Agents Helps but Adversarial Robustness Gap Persists
Khashayar Alavi
Zhastay Yeltay
Lucie Flek
Akbar Karimi
AAML
151
0
0
10 Nov 2025
EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers
Yilin Jiang
Mingzi Zhang
Xuanyu Yin
Sheng Jin
Suyu Lu
Zuocan Ying
Zengyi Yu
Xiangjie Kong
ELM
163
0
0
10 Nov 2025
Better Datasets Start From RefineLab: Automatic Optimization for High-Quality Dataset Refinement
Xiaonan Luo
Yue Huang
Ping He
Xiangliang Zhang
100
0
0
09 Nov 2025
Mixtures of SubExperts for Large Language Continual Learning
Haeyong Kang
CLL, KELM, MoE
214
0
0
09 Nov 2025
Towards Resource-Efficient Multimodal Intelligence: Learned Routing among Specialized Expert Models
Mayank Saini
Arit Kumar Bishwas
MoE
123
0
0
09 Nov 2025
SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization
Yue Huang
Xiangqi Wang
Xiangliang Zhang
133
0
0
09 Nov 2025
MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models
Jingyu Hu
Shu Yang
Xilin Gong
H. Wang
Weiru Liu
Di Wang
LRM
139
2
0
09 Nov 2025
LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation
Liya Zhu
Peizhuang Cong
Aowei Ji
Wenya Wu
Jiani Hou
...
Jingzhe Ding
Tong Yang
Z. Wang
Ge Zhang
Wenhao Huang
ALM, ELM
584
0
0
09 Nov 2025
Chain-of-Thought as a Lens: Evaluating Structured Reasoning Alignment between Human Preferences and Large Language Models
Boxuan Wang
Z. Li
Xinmiao Huang
Xiaowei Huang
Yi Dong
LRM
121
1
0
09 Nov 2025
In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading
Shuning Lin
Yifan He
Yitong Chen
MoE
102
0
0
08 Nov 2025
DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Y. Wang
Chris Yuhao Liu
Quan Liu
Jinglong Pang
Wei Wei
Yujia Bao
Yang Liu
MU
356
2
0
08 Nov 2025
MuonAll: Muon Variant for Efficient Finetuning of Large Language Models
Saurabh Page
Advait Joshi
S. Sonawane
MoE
138
0
0
08 Nov 2025
Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding
Hadi Reisizadeh
Jiajun Ruan
Yiwei Chen
Soumyadeep Pal
Sijia Liu
Mingyi Hong
MU
362
0
0
07 Nov 2025
Steering Language Models with Weight Arithmetic
Constanza Fierro
Fabien Roger
MoMe, LLMSV
532
0
0
07 Nov 2025
Iterative Layer-wise Distillation for Efficient Compression of Large Language Models
Grigory Kovalev
M. Tikhomirov
108
0
0
07 Nov 2025
Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges
IEEE International Conference on Mobile Adhoc and Sensor Systems (MASS), 2025
Md Romyull Islam
Bobin Deng
Nobel Dhar
Tu N. Nguyen
Selena He
Yong Shi
Kun Suo
145
0
0
07 Nov 2025
Motif 2 12.7B technical report
Junghwan Lim
S. W. Lee
Dongseok Kim
Taehyun Kim
Eunhwan Park
...
Kungyu Lee
Dongpin Oh
Yeongjae Park
Bokki Ryu
Dongjoo Weon
104
0
0
07 Nov 2025
Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
David Acuna
Chao-Han Huck Yang
Yuntian Deng
Jaehun Jung
Ximing Lu
Prithviraj Ammanabrolu
Hyunwoo J. Kim
Yuan-Hong Liao
Yejin Choi
ReLM, OffRL, LRM
344
1
0
07 Nov 2025
If I Could Turn Back Time: Temporal Reframing as a Historical Reasoning Task for LLMs
Lars Bungum
Charles Yijia Huang
Abeer Kashar
136
0
0
06 Nov 2025
PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference
Yushu Zhao
Zheng Wang
Minjia Zhang
MoE
161
1
0
06 Nov 2025
Reusing Pre-Training Data at Test Time is a Compute Multiplier
Alex Fang
Thomas Voice
Ruoming Pang
Ludwig Schmidt
Tom Gunter
106
0
0
06 Nov 2025
LLM-as-a-Judge is Bad, Based on AI Attempting the Exam Qualifying for the Member of the Polish National Board of Appeal
Michał Karp
Anna Kubaszewska
Magdalena Król
Robert Król
Aleksander Smywiński-Pohl
Mateusz Szymański
Witold Wydmański
ELM
104
0
0
06 Nov 2025
An MLCommons Scientific Benchmarks Ontology
B. Hawks
G. V. Laszewski
Matthew D. Sinclair
Marco Colombo
Shivaram Venkataraman
Rutwik Jain
Yiwei Jiang
Nhan Tran
Geoffrey C. Fox
96
1
0
06 Nov 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Jingqi Tong
Yurong Mou
Hangcheng Li
Mingzhe Li
Y. Yang
...
Y. Zheng
Xinchi Chen
Jun Zhao
Xuanjing Huang
Xipeng Qiu
VGen, LRM
351
10
0
06 Nov 2025
DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization
Yuantian Shao
Yuanteng Chen
Peisong Wang
Jianlin Yu
Jing Lin
Yiwu Yao
Zhihui Wei
Jian Cheng
MQ
365
1
0
06 Nov 2025
From Prompts to Power: Measuring the Energy Footprint of LLM Inference
Francisco Caravaca
Ángel Cuevas
R. Cuevas
119
0
0
05 Nov 2025
LiveTradeBench: Seeking Real-World Alpha with Large Language Models
Haofei Yu
Fenghai Li
Jiaxuan You
LLMAG, LRM
238
3
0
05 Nov 2025
BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture
Shahriyar Zaman Ridoy
Azmine Toushik Wasi
Koushik Ahamed Tonmoy
LRM
177
0
0
05 Nov 2025
Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything
Huawei Lin
Yunzhi Shi
Tong Geng
Weijie Zhao
Wei Wang
Ravender Pal Singh
LLMAG, VLM, LRM
258
0
0
04 Nov 2025
Cache Mechanism for Agent RAG Systems
Shuhang Lin
Zhencan Peng
Lingyao Li
Xiao Lin
Xi Zhu
Yongfeng Zhang
135
2
0
04 Nov 2025
TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Changjiang Jiang
Fengchang Yu
H. Chen
Wei Lu
Jin Zeng
LMTD, ReLM
392
0
0
04 Nov 2025
Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge
Drago Plečko
Patrik Okanovic
Shreyas Havaldar
Torsten Hoefler
Elias Bareinboim
193
1
0
04 Nov 2025
LTD-Bench: Evaluating Large Language Models by Letting Them Draw
Liuhao Lin
Ke Li
Zihan Xu
Yuchen Shi
Yulei Qin
Y. Zhang
Xing Sun
Rongrong Ji
208
1
0
04 Nov 2025
DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning
Lachlan McPheat
Navdeep Kaur
Robert E Blackwell
Alessandra Russo
Anthony G Cohn
Pranava Madhyastha
ReLM, CoGe, LRM
285
0
0
04 Nov 2025
Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis
Y. Hua
Paul Denny
Jorg Wicker
Katerina Taskova
118
0
0
04 Nov 2025
CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
Ehsan Aghazadeh
Ahmad Ghasemi
Hedyeh Beyhaghi
Hossein Pishro-Nik
LRM
160
0
0
04 Nov 2025
A Detailed Study on LLM Biases Concerning Corporate Social Responsibility and Green Supply Chains
Greta Ontrup
Annika Bush
Markus Pauly
Meltem Aksoy
127
0
0
03 Nov 2025