Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2021
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Dawn Song
Jacob Steinhardt
ELM, RALM
ArXiv (abs) · PDF · HTML · HuggingFace (3 upvotes)

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,430 papers shown
MACEval: A Multi-Agent Continual Evaluation Network for Large Models
Z. Chen
Yuze Sun
Yuan Tian
Wenjun Zhang
Guangtao Zhai
ALM, ELM
161
0
0
12 Nov 2025
Bench360: Benchmarking Local LLM Inference from 360°
Linus Stuhlmann
Mauricio Fadel Argerich
Jonathan Fürst
ELM
77
0
0
12 Nov 2025
Investigating CoT Monitorability in Large Reasoning Models
Shu Yang
Junchao Wu
Xilin Gou
X. Wu
Yang Li
Ninhao Liu
Di Wang
LRM
103
0
0
11 Nov 2025
Sentence-Anchored Gist Compression for Long-Context LLMs
Dmitrii Tarasov
Elizaveta Goncharova
Kuznetsov Andrey
52
0
0
11 Nov 2025
Training Language Models to Explain Their Own Computations
Belinda Z. Li
Zifan Carl Guo
Vincent Huang
Jacob Steinhardt
Jacob Andreas
LRM
156
1
0
11 Nov 2025
Range Asymmetric Numeral Systems-Based Lightweight Intermediate Feature Compression for Split Computing of Deep Neural Networks
Mingyu Sung
Suhwan Im
Vikas Palakonda
Jae-Mo Kang
72
0
0
11 Nov 2025
DynaAct: Large Language Model Reasoning with Dynamic Action Spaces
Xueliang Zhao
Wei Wu
Jian Guan
Qintong Li
Lingpeng Kong
LRM
227
0
0
11 Nov 2025
SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning
Mexican International Conference on Artificial Intelligence (MICAI), 2025
Xuchen Li
Ruitao Wu
Xuanbo Liu
Xukai Wang
Jinbo Hu
...
K. Huang
J. Xu
Haitao Mi
Wentao Zhang
Bin Dong
LLMAG, LM&Ro, LRM, AI4CE
654
1
0
11 Nov 2025
Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights
Hyunjae Kim
Jiwoong Sohn
Aidan Gilson
Nicholas Cochran-Caggiano
Serina S Applebaum
...
James Zou
Andrew Taylor
Arman Cohan
Hua Xu
Qingyu Chen
RALM, LM&MA
287
1
0
10 Nov 2025
P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats
Yuzong Chen
Chao Fang
Xilai Dai
Yuheng Wu
Thierry Tambe
Marian Verhelst
Mohamed S. Abdelfattah
163
0
0
10 Nov 2025
Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering
Sai Shridhar Balamurali
Lu Cheng
84
0
0
10 Nov 2025
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Haofu Qian
Fangcheng Shi
Zijie Meng
...
Zheyong Xie
Zheyu Ye
Zhe Xu
Yao Hu
Shaosheng Cao
ALM
155
0
0
10 Nov 2025
Selecting Auxiliary Data via Neural Tangent Kernels for Low-Resource Domains
P. Wang
Hongcheng Liu
Yusheng Liao
Ziqing Fan
Yaxin Du
Shuo Tang
Y. Wang
Y Samuel Wang
88
0
0
10 Nov 2025
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Sean McLeish
Ang Li
John Kirchenbauer
Dayal Singh Kalra
Brian Bartoldson
B. Kailkhura
Avi Schwarzschild
Jonas Geiping
Tom Goldstein
Micah Goldblum
208
0
0
10 Nov 2025
MobileLLM-Pro Technical Report
Patrick Huber
Ernie Chang
Wei Wen
Igor Fedorov
Tarek Elgamal
...
Vikas Chandra
Ahmed Aly
Anuj Kumar
Raghuraman Krishnamoorthi
Adithya Sagar
72
0
0
10 Nov 2025
More Agents Helps but Adversarial Robustness Gap Persists
Khashayar Alavi
Zhastay Yeltay
Lucie Flek
Akbar Karimi
AAML
104
0
0
10 Nov 2025
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Tianhao Peng
Haochen Wang
Yuanxing Zhang
Zekun Wang
Zili Wang
...
Wei Ji
Pengfei Wan
Wenhao Huang
Zhaoxiang Zhang
Jiaheng Liu
ELM
276
1
0
10 Nov 2025
EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers
Yilin Jiang
Mingzi Zhang
Xuanyu Yin
Sheng Jin
Suyu Lu
Zuocan Ying
Zengyi Yu
Xiangjie Kong
ELM
96
0
0
10 Nov 2025
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
Zhongyang Li
Ziyue Li
Tianyi Zhou
MoE, MoMe
547
0
0
10 Nov 2025
Importance-Aware Data Selection for Efficient LLM Instruction Tuning
Tingyu Jiang
Shen Li
Yiyao Song
Lan Zhang
Hualei Zhu
Yuan Zhao
Xiaohang Xu
Kenjiro Taura
Hao Henry Wang
196
1
0
10 Nov 2025
Better Datasets Start From RefineLab: Automatic Optimization for High-Quality Dataset Refinement
Xiaonan Luo
Yue Huang
Ping He
Xiangliang Zhang
60
0
0
09 Nov 2025
Mixtures of SubExperts for Large Language Continual Learning
Haeyong Kang
CLL, KELM, MoE
175
0
0
09 Nov 2025
SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization
Yue Huang
Xiangqi Wang
Xiangliang Zhang
100
0
0
09 Nov 2025
LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation
Liya Zhu
Peizhuang Cong
Aowei Ji
Wenya Wu
Jiani Hou
...
Jingzhe Ding
Tong Yang
Z. Wang
Ge Zhang
Wenhao Huang
ALM, ELM
445
0
0
09 Nov 2025
Chasing Consistency: Quantifying and Optimizing Human-Model Alignment in Chain-of-Thought Reasoning
Boxuan Wang
Z. Li
Xinmiao Huang
Xiaowei Huang
Yi Dong
LRM
56
0
0
09 Nov 2025
Towards Resource-Efficient Multimodal Intelligence: Learned Routing among Specialized Expert Models
Mayank Saini
Arit Kumar Bishwas
MoE
86
0
0
09 Nov 2025
MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models
Jingyu Hu
Shu Yang
Xilin Gong
H. Wang
Weiru Liu
Di Wang
LRM
94
0
0
09 Nov 2025
In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading
Shuning Lin
Yifan He
Yitong Chen
MoE
53
0
0
08 Nov 2025
DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Y. Wang
Chris Yuhao Liu
Quan Liu
Jinglong Pang
Wei Wei
Yujia Bao
Yang Liu
MU
283
1
0
08 Nov 2025
MuonAll: Muon Variant for Efficient Finetuning of Large Language Models
Saurabh Page
Advait Joshi
S. Sonawane
MoE
108
0
0
08 Nov 2025
Iterative Layer-wise Distillation for Efficient Compression of Large Language Models
Grigory Kovalev
M. Tikhomirov
96
0
0
07 Nov 2025
Steering Language Models with Weight Arithmetic
Constanza Fierro
Fabien Roger
MoMe, LLMSV
401
0
0
07 Nov 2025
Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding
Hadi Reisizadeh
Jiajun Ruan
Yiwei Chen
Soumyadeep Pal
Sijia Liu
Mingyi Hong
MU
336
0
0
07 Nov 2025
Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
David Acuna
Chao-Han Huck Yang
Yuntian Deng
Jaehun Jung
Ximing Lu
Prithviraj Ammanabrolu
Hyunwoo J. Kim
Yuan-Hong Liao
Yejin Choi
ReLM, OffRL, LRM
307
1
0
07 Nov 2025
Motif 2 12.7B technical report
Junghwan Lim
S. W. Lee
Dongseok Kim
Taehyun Kim
Eunhwan Park
...
Kungyu Lee
Dongpin Oh
Yeongjae Park
Bokki Ryu
Dongjoo Weon
68
0
0
07 Nov 2025
Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges
IEEE International Conference on Mobile Adhoc and Sensor Systems (MASS), 2025
Md Romyull Islam
Bobin Deng
Nobel Dhar
Tu N. Nguyen
Selena He
Yong Shi
Kun Suo
92
0
0
07 Nov 2025
Reusing Pre-Training Data at Test Time is a Compute Multiplier
Alex Fang
Thomas Voice
Ruoming Pang
Ludwig Schmidt
Tom Gunter
90
0
0
06 Nov 2025
PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference
Yushu Zhao
Zheng Wang
Minjia Zhang
MoE
121
0
0
06 Nov 2025
If I Could Turn Back Time: Temporal Reframing as a Historical Reasoning Task for LLMs
Lars Bungum
Charles Yijia Huang
Abeer Kashar
101
0
0
06 Nov 2025
An MLCommons Scientific Benchmarks Ontology
B. Hawks
G. V. Laszewski
Matthew D. Sinclair
Marco Colombo
Shivaram Venkataraman
Rutwik Jain
Yiwei Jiang
Nhan Tran
Geoffrey C. Fox
68
1
0
06 Nov 2025
DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization
Yuantian Shao
Yuanteng Chen
Peisong Wang
Jianlin Yu
Jing Lin
Yiwu Yao
Zhihui Wei
Jian Cheng
MQ
232
0
0
06 Nov 2025
LLM-as-a-Judge is Bad, Based on AI Attempting the Exam Qualifying for the Member of the Polish National Board of Appeal
Michał Karp
Anna Kubaszewska
Magdalena Król
Robert Król
Aleksander Smywiński-Pohl
Mateusz Szymański
Witold Wydmański
ELM
72
0
0
06 Nov 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Jingqi Tong
Yurong Mou
Hangcheng Li
Mingzhe Li
Y. Yang
...
Y. Zheng
Xinchi Chen
Jun Zhao
Xuanjing Huang
Xipeng Qiu
VGen, LRM
301
6
0
06 Nov 2025
From Prompts to Power: Measuring the Energy Footprint of LLM Inference
Francisco Caravaca
Ángel Cuevas
R. Cuevas
72
0
0
05 Nov 2025
LiveTradeBench: Seeking Real-World Alpha with Large Language Models
Haofei Yu
Fenghai Li
Jiaxuan You
LLMAG, LRM
161
1
0
05 Nov 2025
BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture
Shahriyar Zaman Ridoy
Azmine Toushik Wasi
Koushik Ahamed Tonmoy
LRM
128
0
0
05 Nov 2025
Cache Mechanism for Agent RAG Systems
Shuhang Lin
Zhencan Peng
Lingyao Li
Xiao Lin
Xi Zhu
Yongfeng Zhang
93
0
0
04 Nov 2025
Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis
Y. Hua
Paul Denny
Jorg Wicker
Katerina Taskova
68
0
0
04 Nov 2025
LTD-Bench: Evaluating Large Language Models by Letting Them Draw
Liuhao Lin
Ke Li
Zihan Xu
Yuchen Shi
Yulei Qin
Y. Zhang
Xing Sun
Rongrong Ji
144
1
0
04 Nov 2025
TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Changjiang Jiang
Fengchang Yu
H. Chen
Wei Lu
Jin Zeng
LMTD, ReLM
282
0
0
04 Nov 2025