Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2021
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 4,430 papers shown
Title
Integral Transformer: Denoising Attention, Not Too Much Not Too Little
I. Kobyzev
Abbas Ghaddar
Dingtao Hu
Boxing Chen
84
0
0
25 Aug 2025
Steering When Necessary: Flexible Steering Large Language Models with Backtracking
Jinwei Gan
Zifeng Cheng
Zhiwei Jiang
Cong Wang
Yafeng Yin
Xiang Luo
Yuchen Fu
Qing Gu
KELM, LLMSV
149
1
0
25 Aug 2025
DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
Weilin Cai
Le Qin
Shwai He
Junwei Cui
Ang Li
Jiayi Huang
MoE
100
0
0
25 Aug 2025
Unlearning as Ablation: Toward a Falsifiable Benchmark for Generative Scientific Discovery
Robert Yang
MU
136
0
0
25 Aug 2025
Unraveling the cognitive patterns of Large Language Models through module communities
Kushal Raj Bhandari
Pin-Yu Chen
Jianxi Gao
72
0
0
25 Aug 2025
Module-Aware Parameter-Efficient Machine Unlearning on Transformers
Wenjie Bao
Jian Lou
Yuke Hu
Xiaochen Li
Zhihao Liu
Jiaqi Liu
Zhan Qin
K. Ren
MU
76
0
0
24 Aug 2025
From Language to Action: A Review of Large Language Models as Autonomous Agents and Tool Users
Sadia Sultana Chowa
Riasad Alvi
Subhey Sadi Rahman
M. R
M. R
M. Islam
Mukhtar Hussain
Sami Azam
LLMAG, LM&Ro, ELM
219
5
0
24 Aug 2025
MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
Krishna Teja Chitty-Venkata
Sylvia Howland
Golara Azar
Daria Soboleva
Natalia Vassilieva
Siddhisanket Raskar
M. Emani
V. Vishwanath
MoE
72
1
0
24 Aug 2025
Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?
Hyeong Kyu Choi
Xiaojin Zhu
Yixuan Li
LRM
240
7
0
24 Aug 2025
Towards Alignment-Centric Paradigm: A Survey of Instruction Tuning in Large Language Models
Xudong Han
Junjie Yang
Tianyang Wang
Ziqian Bi
Xinyuan Song
Junfeng Hao
Junhao Song
LM&MA, ALM
334
4
0
24 Aug 2025
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Yang Zhou
Sunzhu Li
Shunyu Liu
Wenkai Fang
Jiale Zhao
...
Hengtong Lu
Wei Chen
Yan Xie
Mingli Song
Weilong Dai
LRM
196
7
0
23 Aug 2025
Being Kind Isn't Always Being Safe: Diagnosing Affective Hallucination in LLMs
Sewon Kim
Jiwon Kim
Seungwoo Shin
Hyejin Chung
Daeun Moon
Yejin Kwon
Hyunsoo Yoon
92
0
0
23 Aug 2025
Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks
Jack Youstra
Mohammed Mahfoud
Yang Yan
Henry Sleight
Ethan Perez
Mrinank Sharma
AAML
128
2
0
23 Aug 2025
What Matters in Data for DPO?
Yu Pan
Zhongze Cai
Guanting Chen
Huaiyang Zhong
Chonghuan Wang
200
3
0
23 Aug 2025
QueryBandits for Hallucination Mitigation: Exploiting Semantic Features for No-Regret Rewriting
Nicole Cho
William Watson
Alec Koppel
Sumitra Ganesh
Manuela Veloso
AAML
120
0
0
22 Aug 2025
CEQuest: Benchmarking Large Language Models for Construction Estimation
Y. Wu
L. xilinx Wang
Rui Liu
68
0
0
22 Aug 2025
CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency
Zhanming Shen
Hao Chen
Yulei Tang
Shaolin Zhu
Wentao Ye
Xiaomeng Hu
Haobo Wang
Gang Chen
Junbo Zhao
SyDa, ALM
88
0
0
22 Aug 2025
A Probabilistic Inference Scaling Theory for LLM Self-Correction
Zhe Yang
Yichang Zhang
Yudong Wang
Ziyao Xu
Junyang Lin
Zhifang Sui
LRM
72
1
0
22 Aug 2025
Consensus Is All You Need: Gossip-Based Reasoning Among Large Language Models
Saksham Arora
LLMAG, ALM, LRM
57
0
0
22 Aug 2025
Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish
Yakup Abrek Er
İlker Kesen
Gözde Gül Şahin
Aykut Erdem
ELM, VLM
119
0
0
22 Aug 2025
RoboBuddy in the Classroom: Exploring LLM-Powered Social Robots for Storytelling in Learning and Integration Activities
Daniel Tozadore
Nur Ertug
Yasmine Chaker
Mortadha Abderrahim
36
0
0
22 Aug 2025
FLAMES: Improving LLM Math Reasoning via a Fine-Grained Analysis of the Data Synthesis Pipeline
Parker Seegmiller
Kartik Mehta
Soumya Saha
Chenyang Tao
Shereen Oraby
Arpit Gupta
Tagyoung Chung
Mohit Bansal
Nanyun Peng
SyDa, LRM
68
0
0
22 Aug 2025
Rethinking Reasoning in LLMs: Neuro-Symbolic Local RetoMaton Beyond ICL and CoT
Rushitha Santhoshi Mamidala
Anshuman Chhabra
Ankur Mali
OffRL, LRM
106
0
0
22 Aug 2025
From Confidence to Collapse in LLM Factual Robustness
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Alina Fastowski
Bardh Prenkaj
Gjergji Kasneci
HILM, AAML
151
0
0
22 Aug 2025
MedQARo: A Large-Scale Benchmark for Medical Question Answering in Romanian
Ana-Cristina Rogoz
Radu Tudor Ionescu
Alexandra-Valentina Anghel
Ionut-Lucian Antone-Iordache
Simona Coniac
Andreea-Iuliana Ionescu
LM&MA
40
0
0
22 Aug 2025
Retrieval-Augmented Defense: Adaptive and Controllable Jailbreak Prevention for Large Language Models
Guangyu Yang
Jinghong Chen
Jingbiao Mei
Weizhe Lin
Bill Byrne
AAML
88
0
0
22 Aug 2025
Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective
Tianyao Shi
Yi Ding
MQ
102
3
0
22 Aug 2025
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu
Qinghao Hu
Shang Yang
Haocheng Xi
Junyu Chen
Song Han
Han Cai
152
10
0
21 Aug 2025
CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression
Muchammad Daniyal Kautsar
Afra Majida Hariono
Widyawan
Syukron Abu Ishaq Alfarozi
Kuntpong Woraratpanya
103
0
0
21 Aug 2025
Transduction is All You Need for Structured Data Workflows
A. Gliozzo
Naweed Khan
Christodoulos Constantinides
Nandana Mihindukulasooriya
Nahuel Defosse
Gaetano Rossiello
Junkyu Lee
AI4CE
108
1
0
21 Aug 2025
SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models
Peng Ding
Wen Sun
Dailin Li
Wei Zou
Jiaming Wang
Jiajun Chen
Shujian Huang
93
0
0
21 Aug 2025
TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill and Decode Inference
Xiaojuan Tang
Fanxu Meng
Pingzhi Tang
Yuxuan Wang
Di Yin
Xing Sun
M. Zhang
154
0
0
21 Aug 2025
Dream 7B: Diffusion Large Language Models
Jiacheng Ye
Zhihui Xie
Lin Zheng
Lei Li
Zirui Wu
Xin Jiang
Zhenguo Li
Lingpeng Kong
DiffM, VLM
548
91
0
21 Aug 2025
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Nvidia
Aarti Basant
Abhijit Khairnar
Abhijit Paithankar
Abhinav Khattar
...
Keith Wyss
Keshav Santhanam
Kezhi Kong
Krzysztof Pawelec
Kumar Anik
LRM
227
0
0
20 Aug 2025
LongRecall: A Structured Approach for Robust Recall Evaluation in Long-Form Text
MohamamdJavad Ardestani
Ehsan Kamalloo
Davood Rafiei
84
1
0
20 Aug 2025
Credence Calibration Game? Calibrating Large Language Models through Structured Play
Ke Fang
Tianyi Zhao
Lu Cheng
76
1
0
20 Aug 2025
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
Haokun Lin
Haobo Xu
Yichen Wu
Ziyu Guo
Renrui Zhang
Zhichao Lu
Ying Wei
Gang Qu
Zhenan Sun
DiffM, MQ
106
7
0
20 Aug 2025
LLMs and Agentic AI in Insurance Decision-Making: Opportunities and Challenges For Africa
Graham Hill
JingYuan Gong
Thulani Babeli
Moseli Motsóehli
James Gachomo Wanjiku
92
0
0
20 Aug 2025
ZPD-SCA: Unveiling the Blind Spots of LLMs in Assessing Students' Cognitive Abilities
Wenhan Dong
Zhen Sun
Yuemeng Zhao
Zifan Peng
Jun Wu
...
Xinlei He
Yu Wang
Ruiming Wang
Xinyi Huang
Lei Mo
96
0
0
20 Aug 2025
MATA (māta): Mindful Assessment of the Telugu Abilities of Large Language Models
Chalamalasetti Kranti
Sowmya Vajjala
ELM
80
0
0
19 Aug 2025
LM Agents May Fail to Act on Their Own Risk Knowledge
Yuzhi Tang
Tianxiao Li
Elizabeth Li
Chris J. Maddison
Honghua Dong
Yangjun Ruan
LLMAG, ELM
1.6K
0
0
19 Aug 2025
GRILE: A Benchmark for Grammar Reasoning and Explanation in Romanian LLMs
Adrian Marius Dumitran
Alexandra-Mihaela Danila
Angela-Liliana Dumitran
LRM
42
0
0
19 Aug 2025
Prompt Orchestration Markup Language
Yuge Zhang
Nan Chen
Jiahang Xu
Yuqing Yang
VLM
92
2
0
19 Aug 2025
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
Tomer Ashuach
Dana Arad
Aaron Mueller
Martin Tutek
Yonatan Belinkov
AAML, MU
132
2
0
19 Aug 2025
Generics and Default Reasoning in Large Language Models
James Ravi Kirkpatrick
Rachel Katharine Sterken
ReLM, LRM, ELM
102
0
0
19 Aug 2025
HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
Keliang Li
Hongze Shen
Hao Shi
Ruibing Hou
Hong Chang
...
Wen Wang
Yiling Wu
Shihong Deng
Shiguang Shan
Xilin Chen
LRM
120
1
0
19 Aug 2025
Büyük Dil Modelleri için TR-MMLU Benchmarkı: Performans Değerlendirmesi, Zorluklar ve İyileştirme Fırsatları (TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement)
Signal Processing and Communications Applications Conference (SIU), 2025
M. Ali Bayram
Ali Arda Fincan
Ahmet Semih Gümüş
Banu Diri
Savaş Yıldırım
Öner Aytaş
ELM
44
0
0
18 Aug 2025
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
David Heineman
Valentin Hofmann
Ian H. Magnusson
Yuling Gu
Noah A. Smith
Hannaneh Hajishirzi
Kyle Lo
Jesse Dodge
ALM
92
3
0
18 Aug 2025
Hallucinations in medical devices
Jason Granstedt
Prabhat Kc
Rucha Deshpande
Victor Garcia
Aldo Badano
154
1
0
18 Aug 2025
CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
Seonglae Cho
Zekun Wu
Adriano Soares Koshiyama
LLMSV
154
0
0
18 Aug 2025