ResearchTrend.AI

HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019
arXiv: 1905.07830

Papers citing "HellaSwag: Can a Machine Really Finish Your Sentence?"

Showing 50 of 2,253 papers.
BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining
Jie Hao, Rui Yu, W. Zhang, Huixia Wang, Jie Xu, Mingrui Liu
07 Oct 2025

lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
Haoxin Wang, Xiaolong Tu, Hongyu Ke, Huirong Chai, Dawei Chen, Kyungtae Han
07 Oct 2025

Training Dynamics Impact Post-Training Quantization Robustness
Albert Catalan-Tatjer, Niccolò Ajroldi, Jonas Geiping
07 Oct 2025

Robustness assessment of large audio language models in multiple-choice evaluation
F. López, Santosh Kesiraju, Jordi Luque
06 Oct 2025

SpikingMamba: Towards Energy-Efficient Large Language Models via Knowledge Distillation from Mamba
Y. Huang, Jianxiong Tang, Chao Wang, Ziyi Wang, Jianguo Zhang, Zhichao Lu, Bojun Cheng, Luziwei Leng
06 Oct 2025

Recover-LoRA: Data-Free Accuracy Recovery of Degraded Language Models via Low-Rank Adaptation
Devleena Das, Rajeev Patwari, Ashish Sirasao
06 Oct 2025
The End of Transformers? On Challenging Attention and the Rise of Sub-Quadratic Architectures
Alexander Fichtl, Jeremias Bohn, Josefin Kelber, Edoardo Mosca, Georg Groh
06 Oct 2025

Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Sara Kangaslahti, Nihal V. Nayak, Jonathan Geuter, Marco Fumero, Francesco Locatello, David Alvarez-Melis
06 Oct 2025

What Makes Diffusion Language Models Super Data Learners?
Zitian Gao, Haoming Luo, Lynx Chen, Jason Klein Liu, Ran Tao, Joey Zhou, Bryan Dai
05 Oct 2025

Read the Scene, Not the Script: Outcome-Aware Safety for LLMs
Rui Wu, Yihao Quan, Zeru Shi, Zhenting Wang, Yanshu Li, Ruixiang Tang
05 Oct 2025

Measuring Language Model Hallucinations Through Distributional Correctness
Thomas F Burns
05 Oct 2025

Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs
Bumjun Kim, Dongjae Jeon, Dueun Kim, Wonje Jeung, Albert No
04 Oct 2025
Pool Me Wisely: On the Effect of Pooling in Transformer-Based Models
Sofiane Ennadir, Levente Zólyomi, Oleg Smirnov, Tianze Wang, John Pertoft, Filip Cornell, Lele Cao
02 Oct 2025

The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM
Kwanhee Lee, Hyeondo Jang, Dongyeop Lee, Dan Alistarh, Namhoon Lee
02 Oct 2025

Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning
Xinyuan Song, Keyu Wang, Pengxiang Li, L. Yin, Shiwei Liu
02 Oct 2025

Composer: A Search Framework for Hybrid Neural Architecture Design
Bilge Acun, Prasoon Sinha, Newsha Ardalani, Sangmin Bae, Alicia Golden, Chien-Yu Lin, Meghana Madhyastha, Fei Sun, N. Yadwadkar, Carole-Jean Wu
01 Oct 2025

Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
Yicheng Lang, Yihua Zhang, Chongyu Fan, Changsheng Wang, Jinghan Jia, Sijia Liu
01 Oct 2025

Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
Shojiro Yamabe, Jun Sakuma
01 Oct 2025

Sentry: Authenticating Machine Learning Artifacts on the Fly
Andrew Gan, Zahra Ghodsi
01 Oct 2025
LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts
Yuan Zhuang, Yi Shen, Yuexin Bian, Qing Su, Shihao Ji, Yuanyuan Shi, Fei Miao
30 Sep 2025

CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models
Weiyu Huang, Yuezhou Hu, Jun Zhu, Jianfei Chen
30 Sep 2025

Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
Chuanyang Zheng, Jiankai Sun, Yihang Gao, Enze Xie, Yuehao Wang, ..., Kashif Rasul, Mac Schwager, Anderson Schneider, Zinan Lin, Yuriy Nevmyvaka
30 Sep 2025

Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization
Yaoxiang Wang, Qingguo Hu, Yucheng Ding, Ruizhe Wang, Yeyun Gong, Jian Jiao, Yelong Shen, Peng Cheng, Jinsong Su
30 Sep 2025

Collaborative Compression for Large-Scale MoE Deployment on Edge
Yixiao Chen, Yanyue Xie, Ruining Yang, Wei Jiang, Wei Wang, Yong He, Yue Chen, Pu Zhao, Y. Wang
30 Sep 2025

OPPO: Accelerating PPO-based RLHF via Pipeline Overlap
Kaizhuo Yan, Yingjie Yu, Yifan Yu, Haizhong Zheng, Fan Lai
30 Sep 2025
MADS: Multi-Agent Dialogue Simulation for Diverse Persuasion Data Generation
Mingjin Li, Yu Liu, Huayi Liu, Xiang Ye, Chao Jiang, Hongguang Zhang, Yu Ruan
30 Sep 2025

Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space
Houjun Liu, Shikhar Murty, Christopher D. Manning, Róbert Csordás
30 Sep 2025

Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
Junlin Han, Shengbang Tong, David Fan, Yufan Ren, Koustuv Sinha, Juil Sock, Filippos Kokkinos
30 Sep 2025

Layer-wise dynamic rank for compressing large language models
Zhendong Mi, Bian Sun, Grace Li Zhang, Shaoyi Huang
30 Sep 2025

The Flaw of Averages: Quantifying Uniformity of Performance on Benchmarks
Arda Uzunoglu, Tianjian Li, Daniel Khashabi
30 Sep 2025

Towards Ecologically Valid LLM Benchmarks: Understanding and Designing Domain-Centered Evaluations for Journalism Practitioners
Charlotte Li, Nick Hagar, Sachita Nishal, Jeremy Gilbert, Nick Diakopoulos
30 Sep 2025

Short window attention enables long-term memorization
Loic Cabannes, Maximilian Beck, Gergely Szilvasy, Matthijs Douze, Maria Lomeli, Jade Copet, Pierre-Emmanuel Mazaré, Gabriel Synnaeve, Hervé Jégou
29 Sep 2025
Conda: Column-Normalized Adam for Training Large Language Models Faster
Junjie Wang, Pan Zhou, Yiming Dong, Huan Li, Jia Li, Xun Zhou, Qicheng Lao, Cong Fang, Zhouchen Lin
29 Sep 2025

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
Huu Nguyen, Victor May, Harsh Raj, Marianna Nezhurina, Yishan Wang, ..., Aleksandra Krasnodębska, Christoph Schuhmann, Mats Leon Richter, Xuan-Son, J. Jitsev
29 Sep 2025

Pretraining with hierarchical memories: separating long-tail and common knowledge
Hadi Pouransari, David Grangier, C. Thomas, Michael Kirchhof, Oncel Tuzel
29 Sep 2025

CURA: Size Isn't All You Need - A Compact Universal Architecture for On-Device Intelligence
Jae-Bum Seo, Muhammad Salman, Lismer Andres Caceres-Najarro
29 Sep 2025

AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment
Mengyu Bu, Shaolei Zhang, Zhongjun He, Hua Wu, Yang Feng
29 Sep 2025

Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining
M. R, Dan John Velasco
29 Sep 2025
Fingerprinting LLMs via Prompt Injection
Yuepeng Hu, Zhengyuan Jiang, Mengyuan Li, Osama Ahmed, Zhicong Huang, Cheng Hong, Neil Zhenqiang Gong
29 Sep 2025

LLM DNA: Tracing Model Evolution via Functional Representations
Zhaomin Wu, Haodong Zhao, Ziyang Wang, Jizhou Guo, Qian Wang, Bingsheng He
29 Sep 2025

Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs
Hao Ban, Kaiyi Ji
29 Sep 2025

Toward Preference-aligned Large Language Models via Residual-based Model Steering
Lucio La Cava, Andrea Tagarelli
28 Sep 2025

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference
Haojie Ouyang, Jianwei Lv, Lei Ren, Chen Wei, Xiaojie Wang, Fangxiang Feng
28 Sep 2025

Assessing Large Language Models in Updating Their Forecasts with New Information
Zhangdie Yuan, Zifeng Ding, Andreas Vlachos
28 Sep 2025

Sequential Diffusion Language Models
Yangzhou Liu, Yue Cao, Hao-Wen Li, Gen Luo, Z. Chen, ..., Yuqiang Li, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang
28 Sep 2025
Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models
Zemin Huang, Yuhang Wang, Zhiyang Chen, Guo-Jun Qi
28 Sep 2025

Tequila: Trapping-free Ternary Quantization for Large Language Models
Hong Huang, Decheng Wu, Rui Cen, Guanghua Yu, Z. Li, Kai Liu, Jianchen Zhu, Peng Chen, Xue Liu, Dapeng Wu
28 Sep 2025

Timber: Training-free Instruct Model Refining with Base via Effective Rank
Taiqiang Wu, Runming Yang, Tao Liu, Jiahao Wang, Zenan Xu, Ngai Wong
28 Sep 2025

Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Tianao Zhang, Zhiteng Li, Xianglong Yan, Haotong Qin, Yong Guo, Yulun Zhang
27 Sep 2025

Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
Vage Egiazarian, Roberto L. Castro, Denis Kuznedelev, Andrei Panferov, Eldar Kurtic, ..., Alexandre Marques, Mark Kurtz, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh
27 Sep 2025
Page 5 of 46