1905.07830
Cited By
HellaSwag: Can a Machine Really Finish Your Sentence?
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
Papers citing "HellaSwag: Can a Machine Really Finish Your Sentence?"
50 / 2,243 papers shown
Title
Estonian Native Large Language Model Benchmark
Helena Grete Lillepalu
Tanel Alumäe
ELM
68
0
0
24 Oct 2025
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
Sean McGregor
Victor Lu
Vassil Tashev
Armstrong Foundjem
Aishwarya Ramasethu
...
Chris Knotz
Kongtao Chen
Alicia Parrish
Anka Reuel
Heather Frase
105
0
0
24 Oct 2025
Context-level Language Modeling by Learning Predictive Context Embeddings
Beiya Dai
Y. Liu
Daozheng Xue
Qipeng Guo
Kai Chen
Xinbing Wang
Bowen Zhou
Zhouhan Lin
LRM
123
0
0
23 Oct 2025
Latent Space Factorization in LoRA
Shashi Kumar
Yacouba Kaloga
John Mitros
P. Motlícek
Ina Kodrasi
84
0
0
22 Oct 2025
Zhyper: Factorized Hypernetworks for Conditioned LLM Fine-Tuning
M. H. I. Abdalla
Zhipin Wang
Christian M. M. Frey
Steffen Eger
Josif Grabocka
131
0
0
22 Oct 2025
DiSRouter: Distributed Self-Routing for LLM Selections
Hang Zheng
Hongshen Xu
Yongkai Lin
Shuai Fan
Lu Chen
Kai Yu
107
1
0
22 Oct 2025
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
Mingyu Jo
Jaesik Yoon
Justin Deschenaux
Çağlar Gülçehre
Sungjin Ahn
DiffM
180
0
0
22 Oct 2025
ELUTQ: Efficient LUT-Aware Quantization for Deploying Large Language Models on Edge Devices
Xin Nie
Liang Dong
H. Zhang
JiaWang Xiao
G. Sun
MQ
356
0
0
22 Oct 2025
Restoring Pruned Large Language Models via Lost Component Compensation
Zijian Feng
Hanzhang Zhou
Zixiao Zhu
Tianjiao Li
Jia Jim Deryl Chua
Lee Onn Mak
Gee Wah Ng
Kezhi Mao
121
0
0
22 Oct 2025
Data-Centric Lessons To Improve Speech-Language Pretraining
Vishaal Udandarao
Zhiyun Lu
Xuankai Chang
Yongqiang Wang
Violet Z. Yao
Albin Madapally Jose
Fartash Faghri
Josh Gardner
Chung-Cheng Chiu
120
0
0
22 Oct 2025
GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters
Anand Choudhary
Yasser Sulaıman
Lukas Mauch
G. B. Hacene
Fabien Cardinaux
Antoine Bosselut
108
0
0
22 Oct 2025
CPSVD: Enhancing Large Language Model Compression via Column-Preserving Singular Value Decomposition
Lin Xv
Jingsheng Gao
Xian Gao
Ting Li
Yuzhuo Fu
48
0
0
22 Oct 2025
What is the Best Sequence Length for BABYLM?
Suchir Salhan
Richard Diehl Martinez
Zébulon Goriely
P. Buttery
84
1
0
22 Oct 2025
ARA: Adaptive Rank Allocation for Efficient Large Language Model SVD Compression
Lin Xv
Jingsheng Gao
Xian Gao
Ting Liu
Yuzhuo Fu
88
0
0
22 Oct 2025
Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection
Hongyi He
Xiao Liu
Zhenghao Lin
Mingni Tang
Y. Cheng
Jintao Wang
W. Li
Peng Cheng
Yeyun Gong
OODD
153
0
0
21 Oct 2025
Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression
Kyo Kuroki
Yasuyuki Okoshi
Thiem Van Chu
Kazushi Kawamura
Masato Motomura
MQ
160
0
0
21 Oct 2025
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
S. Bian
Tao Yu
Shivaram Venkataraman
Youngsuk Park
86
0
0
21 Oct 2025
ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
Xiaohan Qin
Xiaoxing Wang
Ning Liao
Cancheng Zhang
Xiangdong Zhang
Mingquan Feng
Jingzhi Wang
Junchi Yan
122
0
0
21 Oct 2025
ScaleNet: Scaling up Pretrained Neural Networks with Incremental Parameters
Zhiwei Hao
Jianyuan Guo
Li Shen
Kai Han
Yehui Tang
Han Hu
Yunhe Wang
187
0
0
21 Oct 2025
NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning
Zhi Zhang
Yixian Shen
Congfeng Cao
Ekaterina Shutova
132
0
0
21 Oct 2025
Unbiased Gradient Low-Rank Projection
Rui Pan
Yang Luo
Yuxing Liu
Yang You
Tong Zhang
132
0
0
20 Oct 2025
ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts
Zheyue Tan
Ruoyao Xiao
Tao Yuan
Dong Zhou
Weilin Liu
...
Haiyang Xu
Boxun Li
Guohao Dai
Bo Zhao
Yu Wang
MoE
156
0
0
20 Oct 2025
From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models
Ziyan Wang
Enmao Diao
Qi Le
Pu Wang
Minwoo Lee
Shu-ping Yeh
Evgeny Stupachenko
Hao Feng
Li Yang
116
1
0
20 Oct 2025
Mapping Post-Training Forgetting in Language Models at Scale
Jackson Harmon
Andreas Hochlehnert
Matthias Bethge
Ameya Prabhu
CLL
KELM
133
0
0
20 Oct 2025
MARS-M: When Variance Reduction Meets Matrices
Yifeng Liu
Angela Yuan
Q. Gu
173
0
0
20 Oct 2025
The Free Transformer
François Fleuret
40
0
0
20 Oct 2025
Learning from Generalization Patterns: An Evaluation-Driven Approach to Enhanced Data Augmentation for Fine-Tuning Small Language Models
Huan Song
Deeksha Razdan
Yiyue Qian
Arijit Ghosh Chowdhury
Parth Patwa
Aman Chadha
Shinan Zhang
Sharlina Keshava
Hannah R Marlowe
70
1
0
20 Oct 2025
Vocab Diet: Reshaping the Vocabulary of LLMs with Vector Arithmetic
Yuval Reif
Guy Kaplan
Roy Schwartz
KELM
157
0
0
19 Oct 2025
DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge
Asmita Mohanty
Gezheng Kang
Lei Gao
M. Annavaram
74
0
0
19 Oct 2025
PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
Lukas Selch
Yufang Hou
Muhammad Jehanzeb Mirza
Sivan Doveh
James Glass
Rogerio Feris
Wei Lin
162
0
0
18 Oct 2025
What Limits Agentic Systems Efficiency?
S. Bian
Minghao Yan
Anand Jayarajan
Gennady Pekhimenko
Shivaram Venkataraman
LLMAG
LRM
113
0
0
18 Oct 2025
Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation
Fei Wang
Li Shen
Liang Ding
Chao Xue
Ye Liu
Changxing Ding
104
0
0
17 Oct 2025
From Characters to Tokens: Dynamic Grouping with Hierarchical BPE
Rares Dolga
Lucas Maystre
Tudor Berariu
David Barber
76
0
0
17 Oct 2025
Flip-Flop Consistency: Unsupervised Training for Robustness to Prompt Perturbations in LLMs
Parsa Hejabi
Elnaz Rahmati
Alireza S. Ziabari
Morteza Dehghani
AAML
LRM
112
0
0
16 Oct 2025
MergeMoE: Efficient Compression of MoE Models via Expert Output Merging
Ruijie Miao
Yilun Yao
Zihan Wang
Z. Wang
Bairen Yi
LingJun Liu
Yikai Zhao
Tong Yang
MoMe
128
0
0
16 Oct 2025
Predicting Task Performance with Context-aware Scaling Laws
Kyle Montgomery
David Park
Jianhong Tu
Michael Bendersky
Beliz Gunel
Dawn Song
Chenguang Wang
LRM
84
1
0
16 Oct 2025
Continual Learning via Sparse Memory Finetuning
Jessy Lin
Luke Zettlemoyer
Gargi Ghosh
Wen-tau Yih
Aram H. Markosyan
Vincent-Pierre Berges
Barlas Oğuz
KELM
CLL
132
0
0
16 Oct 2025
RLSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following
Zhichao Wang
Andy Wong
Ruslan Belkin
ALM
LRM
99
0
0
16 Oct 2025
GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models
Chen Zheng
Y. Cai
Deyi Liu
Jin Ma
Yiyuan Ma
Y. Yang
Jing Liu
Yutao Zeng
Xun Zhou
Siyuan Qiao
MoE
124
0
0
15 Oct 2025
Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models
Daniil Gurgurov
Josef van Genabith
Simon Ostermann
MoE
178
0
0
15 Oct 2025
RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems
Jingru Lin
Chen Zhang
Stephen Y. Liu
Haizhou Li
RALM
100
0
0
15 Oct 2025
Closing the Gap Between Text and Speech Understanding in LLMs
Santiago Cuervo
Skyler Seto
Maureen de Seyssel
Richard He Bai
Zijin Gu
Tatiana Likhomanenko
Navdeep Jaitly
Zakaria Aldeneh
112
1
0
15 Oct 2025
Selective Adversarial Attacks on LLM Benchmarks
Ivan Dubrovsky
Anastasia Orlova
Illarion Iov
Nina Gubina
Irena Gureeva
Alexey Zaytsev
AAML
76
0
0
15 Oct 2025
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
Mike Lasby
Ivan Lazarevich
Nish Sinnadurai
Sean Lie
Yani Andrew Ioannou
Vithursan Thangarasa
96
0
0
15 Oct 2025
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu
Qiang Lu
Meichen Dong
Jake Luo
110
3
0
15 Oct 2025
ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
Xiaozhe Li
TianYi Lyu
Siyi Yang
Yuxi Gong
Yizhao Yang
Jinxuan Huang
Ligao Zhang
Zhuoyi Huang
Qingwen Liu
ELM
151
0
0
15 Oct 2025
Tahakom LLM Guidelines and Recipes: From Pre-training Data to an Arabic LLM
Areej AlOtaibi
Lina Alyahya
Raghad Alshabanah
Shahad Alfawzan
Shuruq Alarefei
...
Waad Alahmed
Omar Talabay
Jalal Alowibdi
Salem Alelyani
Adel Bibi
169
0
0
15 Oct 2025
OPLoRA: Orthogonal Projection LoRA Prevents Catastrophic Forgetting during Parameter-Efficient Fine-Tuning
Yifeng Xiong
Xiaohui Xie
CLL
464
1
0
14 Oct 2025
CARVQ: Corrective Adaptor with Group Residual Vector Quantization for LLM Embedding Compression
Dayin Gou
Sanghyun Byun
Nilesh Malpeddi
Gabrielle De Micheli
Prathamesh Vaste
Jacob Song
Woo Seong Chung
MQ
84
0
0
14 Oct 2025
Neural Weight Compression for Language Models
Jegwang Ryu
Minkyu Kim
Seungjun Shin
Hee Min Choi
Dokwan Oh
Jaeho Lee
100
0
0
13 Oct 2025