HellaSwag: Can a Machine Really Finish Your Sentence?
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
arXiv: 1905.07830
Papers citing
"HellaSwag: Can a Machine Really Finish Your Sentence?"
Showing 50 of 2,250 citing papers.
FedReFT: Federated Representation Fine-Tuning with All-But-Me Aggregation
Fatema Siddika
Md Anwar Hossen
J. P. Muñoz
Tanya Roosta
Anuj Sharma
Ali Jannesari
FedML
136
1
0
24 Dec 2025
Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings
Anand Gopalakrishnan
Róbert Csordás
Jürgen Schmidhuber
M. C. Mozer
263
1
0
24 Dec 2025
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
Younes Hourri
Mohammad Mozaffari
M. Dehnavi
192
0
0
24 Dec 2025
Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems
Zehao Fan
Zhenyu Liu
Y. Liu
Yayue Hou
Hadjer Benmeziane
Kaoutar El Maghraoui
Liu Liu
MoE
100
0
0
04 Dec 2025
ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning
Pritam Kadasi
Abhishek Upperwal
Mayank Singh
VLM
96
0
0
04 Dec 2025
SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs
Wenhua Cheng
Weiwei Zhang
Heng Guo
Haihao Shen
MQ
70
0
0
04 Dec 2025
Jina-VLM: Small Multilingual Vision Language Model
Andreas Koukounas
Georgios Mastrapas
Florian Hönicke
Sedigheh Eslami
Guillaume Roncari
Scott Martens
Han Xiao
MLLM
307
0
0
03 Dec 2025
Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules
Amr Mohamed
Yang Zhang
Michalis Vazirgiannis
Guokan Shang
AI4CE
150
0
0
02 Dec 2025
PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models
Róbert Belanec
Ivan Srba
Maria Bielikova
ALM
400
0
0
02 Dec 2025
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
X. S. Hu
Zhanchao Zhou
Ruiqi Liang
Zehuan Li
Wei Wu
Jianguo Li
220
0
0
28 Nov 2025
PerfMamba: Performance Analysis and Pruning of Selective State Space Models
Abdullah Al Asif
Mobina Kashaniyan
Sixing Yu
J. P. Muñoz
Ali Jannesari
Mamba
286
0
0
28 Nov 2025
Ghosting Your LLM: Without The Knowledge of Your Gradient and Data
Abeer Matar A. Almalky
Ziyan Wang
Mohaiminul Al Nahian
Li Yang
Adnan Siraj Rakin
AAML
188
0
0
27 Nov 2025
SingleQuant: Efficient Quantization of Large Language Models in a Single Pass
Jinying Xiao
Bin Ji
Shasha Li
Xiaodong Liu
Ma Jun
Ye Zhong
Wei Li
Xuan Xie
Qingbo Wu
Jie Yu
MQ
107
0
0
27 Nov 2025
PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark
Róbert Belanec
Branislav Pecher
Ivan Srba
Maria Bielikova
111
1
0
26 Nov 2025
IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference
Wanli Zhong
Haibo Feng
Zirui Zhou
Hanyang Peng
Shiqi Yu
MQ
306
0
0
26 Nov 2025
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
Dongyang Fan
Diba Hashemi
Sai Praneeth Karimireddy
Martin Jaggi
121
0
0
26 Nov 2025
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space
Zhenyi Shen
Junru Lu
Lin Gui
Jiazheng Li
Yulan He
D. Yin
Xing Sun
274
0
0
25 Nov 2025
BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali
Abdullah Al Sefat
152
1
0
25 Nov 2025
Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models
Wentao Hu
Mingkuan Zhao
Shuangyong Song
Xiaoyan Zhu
Xin Lai
Jiayin Wang
115
2
0
25 Nov 2025
Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
Junbo Zhang
Ran Chen
Qianli Zhou
Xinyang Deng
Wen Jiang
169
1
0
24 Nov 2025
ModHiFi: Identifying High Fidelity predictive components for Model Modification
Dhruva Kashyap
Chaitanya Murti
Pranav K Nayak
Tanay Narshana
Chiranjib Bhattacharyya
116
0
0
24 Nov 2025
FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning
Xin Yuan
S. Li
Jiateng Wei
Chengrui Zhu
Yanming Wu
Qingpeng Li
Jiajun Lv
Xiaoke Lan
Jun Chen
Yong-Jin Liu
OffRL
368
0
0
24 Nov 2025
MoodBench 1.0: An Evaluation Benchmark for Emotional Companionship Dialogue Systems
Haifeng Jing
Yujie Hou
Junfei Liu
Rui Xie
Alan Xu
Jinlong Ma
Qichun Deng
209
0
0
24 Nov 2025
CafeQ: Calibration-free Quantization via Learned Transformations and Adaptive Rounding
Ziteng Sun
Adrian Benton
Samuel Kushnir
Asher Trockman
Vikas Singh
Suhas Diggavi
A. Suresh
MQ
154
0
0
24 Nov 2025
Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
Yang Liu
Xiaolong Zhong
Ling Jiang
LLMAG
MU
MoE
LRM
376
0
0
23 Nov 2025
Blu-WERP (Web Extraction and Refinement Pipeline): A Scalable Pipeline for Preprocessing Large Language Model Datasets
Gowtham
Sai Rupesh
Sanjay Kumar
Saravanan
Venkata Chaithanya
VLM
193
0
0
22 Nov 2025
Layer-Wise High-Impact Parameter Ratio Optimization in Post-Training Quantization for Large Language Models
Cuong Pham
Hoang Anh Dung
Cuong C. Nguyen
Trung Le
G. Carneiro
Thanh-Toan Do
MQ
134
0
0
21 Nov 2025
E^3-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models
Tao Yuan
Haoli Bai
Yinfei Pan
Xuyang Cao
Tianyu Zhang
Lu Hou
Ting Hu
Xianzhi Yu
VLM
203
0
0
21 Nov 2025
Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models
Cuong Pham
Hoang Anh Dung
Cuong C. Nguyen
Trung Le
G. Carneiro
Jianfei Cai
Thanh-Toan Do
MQ
134
0
0
21 Nov 2025
R2Q: Towards Robust 2-Bit Large Language Models via Residual Refinement Quantization
Jiayi Chen
Jieqi Shi
Jing Huo
Chen Wu
MQ
155
0
0
21 Nov 2025
AICC: Parse HTML Finer, Make Models Better -- A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser
Ren Ma
Jiantao Qiu
Chao Xu
Pei Chu
Kaiwen Liu
...
Wentao Zhang
Zhongying Tu
Wentao Zhang
Dahua Lin
Conghui He
120
0
0
20 Nov 2025
Breaking the Bottleneck with DiffuApriel: High-Throughput Diffusion LMs with Mamba Backbone
Vaibhav Singh
Oleksiy Ostapenko
Pierre-Andre Noel
Torsten Scholak
Mamba
AI4CE
408
0
0
19 Nov 2025
Dynamic Nested Hierarchies: Pioneering Self-Evolution in Machine Learning Architectures for Lifelong Intelligence
Akbar Anbar Jafari
C. Ozcinar
G. Anbarjafari
AI4CE
80
1
0
18 Nov 2025
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
Shalini Maiti
Amar Budhiraja
Bhavul Gauri
Gaurav Chaurasia
Anton Protopopov
...
Michael Slater
Despoina Magka
Tatiana Shavrina
Roberta Raileanu
Yoram Bachrach
MoMe
150
0
0
17 Nov 2025
Learning from the Undesirable: Robust Adaptation of Language Models without Forgetting
Yunhun Nam
Jaehyung Kim
Jongheon Jeong
108
0
0
17 Nov 2025
Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning
Kajetan Dymkiewicz
Ivan Vulić
Helen Yannakoudakis
Eilam Shapira
Roi Reichart
Anna Korhonen
100
0
0
17 Nov 2025
OTARo: Once Tuning for All Precisions toward Robust On-Device LLMs
Shaoyuan Chen
Zhixuan Chen
Dawei Yang
Zhihang Yuan
Qiang Wu
MQ
164
0
0
17 Nov 2025
CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product
Kaiwen Xue
Chenglong Li
Zhonghong Ou
Guoxin Zhang
Kaoyan Lu
...
Xinyu Liu
Qunlin Chen
Weiwei Qin
Yiran Shen
Jiayi Cen
112
0
0
17 Nov 2025
SLMQuant: Benchmarking Small Language Model Quantization for Practical Deployment
Jiacheng Wang
Yejun Zeng
Jinyang Guo
Yuqing Ma
Aishan Liu
Xianglong Liu
MQ
277
1
0
17 Nov 2025
Range Asymmetric Numeral Systems-Based Lightweight Intermediate Feature Compression for Split Computing of Deep Neural Networks
Mingyu Sung
Suhwan Im
Vikas Palakonda
Jae-Mo Kang
80
0
0
11 Nov 2025
Sentence-Anchored Gist Compression for Long-Context LLMs
Dmitrii Tarasov
Elizaveta Goncharova
Andrey Kuznetsov
112
0
0
11 Nov 2025
SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization
Zhixiong Zhao
Fangxin Liu
Junjie Wang
Chenyang Guan
Z. Wang
Li Jiang
Haibing Guan
MQ
101
0
0
11 Nov 2025
Importance-Aware Data Selection for Efficient LLM Instruction Tuning
Tingyu Jiang
Shen Li
Yiyao Song
Lan Zhang
Hualei Zhu
Yuan Zhao
Xiaohang Xu
Kenjiro Taura
Hao Henry Wang
300
1
0
10 Nov 2025
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
Zhongyang Li
Ziyue Li
Tianyi Zhou
MoE
MoMe
615
0
0
10 Nov 2025
Learning to Focus: Focal Attention for Selective and Scalable Transformers
Dhananjay Ram
Wei Xia
Stefano Soatto
280
0
0
10 Nov 2025
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Sean McLeish
Ang Li
John Kirchenbauer
Dayal Singh Kalra
Brian Bartoldson
B. Kailkhura
Avi Schwarzschild
Jonas Geiping
Tom Goldstein
Micah Goldblum
260
0
0
10 Nov 2025
MobileLLM-Pro Technical Report
Patrick Huber
Ernie Chang
Wei Wen
Igor Fedorov
Tarek Elgamal
...
Vikas Chandra
Ahmed Aly
Anuj Kumar
Raghuraman Krishnamoorthi
Adithya Sagar
116
0
0
10 Nov 2025
Rethinking Parameter Sharing as Graph Coloring for Structured Compression
Boyang Zhang
Daning Cheng
Yunquan Zhang
168
0
0
10 Nov 2025
EASE: Practical and Efficient Safety Alignment for Small Language Models
Haonan Shi
Guoli Wang
Tu Ouyang
An Wang
LRM
185
0
0
09 Nov 2025
Better Datasets Start From RefineLab: Automatic Optimization for High-Quality Dataset Refinement
Xiaonan Luo
Yue Huang
Ping He
Xiangliang Zhang
84
0
0
09 Nov 2025