1905.07830
HellaSwag: Can a Machine Really Finish Your Sentence?
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
Papers citing "HellaSwag: Can a Machine Really Finish Your Sentence?"
50 / 2,226 papers shown
Sentry: Authenticating Machine Learning Artifacts on the Fly
Andrew Gan
Zahra Ghodsi
61
1
0
01 Oct 2025
Composer: A Search Framework for Hybrid Neural Architecture Design
Bilge Acun
Prasoon Sinha
Newsha Ardalani
Sangmin Bae
Alicia Golden
Chien-Yu Lin
Meghana Madhyastha
Fei Sun
N. Yadwadkar
Carole-Jean Wu
180
1
0
01 Oct 2025
Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
Shojiro Yamabe
Jun Sakuma
AAML
88
0
0
01 Oct 2025
MADS: Multi-Agent Dialogue Simulation for Diverse Persuasion Data Generation
Mingjin Li
Yu Liu
Huayi Liu
Xiang Ye
Chao Jiang
Hongguang Zhang
Yu Ruan
168
0
0
30 Sep 2025
Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization
Yaoxiang Wang
Qingguo Hu
Yucheng Ding
Ruizhe Wang
Yeyun Gong
Jian Jiao
Yelong Shen
Peng Cheng
Jinsong Su
MoE
56
0
0
30 Sep 2025
Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space
Houjun Liu
Shikhar Murty
Christopher D. Manning
Róbert Csordás
ReLM
LRM
AI4CE
78
0
0
30 Sep 2025
Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
Chuanyang Zheng
Jiankai Sun
Yihang Gao
Enze Xie
Yuehao Wang
...
Kashif Rasul
Mac Schwager
Anderson Schneider
Zinan Lin
Yuriy Nevmyvaka
MoE
154
2
0
30 Sep 2025
Towards Ecologically Valid LLM Benchmarks: Understanding and Designing Domain-Centered Evaluations for Journalism Practitioners
Charlotte Li
Nick Hagar
Sachita Nishal
Jeremy Gilbert
Nick Diakopoulos
77
0
0
30 Sep 2025
LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts
Yuan Zhuang
Yi Shen
Yuexin Bian
Qing Su
Shihao Ji
Yuanyuan Shi
Fei Miao
MoE
MoMe
164
1
0
30 Sep 2025
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
Junlin Han
Shengbang Tong
David Fan
Yufan Ren
Koustuv Sinha
Juil Sock
Filippos Kokkinos
LRM
VLM
139
4
0
30 Sep 2025
OPPO: Accelerating PPO-based RLHF via Pipeline Overlap
Kaizhuo Yan
Yingjie Yu
Yifan Yu
Haizhong Zheng
Fan Lai
VLM
68
0
0
30 Sep 2025
Collaborative Compression for Large-Scale MoE Deployment on Edge
Yixiao Chen
Yanyue Xie
Ruining Yang
Wei Jiang
Wei Wang
Yong He
Yue Chen
Pu Zhao
Y. Wang
MQ
56
0
0
30 Sep 2025
CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models
Weiyu Huang
Yuezhou Hu
Jun Zhu
Jianfei Chen
CLL
72
0
0
30 Sep 2025
Layer-wise dynamic rank for compressing large language models
Zhendong Mi
Bian Sun
Grace Li Zhang
Shaoyi Huang
ALM
116
0
0
30 Sep 2025
The Flaw of Averages: Quantifying Uniformity of Performance on Benchmarks
Arda Uzunoglu
Tianjian Li
Daniel Khashabi
116
0
0
30 Sep 2025
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
Huu Nguyen
Victor May
Harsh Raj
Marianna Nezhurina
Yishan Wang
...
Aleksandra Krasnodębska
Christoph Schuhmann
Mats Leon Richter
Xuan-Son
J. Jitsev
127
1
0
29 Sep 2025
Pretraining with hierarchical memories: separating long-tail and common knowledge
Hadi Pouransari
David Grangier
C. Thomas
Michael Kirchhof
Oncel Tuzel
RALM
KELM
187
1
0
29 Sep 2025
Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs
Hao Ban
Kaiyi Ji
MoE
137
0
0
29 Sep 2025
LLM DNA: Tracing Model Evolution via Functional Representations
Zhaomin Wu
Haodong Zhao
Ziyang Wang
Jizhou Guo
Qian Wang
Bingsheng He
76
1
0
29 Sep 2025
Conda: Column-Normalized Adam for Training Large Language Models Faster
Junjie Wang
Pan Zhou
Yiming Dong
Huan Li
Jia Li
Xun Zhou
Qicheng Lao
Cong Fang
Zhouchen Lin
AI4CE
184
0
0
29 Sep 2025
Fingerprinting LLMs via Prompt Injection
Yuepeng Hu
Zhengyuan Jiang
Mengyuan Li
Osama Ahmed
Zhicong Huang
Cheng Hong
Neil Zhenqiang Gong
154
0
0
29 Sep 2025
AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment
Mengyu Bu
Shaolei Zhang
Zhongjun He
Hua Wu
Yang Feng
100
0
0
29 Sep 2025
Beyond Repetition: Text Simplification and Curriculum Learning for Data-Constrained Pretraining
M. R
Dan John Velasco
69
0
0
29 Sep 2025
Short window attention enables long-term memorization
Loic Cabannes
Maximilian Beck
Gergely Szilvasy
Matthijs Douze
Maria Lomeli
Jade Copet
Pierre-Emmanuel Mazaré
Gabriel Synnaeve
Hervé Jégou
104
1
0
29 Sep 2025
CURA: Size Isn't All You Need - A Compact Universal Architecture for On-Device Intelligence
Jae-Bum Seo
Muhammad Salman
Lismer Andres Caceres-Najarro
68
0
0
29 Sep 2025
Timber: Training-free Instruct Model Refining with Base via Effective Rank
Taiqiang Wu
Runming Yang
Tao Liu
Jiahao Wang
Zenan Xu
Ngai Wong
68
1
0
28 Sep 2025
Toward Preference-aligned Large Language Models via Residual-based Model Steering
Lucio La Cava
Andrea Tagarelli
LLMSV
132
0
0
28 Sep 2025
ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference
Haojie Ouyang
Jianwei Lv
Lei Ren
Chen Wei
Xiaojie Wang
Fangxiang Feng
VLM
120
0
0
28 Sep 2025
Assessing Large Language Models in Updating Their Forecasts with New Information
Zhangdie Yuan
Zifeng Ding
Andreas Vlachos
48
0
0
28 Sep 2025
Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models
Zemin Huang
Yuhang Wang
Zhiyang Chen
Guo-Jun Qi
44
3
0
28 Sep 2025
Tequila: Trapping-free Ternary Quantization for Large Language Models
Hong Huang
Decheng Wu
Rui Cen
Guanghua Yu
Z. Li
Kai Liu
Jianchen Zhu
Peng Chen
Xue Liu
Dapeng Wu
MQ
153
2
0
28 Sep 2025
Sequential Diffusion Language Models
Yangzhou Liu
Yue Cao
Hao-Wen Li
Gen Luo
Z. Chen
...
Yuqiang Li
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
72
3
0
28 Sep 2025
PT²-LLM: Post-Training Ternarization for Large Language Models
Xianglong Yan
Chengzhu Bao
Zhiteng Li
Tianao Zhang
Kaicheng Yang
Haotong Qin
Ruobing Xie
Xingwu Sun
Yulun Zhang
MQ
134
0
0
27 Sep 2025
SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size
Junhao Xia
Ming Zhao
Limin Xiao
Xiujun Zhang
MQ
72
0
0
27 Sep 2025
Multiplayer Nash Preference Optimization
Fang Wu
X. Y. Huang
Weihao Xuan
Zhiwei Zhang
Yijia Xiao
...
Xiaomin Li
Bing Hu
Peng Xia
Jure Leskovec
Yejin Choi
104
1
0
27 Sep 2025
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
Vage Egiazarian
Roberto L. Castro
Denis Kuznedelev
Andrei Panferov
Eldar Kurtic
...
Alexandre Marques
Mark Kurtz
Saleh Ashkboos
Torsten Hoefler
Dan Alistarh
MQ
184
1
0
27 Sep 2025
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
Younes Hourri
Mohammad Mozaffari
M. Dehnavi
124
0
0
27 Sep 2025
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
Wonje Jeung
Sangyeon Yoon
Yoonjun Cho
Dongjae Jeon
Sangwoo Shin
Hyesoo Hong
Albert No
DiffM
105
0
0
27 Sep 2025
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Tianao Zhang
Zhiteng Li
Xianglong Yan
Haotong Qin
Yong Guo
Yulun Zhang
MQ
77
0
0
27 Sep 2025
MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
Lauren Hannah
Soheil Zibakhsh
K. Nishu
Arnav Kundu
Mohammad Samragh Razlighi
Mehrdad Farajtabar
Minsik Cho
MoE
68
0
0
27 Sep 2025
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Han Yan
Zheyuan Liu
Meng Jiang
MU
AAML
96
0
0
27 Sep 2025
Beyond Outliers: A Study of Optimizers Under Quantization
Georgios Vlassis
Saleh Ashkboos
Alexandra Volkova
Torsten Hoefler
Dan Alistarh
MQ
140
0
0
27 Sep 2025
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
Sebastian Bordt
Martin Pawelczyk
CLL
148
1
0
27 Sep 2025
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Syeda Nahida Akter
Shrimai Prabhumoye
Eric Nyberg
M. Patwary
Mohammad Shoeybi
Yejin Choi
Bryan Catanzaro
AIFin
LRM
AI4CE
92
3
0
26 Sep 2025
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
Ke Li
Zheng Yang
Zhongbin Zhou
Feng Xue
Zhonglin Jiang
Wenxiao Wang
MoE
77
0
0
26 Sep 2025
Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
Naibin Gu
Zhenyu Zhang
Yuchen Feng
Yilong Chen
Peng Fu
...
Shuohuan Wang
Yu Sun
Hua Wu
Weiping Wang
Haifeng Wang
MoE
73
0
0
26 Sep 2025
IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method
Xinyu Liu
Bei Li
Jiahao Liu
Junhao Ruan
Kechen Jiao
Hongyin Tang
Jingang Wang
Xiao Tong
Jingbo Zhu
110
0
0
26 Sep 2025
What Matters More For In-Context Learning under Matched Compute Budgets: Pretraining on Natural Text or Incorporating Targeted Synthetic Examples?
Mohammed Sabry
Anya Belz
67
0
0
26 Sep 2025
Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling
Ye Qiao
Haocheng Xu
Xiaofan Zhang
Sitao Huang
MQ
76
0
0
26 Sep 2025
Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
Nakyeong Yang
Dong-Kyum Kim
Jea Kwon
Minsung Kim
Kyomin Jung
M. Cha
MU
KELM
88
0
0
26 Sep 2025