HellaSwag: Can a Machine Really Finish Your Sentence?

Annual Meeting of the Association for Computational Linguistics (ACL), 2019

19 May 2019

Yejin Choi

Papers citing "HellaSwag: Can a Machine Really Finish Your Sentence?"

50 / 2,253 papers shown

SAS: Simulated Attention Score

...

243

10 Jul 2025

FlexOlmo: Open Language Models for Flexible Data Use

...

390

09 Jul 2025

Steering Information Utility in Key-Value Memory for Language Model Post-Training

364

07 Jul 2025

Train-before-Test Harmonizes Language Model Rankings

Guanhua Zhang

Ricardo Dominguez-Olmedo

Moritz Hardt

ALM

206

07 Jul 2025

LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers

176

06 Jul 2025

LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization

Xujia Wang

Yunjia Qi

Bin Xu

249

06 Jul 2025

RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

261

06 Jul 2025

OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference

Seungjun Shin

Jaehoon Oh

Dokwan Oh

168

05 Jul 2025

Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training

289

02 Jul 2025

Eka-Eval: An Evaluation Framework for Low-Resource Multilingual Large Language Models

187

02 Jul 2025

Scaling Laws Are Unreliable for Downstream Tasks: A Reality Check

Nicholas Lourie

Michael Y. Hu

Dong Wang

171

01 Jul 2025

AutoMixer: Checkpoint Artifacts as Automatic Data Mixers

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

147

27 Jun 2025

DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs

244

25 Jun 2025

Tensor-Parallelism with Partially Synchronized Activations

24 Jun 2025

Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment

207

24 Jun 2025

Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

220

20 Jun 2025

EvoLM: In Search of Lost Language Model Training Dynamics

312

19 Jun 2025

SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

Konstantinos N. Plataniotis

217

19 Jun 2025

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

...

240

19 Jun 2025

Thunder-Tok: Minimizing Tokens per Word in Tokenizing Korean Texts for Generative Language Models

221

18 Jun 2025

Finance Language Model Evaluation (FLaME)

190

18 Jun 2025

Representation Consistency for Accurate and Coherent LLM Answer Aggregation

190

18 Jun 2025

Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact

249

18 Jun 2025

RATTENTION: Towards the Minimal Sliding Window Size in Local-Global Attention Models

257

18 Jun 2025

MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation

385

17 Jun 2025

Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality

258

17 Jun 2025

SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models

153

17 Jun 2025

ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models

227

16 Jun 2025

Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

212

16 Jun 2025

Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study

218

16 Jun 2025

TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices

Mingxue Xu

Y. Xu

Danilo Mandic

187

16 Jun 2025

Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization

182

16 Jun 2025

Load Balancing Mixture of Experts with Similarity Preserving Routers

276

16 Jun 2025

Just Go Parallel: Improving the Multilingual Capabilities of Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

251

16 Jun 2025

GTA: Grouped-head latenT Attention

173

15 Jun 2025

Assessing the Role of Data Quality in Training Bilingual Language Models

159

15 Jun 2025

Improving Large Language Model Safety with Contrastive Representation Learning

360

13 Jun 2025

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index

258

13 Jun 2025

Curriculum-Guided Layer Scaling for Language Model Pretraining

233

13 Jun 2025

LoRA-Gen: Specializing Large Language Model via Online LoRA Generation

197

13 Jun 2025

Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning

Michalis Vazirgiannis

209

12 Jun 2025

OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems

289

12 Jun 2025

One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers

Diana Abagyan

Alejandro Salamanca

Andres Felipe Cruz-Salinas

374

12 Jun 2025

Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training

303

12 Jun 2025

TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding

325

11 Jun 2025

Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform

210

11 Jun 2025

DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

200

11 Jun 2025

Olica: Efficient Structured Pruning of Large Language Models without Retraining

Jiujun He

Huazhen Lin

174

10 Jun 2025

An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models

235

10 Jun 2025

Unifying Block-wise PTQ and Distillation-based QAT for Progressive Quantization toward 2-bit Instruction-Tuned LLMs

197

10 Jun 2025