HellaSwag: Can a Machine Really Finish Your Sentence?

Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
ArXiv (abs) · PDF · HTML

Papers citing "HellaSwag: Can a Machine Really Finish Your Sentence?"

50 / 2,253 papers shown
SAS: Simulated Attention Score
Chuanyang Zheng
J. Sun
Yihang Gao
Yuehao Wang
Peihao Wang
...
Atlas Wang
Mac Schwager
Anderson Schneider
Xiaodong Liu
Jianfeng Gao
AI4TS
243
2
0
10 Jul 2025
FlexOlmo: Open Language Models for Flexible Data Use
Weijia Shi
Akshita Bhagia
Kevin Farhat
Niklas Muennighoff
Pete Walsh
...
Luke Zettlemoyer
Pang Wei Koh
Hannaneh Hajishirzi
Ali Farhadi
Sewon Min
MoE
390
4
0
09 Jul 2025
Steering Information Utility in Key-Value Memory for Language Model Post-Training
Chunyuan Deng
Ruidi Chang
Hanjie Chen
LLMSV
364
0
0
07 Jul 2025
Train-before-Test Harmonizes Language Model Rankings
Guanhua Zhang
Ricardo Dominguez-Olmedo
Moritz Hardt
ALM
206
2
0
07 Jul 2025
LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
Jingze Zhu
Y. Wu
Wenbo Zhu
Jiawang Cao
Y. Zheng
Jiawei Chen
Xu Yang
Bernt Schiele
Jonas Fischer
Xinting Hu
OffRL
176
0
0
06 Jul 2025
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization
Xujia Wang
Yunjia Qi
Bin Xu
249
0
0
06 Jul 2025
RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Xiuying Wei
Anunay Yadav
Razvan Pascanu
Çağlar Gülçehre
AI4TS
261
0
0
06 Jul 2025
OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference
Seungjun Shin
Jaehoon Oh
Dokwan Oh
168
1
0
05 Jul 2025
Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training
Ismail Labiad
Mathurin Videau
Matthieu Kowalski
Marc Schoenauer
Alessandro Leite
Julia Kempe
O. Teytaud
AAML
289
0
0
02 Jul 2025
Eka-Eval: An Evaluation Framework for Low-Resource Multilingual Large Language Models
Samridhi Raj Sinha
Rajvee Sheth
Abhishek Upperwal
Mayank Singh
ELM
187
0
0
02 Jul 2025
Scaling Laws Are Unreliable for Downstream Tasks: A Reality Check
Nicholas Lourie
Michael Y. Hu
Dong Wang
171
7
0
01 Jul 2025
AutoMixer: Checkpoint Artifacts as Automatic Data Mixers
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ernie Chang
Yang Li
Patrick Huber
Vish Vogeti
David Kant
Yangyang Shi
Vikas Chandra
147
3
0
27 Jun 2025
DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
Ruokai Yin
Yuhang Li
Donghyun Lee
Priyadarshini Panda
VLM
244
2
0
25 Jun 2025
Tensor-Parallelism with Partially Synchronized Activations
Itay Lamprecht
Asaf Karnieli
Y. Hanani
Niv Giladi
Daniel Soudry
81
1
0
24 Jun 2025
Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment
Yuhui Sun
Xiyao Wang
Zixi Li
Zhenlong Yuan
Jinman Zhao
207
0
0
24 Jun 2025
Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jiashun Cheng
Chenyi Zi
Polydoros Giannouris
Ziqi Gao
Yuhan Li
Jia Li
Fugee Tsung
220
0
0
20 Jun 2025
EvoLM: In Search of Lost Language Model Training Dynamics
Zhenting Qi
Fan Nie
Alexandre Alahi
James Zou
Himabindu Lakkaraju
Yilun Du
Eric P. Xing
Sham Kakade
Hanlin Zhang
312
2
0
19 Jun 2025
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
Samir Khaki
Xiuyu Li
Junxian Guo
Ligeng Zhu
Chenfeng Xu
Konstantinos N. Plataniotis
Amir Yazdanbakhsh
Kurt Keutzer
Song Han
Zhijian Liu
217
4
0
19 Jun 2025
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Zhiyuan Liang
Dongwen Tang
Yuhao Zhou
Xuanlei Zhao
Mingjia Shi
...
Damian Borth
Michael M. Bronstein
Yang You
Zinan Lin
Kai Wang
OffRL
240
3
0
19 Jun 2025
Thunder-Tok: Minimizing Tokens per Word in Tokenizing Korean Texts for Generative Language Models
Gyeongje Cho
Yeonkyoun So
Chanwoo Park
Sangmin Lee
Sungmok Jung
Jaejin Lee
VLM
221
0
0
18 Jun 2025
Finance Language Model Evaluation (FLaME)
Glenn Matlin
Mika Okamoto
Huzaifa Pardawala
Yang Yang
Sudheer Chava
AIFin, LRM
190
1
0
18 Jun 2025
Representation Consistency for Accurate and Coherent LLM Answer Aggregation
Junqi Jiang
Tom Bewley
Salim I. Amoukou
Francesco Leofante
Antonio Rago
Saumitra Mishra
Francesca Toni
190
2
0
18 Jun 2025
Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact
Hyunji Lee
Seunghyun Yoon
Yunjae Won
Hanseok Oh
Geewook Kim
Trung H. Bui
Franck Dernoncourt
Elias Stengel-Eskin
Mohit Bansal
Minjoon Seo
LRM
249
2
0
18 Jun 2025
RATTENTION: Towards the Minimal Sliding Window Size in Local-Global Attention Models
Bailin Wang
Chang Lan
Chong-Jun Wang
Ruoming Pang
257
2
0
18 Jun 2025
MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation
Shen Yuan
Yin Zheng
Taifeng Wang
Binbin Liu
Hongteng Xu
MoMe
385
1
0
17 Jun 2025
Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality
Yuto Harada
Yusuke Yamauchi
Yusuke Oda
Yohei Oseki
Yusuke Miyao
Yu Takagi
ALM
258
5
0
17 Jun 2025
SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models
Gyuhak Kim
Sumiran Thakur
Su Min Park
Wei Wei
Yujia Bao
153
2
0
17 Jun 2025
ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models
Junho Yoon
Geom Lee
Donghyeon Jeon
Inho Kang
Seung-Hoon Na
MQ, VLM
227
0
0
16 Jun 2025
Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qiming Ge
Shuhao Xing
Songyang Gao
Yunhua Zhou
Yicheng Zou
...
Zhi Chen
Hang Yan
Qi Zhang
Q. Guo
Kai Chen
212
0
0
16 Jun 2025
Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
Zhengyu Hu
Jianxun Lian
Zheyuan Xiao
Seraphina Zhang
Tianfu Wang
Nicholas Jing Yuan
Xing Xie
Hui Xiong
ELM, LRM
218
3
0
16 Jun 2025
TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices
Mingxue Xu
Y. Xu
Danilo Mandic
187
0
0
16 Jun 2025
Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization
Guanghui Song
Dongping Liao
Yiren Zhao
Kejiang Ye
Cheng-zhong Xu
X. Gao
MoE
182
0
0
16 Jun 2025
Load Balancing Mixture of Experts with Similarity Preserving Routers
Nabil Omi
S. Sen
Ali Farhadi
MoE
276
7
0
16 Jun 2025
Just Go Parallel: Improving the Multilingual Capabilities of Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Muhammad Reza Qorib
Junyi Li
Hwee Tou Ng
LRM
251
4
0
16 Jun 2025
GTA: Grouped-head latenT Attention
Luoyang Sun
Cheng Deng
Jiwen Jiang
Xinjian Wu
Haifeng Zhang
Lei Chen
Lionel M. Ni
Ning Yang
173
1
0
15 Jun 2025
Assessing the Role of Data Quality in Training Bilingual Language Models
Skyler Seto
Maartje ter Hoeve
Maureen de Seyssel
David Grangier
159
0
0
15 Jun 2025
Improving Large Language Model Safety with Contrastive Representation Learning
Samuel Simko
Mrinmaya Sachan
Bernhard Schölkopf
Zhijing Jin
AAML
360
2
0
13 Jun 2025
Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
Hao Xu
Hamish Ivison
Yejin Choi
Noah A. Smith
Hannaneh Hajishirzi
258
2
0
13 Jun 2025
Curriculum-Guided Layer Scaling for Language Model Pretraining
Karanpartap Singh
Neil Band
Ehsan Adeli
ALM, LRM
233
0
0
13 Jun 2025
LoRA-Gen: Specializing Large Language Model via Online LoRA Generation
Yicheng Xiao
Lin Song
Rui Yang
Cheng Cheng
Yixiao Ge
Xiu Li
Y. Shan
OffRL
197
0
0
13 Jun 2025
Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning
Yang Zhang
Amr Mohamed
Hadi Abdine
Guokan Shang
Michalis Vazirgiannis
209
5
0
12 Jun 2025
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems
Xiaozhe Li
Jixuan Chen
Xinyu Fang
Shengyuan Ding
Haodong Duan
Qingwen Liu
Kai-xiang Chen
LLMAG, LRM
289
6
0
12 Jun 2025
One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers
Diana Abagyan
Alejandro Salamanca
Andres Felipe Cruz-Salinas
Kris Cao
Hangyu Lin
Acyr Locatelli
Marzieh Fadaee
Ahmet Üstün
Sara Hooker
CLL
374
4
0
12 Jun 2025
Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training
Mozhi Zhang
Howe Tissue
Lu Wang
Jiaqi Leng
303
2
0
12 Jun 2025
TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding
Yiran Peng
Jingze Shi
Yifan Wu
Nan Tang
Yuyu Luo
325
6
0
11 Jun 2025
Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform
Jay Roberts
Kyle Mylonakis
Sidhartha Roy
Kaan Kale
210
0
0
11 Jun 2025
DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuchen Feng
Bowen Shen
Naibin Gu
Jiaxuan Zhao
Peng Fu
Zheng Lin
Weiping Wang
MoMe, MoE
200
4
0
11 Jun 2025
Olica: Efficient Structured Pruning of Large Language Models without Retraining
Jiujun He
Huazhen Lin
174
1
0
10 Jun 2025
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models
Pranav Guruprasad
Yangyue Wang
Sudipta Chowdhury
Jaewoo Song
Harshvardhan Sikka
235
0
0
10 Jun 2025
Unifying Block-wise PTQ and Distillation-based QAT for Progressive Quantization toward 2-bit Instruction-Tuned LLMs
Jung Hyun Lee
Seungjae Shin
Vinnam Kim
Jaeseong You
An Chen
MQ
197
2
0
10 Jun 2025