Shortformer: Better Language Modeling using Shorter Inputs

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

31 December 2020

Ofir Press

Noah A. Smith

M. Lewis

Papers citing "Shortformer: Better Language Modeling using Shorter Inputs"

50 / 71 papers shown

SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space

403

25 Nov 2025

Length-MAX Tokenizer for Language Models

Dong Dong

Weijie Su

VLM

242

25 Nov 2025

Progressive Growing of Patch Size: Curriculum Learning for Accelerated and Improved Medical Image Segmentation

261

27 Oct 2025

From Global to Local: A Scalable Benchmark for Local Posterior Sampling

Rohan Hitchcock

Jesse Hoogland

201

29 Jul 2025

PIPE: Physics-Informed Position Encoding for Alignment of Satellite Images and Time Series

235

27 May 2025

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

...

727

198

10 Apr 2025

Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder

Changye Li

Weizhe Xu

Serguei V. S. Pakhomov

Ellen Bradley

Dror Ben-Zeev

T. Cohen

330

25 Mar 2025

Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences

International Conference on Learning Representations (ICLR), 2024

Johannes Brandstetter

418

06 Nov 2024

Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning

Neural Information Processing Systems (NeurIPS), 2024

...

404

05 Nov 2024

Fisher Information-based Efficient Curriculum Federated Learning with Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Ji Liu

Jiaxiang Ren

Ruoming Jin

Zijie Zhang

Yang Zhou

P. Valduriez

Dejing Dou

FedML

338

30 Sep 2024

dnaGrinder: a lightweight and high-capacity genomic foundation model

Qihang Zhao

Chi Zhang

Weixiong Zhang

288

24 Sep 2024

Curriculum Learning for Small Code Language Models

205

14 Jul 2024

LETS-C: Leveraging Text Embedding for Time Series Classification

322

09 Jul 2024

Lessons from the Trenches on Reproducible Evaluation of Language Models

...

481

130

23 May 2024

From Transformers to LLMs: A Systematic Survey of Efficiency Considerations in NLP

562

15 May 2024

SpaceByte: Towards Deleting Tokenization from Large Language Modeling

Kevin Slagle

263

22 Apr 2024

Compression Represents Intelligence Linearly

389

15 Apr 2024

Progress and Opportunities of Foundation Models in Bioinformatics

334

06 Feb 2024

MambaByte: Token-free Selective State Space Model

437

24 Jan 2024

The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey

Saurav Pawar

S.M. Towhidul Islam Tonmoy

S. M. M. Zaman

Vinija Jain

Vasu Sharma

Amitava Das

265

15 Jan 2024

Paloma: A Benchmark for Evaluating Language Model Fit

Akshita Bhagia

Luca Soldaini

...

405

16 Dec 2023

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dirk Groeneveld

Anas Awadalla

Iz Beltagy

Akshita Bhagia

Ian H. Magnusson

Hao Peng

Oyvind Tafjord

Pete Walsh

Kyle Richardson

Jesse Dodge

291

15 Dec 2023

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

Tianyi Chen

519

01 Dec 2023

Advancing State of the Art in Language Modeling

David Herel

Tomas Mikolov

309

28 Nov 2023

Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures

Julius Steuer

Marius Mosbach

Dietrich Klakow

188

08 Nov 2023

Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways

Venkata S Govindarajan

Juan Diego Rodriguez

Kaj Bostrom

Kyle Mahowald

394

26 Oct 2023

How Much Context Does My Attention-Based ASR System Need?

Interspeech (Interspeech), 2023

Robert Flynn

Anton Ragni

331

24 Oct 2023

Manifold-Preserving Transformers are Effective for Short-Long Range Encoding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Ayan Sengupta

Md. Shad Akhtar

Tanmoy Chakraborty

242

22 Oct 2023

The Locality and Symmetry of Positional Encodings

Lihu Chen

Gaël Varoquaux

Fabian M. Suchanek

235

19 Oct 2023

Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?

Journal of Social Computing (JSC), 2023

Ari Holtzman

Peter West

Luke Zettlemoyer

AI4CE

305

31 Jul 2023

Lost in the Middle: How Language Models Use Long Contexts

Transactions of the Association for Computational Linguistics (TACL), 2023

729

3,319

06 Jul 2023

Leveraging Cross-Utterance Context For ASR Decoding

Interspeech (Interspeech), 2023

Robert Flynn

Anton Ragni

249

29 Jun 2023

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution

Neural Information Processing Systems (NeurIPS), 2023

...

427

458

27 Jun 2023

Long-range Language Modeling with Self-retrieval

Transactions of the Association for Computational Linguistics (TACL), 2023

Ohad Rubin

Jonathan Berant

RALM KELM

288

23 Jun 2023

Anticipatory Music Transformer

John Thickstun

David Leo Wright Hall

Chris Donahue

Abigail Z. Jacobs

324

14 Jun 2023

The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles

Knowledge Discovery and Data Mining (KDD), 2023

Md Shamim Hussain

Mohammed J Zaki

D. Subramanian

459

02 Jun 2023

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Neural Information Processing Systems (NeurIPS), 2023

Luke Zettlemoyer

377

160

12 May 2023

Localizing Model Behavior with Path Patching

Nicholas W. Goldowsky-Dill

Chris MacLeod

L. Sato

Aryaman Arora

684

143

12 Apr 2023

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Li Shen

Liang Ding

376

07 Apr 2023

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

International Conference on Machine Learning (ICML), 2023

444

147

11 Mar 2023

Black-box language model explanation by context length probing

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Ondřej Cífka

Antoine Liutkus

MILM LRM

370

30 Dec 2022

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

AAAI Conference on Artificial Intelligence (AAAI), 2022

Yuxiong He

447

07 Dec 2022

The Curious Case of Absolute Position Embeddings

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Koustuv Sinha

Amirhossein Kazemnejad

Siva Reddy

J. Pineau

Dieuwke Hupkes

Adina Williams

293

23 Oct 2022

Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers

Tao Tang

Changlin Li

Guangrun Wang

Kaicheng Yu

Xiaojun Chang

Xiaodan Liang

ViT

247

16 Oct 2022

Efficient Methods for Natural Language Processing: A Survey

Transactions of the Association for Computational Linguistics (TACL), 2022

Marcos Vinícius Treviso

...

Niranjan Balasubramanian

Leon Derczynski

Iryna Gurevych

Roy Schwartz

500

151

31 Aug 2022

The Importance of Context in Very Low Resource Language Modeling

ICON (ICON), 2022

Lukas Edman

Antonio Toral

Gertjan van Noord

205

10 May 2022

ChapterBreak: A Challenge Dataset for Long-Range Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Simeng Sun

Katherine Thai

Mohit Iyyer

212

22 Apr 2022

DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks

Changjie Fan

215

19 Apr 2022

Linearizing Transformer with Key-Value Memory

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Yizhe Zhang

Deng Cai

376

23 Mar 2022

Better Language Model with Hypernym Class Prediction

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

273

21 Mar 2022