2012.15832
Shortformer: Better Language Modeling using Shorter Inputs
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
31 December 2020
Ofir Press
Noah A. Smith
M. Lewis
Papers citing
"Shortformer: Better Language Modeling using Shorter Inputs"
50 / 71 papers shown
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space
Zhenyi Shen
Junru Lu
Lin Gui
Jiazheng Li
Yulan He
D. Yin
Xing Sun
403
1
0
25 Nov 2025
Length-MAX Tokenizer for Language Models
Dong Dong
Weijie Su
VLM
242
0
0
25 Nov 2025
Progressive Growing of Patch Size: Curriculum Learning for Accelerated and Improved Medical Image Segmentation
Stefan M. Fischer
Johannes Kiechle
Laura Daza
Lina Felsner
Richard Osuala
Daniel M. Lang
Karim Lekadir
J. Peeken
Julia A. Schnabel
MedIm
261
0
0
27 Oct 2025
From Global to Local: A Scalable Benchmark for Local Posterior Sampling
Rohan Hitchcock
Jesse Hoogland
201
2
0
29 Jul 2025
PIPE: Physics-Informed Position Encoding for Alignment of Satellite Images and Time Series
Haobo Li
Eunseo Jung
Zixin Chen
Zhaowei Wang
Yueya Wang
Huamin Qu
Alexis Kai Hon Lau
235
1
0
27 May 2025
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Robert Bamler
727
198
0
10 Apr 2025
Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder
Changye Li
Weizhe Xu
Serguei V. S. Pakhomov
Ellen Bradley
Dror Ben-Zeev
T. Cohen
330
0
0
25 Mar 2025
Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences
International Conference on Learning Representations (ICLR), 2024
Niklas Schmidinger
Lisa Schneckenreiter
Philipp Seidl
Johannes Schimunek
Pieter-Jan Hoedt
Johannes Brandstetter
Andreas Mayr
Sohvi Luukkonen
Sepp Hochreiter
Günter Klambauer
MedIm
418
19
0
06 Nov 2024
Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning
Neural Information Processing Systems (NeurIPS), 2024
Yangqiu Song
Tong Zheng
Ran Wang
Jiahao Liu
Qingyan Guo
...
Xu Tan
Tong Xiao
Jingbo Zhu
Jiadong Wang
Xunliang Cai
404
5
0
05 Nov 2024
Fisher Information-based Efficient Curriculum Federated Learning with Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ji Liu
Jiaxiang Ren
Ruoming Jin
Zijie Zhang
Yang Zhou
P. Valduriez
Dejing Dou
FedML
338
10
0
30 Sep 2024
dnaGrinder: a lightweight and high-capacity genomic foundation model
Qihang Zhao
Chi Zhang
Weixiong Zhang
288
3
0
24 Sep 2024
Curriculum Learning for Small Code Language Models
Marwa Nair
K. Yamani
Lynda Said Lhadj
Riyadh Baghdadi
205
23
0
14 Jul 2024
LETS-C: Leveraging Text Embedding for Time Series Classification
Rachneet Kaur
Zhen Zeng
T. Balch
Manuela Veloso
AI4TS
322
0
0
09 Jul 2024
Lessons from the Trenches on Reproducible Evaluation of Language Models
Stella Biderman
Hailey Schoelkopf
Lintang Sutawika
Leo Gao
J. Tow
...
Xiangru Tang
Kevin A. Wang
Genta Indra Winata
François Yvon
Andy Zou
ELM
ALM
481
130
3
23 May 2024
From Transformers to LLMs: A Systematic Survey of Efficiency Considerations in NLP
Wazib Ansar
Saptarsi Goswami
Amlan Chakrabarti
MedIm
562
12
0
15 May 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
263
25
0
22 Apr 2024
Compression Represents Intelligence Linearly
Yuzhen Huang
Jinghan Zhang
Zifei Shan
Junxian He
389
48
0
15 Apr 2024
Progress and Opportunities of Foundation Models in Bioinformatics
Qing Li
Zhihang Hu
Yixuan Wang
Lei Li
Yimin Fan
Irwin King
Le Song
Yu Li
AI4CE
334
51
0
06 Feb 2024
MambaByte: Token-free Selective State Space Model
Junxiong Wang
Tushaar Gangavarapu
Jing Nathan Yan
Alexander M. Rush
Mamba
437
63
0
24 Jan 2024
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Saurav Pawar
S.M. Towhidul Islam Tonmoy
S. M. M. Zaman
Vinija Jain
Vasu Sharma
Amitava Das
265
47
0
15 Jan 2024
Paloma: A Benchmark for Evaluating Language Model Fit
Ian H. Magnusson
Akshita Bhagia
Valentin Hofmann
Luca Soldaini
A. Jha
...
Iz Beltagy
Hanna Hajishirzi
Noah A. Smith
Kyle Richardson
Jesse Dodge
405
53
0
16 Dec 2023
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
Dirk Groeneveld
Anas Awadalla
Iz Beltagy
Akshita Bhagia
Ian H. Magnusson
Hao Peng
Oyvind Tafjord
Pete Walsh
Kyle Richardson
Jesse Dodge
291
2
0
15 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
519
37
0
01 Dec 2023
Advancing State of the Art in Language Modeling
David Herel
Tomas Mikolov
309
1
0
28 Nov 2023
Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures
Julius Steuer
Marius Mosbach
Dietrich Klakow
188
15
0
08 Nov 2023
Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
Venkata S Govindarajan
Juan Diego Rodriguez
Kaj Bostrom
Kyle Mahowald
394
1
0
26 Oct 2023
How Much Context Does My Attention-Based ASR System Need?
Interspeech (Interspeech), 2023
Robert Flynn
Anton Ragni
331
5
0
24 Oct 2023
Manifold-Preserving Transformers are Effective for Short-Long Range Encoding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ayan Sengupta
Md. Shad Akhtar
Tanmoy Chakraborty
242
0
0
22 Oct 2023
The Locality and Symmetry of Positional Encodings
Lihu Chen
Gaël Varoquaux
Fabian M. Suchanek
235
1
0
19 Oct 2023
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Journal of Social Computing (JSC), 2023
Ari Holtzman
Peter West
Luke Zettlemoyer
AI4CE
305
20
0
31 Jul 2023
Lost in the Middle: How Language Models Use Long Contexts
Transactions of the Association for Computational Linguistics (TACL), 2023
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Abigail Z. Jacobs
RALM
729
3,319
0
06 Jul 2023
Leveraging Cross-Utterance Context For ASR Decoding
Interspeech (Interspeech), 2023
Robert Flynn
Anton Ragni
249
1
0
29 Jun 2023
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Neural Information Processing Systems (NeurIPS), 2023
Eric N. D. Nguyen
Michael Poli
Marjan Faizi
A. Thomas
Callum Birch-Sykes
...
Stefano Massaroli
Yoshua Bengio
Stefano Ermon
S. Baccus
Christopher Ré
MedIm
427
458
0
27 Jun 2023
Long-range Language Modeling with Self-retrieval
Transactions of the Association for Computational Linguistics (TACL), 2023
Ohad Rubin
Jonathan Berant
RALM
KELM
288
31
0
23 Jun 2023
Anticipatory Music Transformer
John Thickstun
David Leo Wright Hall
Chris Donahue
Abigail Z. Jacobs
324
36
0
14 Jun 2023
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Knowledge Discovery and Data Mining (KDD), 2023
Md Shamim Hussain
Mohammed J Zaki
D. Subramanian
459
4
0
02 Jun 2023
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Neural Information Processing Systems (NeurIPS), 2023
L. Yu
Daniel Simig
Colin Flaherty
Armen Aghajanyan
Luke Zettlemoyer
M. Lewis
377
160
0
12 May 2023
Localizing Model Behavior with Path Patching
Nicholas W. Goldowsky-Dill
Chris MacLeod
L. Sato
Aryaman Arora
684
143
0
12 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
376
54
0
07 Apr 2023
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
International Conference on Machine Learning (ICML), 2023
Shuangfei Zhai
Tatiana Likhomanenko
Etai Littwin
Dan Busbridge
Jason Ramapuram
Yizhe Zhang
Jiatao Gu
J. Susskind
AAML
444
147
0
11 Mar 2023
Black-box language model explanation by context length probing
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Ondřej Cífka
Antoine Liutkus
MILM
LRM
370
10
0
30 Dec 2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
AAAI Conference on Artificial Intelligence (AAAI), 2022
Conglong Li
Z. Yao
Xiaoxia Wu
Minjia Zhang
Connor Holmes
Cheng Li
Yuxiong He
447
42
0
07 Dec 2022
The Curious Case of Absolute Position Embeddings
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Koustuv Sinha
Amirhossein Kazemnejad
Siva Reddy
J. Pineau
Dieuwke Hupkes
Adina Williams
293
21
0
23 Oct 2022
Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers
Tao Tang
Changlin Li
Guangrun Wang
Kaicheng Yu
Xiaojun Chang
Xiaodan Liang
ViT
247
1
0
16 Oct 2022
Efficient Methods for Natural Language Processing: A Survey
Transactions of the Association for Computational Linguistics (TACL), 2022
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
500
151
0
31 Aug 2022
The Importance of Context in Very Low Resource Language Modeling
ICON (ICON), 2022
Lukas Edman
Antonio Toral
Gertjan van Noord
205
2
0
10 May 2022
ChapterBreak: A Challenge Dataset for Long-Range Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Simeng Sun
Katherine Thai
Mohit Iyyer
212
20
0
22 Apr 2022
DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
Ziyang Luo
Yadong Xi
Jing Ma
Zhiwei Yang
Xiaoxi Mao
Changjie Fan
Rongsheng Zhang
215
5
0
19 Apr 2022
Linearizing Transformer with Key-Value Memory
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yizhe Zhang
Deng Cai
376
6
0
23 Mar 2022
Better Language Model with Hypernym Class Prediction
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Richard He Bai
Tong Wang
Alessandro Sordoni
Peng Shi
273
17
0
21 Mar 2022