SELF: Self-Extend the Context Length With Logistic Growth Function

arXiv:2505.17296 · 22 May 2025
Phat Thanh Dang, Saahil Thoppay, Wang Yang, Qifan Wang, Vipin Chaudhary, Xiaotian Han
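As context for the title: a logistic growth function is the standard S-shaped curve f(x) = L / (1 + e^(-k(x - x0))), which grows slowly at first, fastest around its midpoint x0, and then saturates at the ceiling L. The short Python sketch below only illustrates that generic curve; the constants are arbitrary placeholders, and the code is not a reproduction of the paper's context-extension method.

import math

def logistic(x: float, L: float = 1.0, k: float = 1.0, x0: float = 0.0) -> float:
    # Generic logistic growth function (illustration only; the constants
    # are placeholders, not values taken from the SELF paper).
    # L: upper asymptote, k: growth rate, x0: midpoint where f(x0) = L / 2.
    return L / (1.0 + math.exp(-k * (x - x0)))

# Slow growth well below the midpoint, steepest growth at the midpoint,
# then saturation toward L.
for x in (-6, -3, 0, 3, 6):
    print(f"f({x:+d}) = {logistic(x):.4f}")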

Papers citing "SELF: Self-Extend the Context Length With Logistic Growth Function"

17 citing papers:

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun
LLMAG · 07 Feb 2024

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
International Conference on Machine Learning (ICML), 2024
Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu
02 Jan 2024

Efficient Streaming Language Models with Attention Sinks
International Conference on Learning Representations (ICLR), 2024
Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis
AI4TS, RALM · 29 Sep 2023

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
International Conference on Learning Representations (ICLR), 2024
Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li
19 Sep 2023

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, ..., Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li
LLMAG, RALM · 28 Aug 2023

L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Chenxin An, Shansan Gong, Ming Zhong, Xingjian Zhao, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu
ELM, ALM · 20 Jul 2023

Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian
27 Jun 2023

Landmark Attention: Random-Access Infinite Context Length for Transformers
Neural Information Processing Systems (NeurIPS), 2023
Amirkeivan Mohtashami, Martin Jaggi
LLMAG · 25 May 2023

When Neural Networks Fail to Generalize? A Model Sensitivity Perspective
AAAI Conference on Artificial Intelligence (AAAI), 2023
Jiajin Zhang, Hanqing Chao, Amit Dhurandhar, Pin-Yu Chen, Ali Tajer, Yangyang Xu, Pingkun Yan
OOD, AAML · 01 Dec 2022

OPT: Open Pre-trained Transformer Language Models
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, ..., Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
VLM, OSLM, AI4CE · 02 May 2022

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
International Conference on Learning Representations (ICLR), 2022
Ofir Press, Noah A. Smith, Mike Lewis
27 Aug 2021

RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
20 Apr 2021

Recent Advances in Adversarial Training for Adversarial Robustness
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Tao Bai, Jinqi Luo, Jun Zhao, Bihan Wen, Qian Wang
AAML · 02 Feb 2021

mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel
22 Oct 2020

Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL · 28 May 2020

Compressive Transformers for Long-Range Sequence Modelling
International Conference on Learning Representations (ICLR), 2020
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
RALM, VLM, KELM · 13 Nov 2019

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
VLM · 09 Jan 2019