
KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation

Neural Information Processing Systems (NeurIPS), 2022

20 May 2022

Ta-Chung Chi

Ting-Han Fan

Peter J. Ramadge

Alexander I. Rudnicky


Papers citing "KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation"

50 / 56 papers shown

ShaRP: SHAllow-LayeR Pruning for Video Large Language Models Acceleration

103

05 Dec 2025

Selective Rotary Position Embedding

377

21 Nov 2025

A Circular Argument: Does RoPE need to be Equivariant for Vision?

221

11 Nov 2025

Indirect Attention: Turning Context Misalignment into a Feature

203

30 Sep 2025

SAS: Simulated Attention Score

...

302

10 Jul 2025

Long-Short Alignment for Effective Long-Context Modeling in LLMs

222

13 Jun 2025

Mitigating Posterior Salience Attenuation in Long-Context LLMs with Positional Contrastive Decoding

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

243

10 Jun 2025

Native-Resolution Image Synthesis

357

03 Jun 2025

A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models

Kohei Saijo

Tetsuji Ogawa

360

28 Apr 2025

Between Linear and Sinusoidal: Rethinking the Time Encoder in Dynamic Graph Learning

309

10 Apr 2025

FactGuard: Leveraging Multi-Agent Systems to Generate Answerable and Unanswerable Questions for Enhanced Long-Context LLM Extraction

391

08 Apr 2025

On Vanishing Variance in Transformer Length Generalization

308

03 Apr 2025

Where is this coming from? Making groundedness count in the evaluation of Document VQA models

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

327

24 Mar 2025

A Survey on Transformer Context Extension: Approaches and Evaluation

581

17 Mar 2025

Context-aware Biases for Length Extrapolation

Ali Veisi

Hamidreza Amirzadeh

Amir Mansourian

637

11 Mar 2025

437

25 Feb 2025

Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning

440

12 Feb 2025

Learning the RoPEs: Better 2D and 3D Position Encodings with STRING

...

Krzysztof Choromanski

380

04 Feb 2025

Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding

412

01 Jan 2025

Provable Length Generalization in Sequence Prediction via Spectral Filtering

386

01 Nov 2024

What is Wrong with Perplexity for Long-context Language Modeling?

International Conference on Learning Representations (ICLR), 2024

778

31 Oct 2024

HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Yuhan Chen

Ang Lv

Jian Luan

Bin Wang

Wen Liu

265

28 Oct 2024

Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis

Neural Information Processing Systems (NeurIPS), 2024

Lin Yang

359

18 Oct 2024

MLissard: Multilingual Long and Simple Sequential Reasoning Benchmarks

290

08 Oct 2024

DAPE V2: Process Attention Score as Feature Map for Length Extrapolation

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Jing Xiong

...

Michael Ng

Xin Jiang

Zhenguo Li

Yu Li

416

07 Oct 2024

UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation

Zixuan Li

Jing Xiong

Fanghua Ye

Chuanyang Zheng

Xun Wu

...

Lingpeng Kong

414

03 Oct 2024

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Zhiyuan Hu

Yuliang Liu

Jinman Zhao

Suyuchen Wang

Yan Wang

...

Qing Gu

Anh Tuan Luu

See-Kiong Ng

Zhiwei Jiang

Bryan Hooi

408

31 Aug 2024

Human-inspired Episodic Memory for Infinite Context LLMs

462

12 Jul 2024

Universal Length Generalization with Turing Programs

275

03 Jul 2024

Let the Code LLM Edit Itself When You Edit the Code

Jingjing Xu

314

03 Jul 2024

Transformers Can Do Arithmetic with the Right Embeddings

...

300

27 May 2024

MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation

Weiguo Gao

238

26 Mar 2024

CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models

Irwin King

367

06 Mar 2024

LLM Inference Unveiled: Survey and Roofline Model Insights

Zhihang Yuan

Yuzhang Shang

Yang Zhou

Zhen Dong

Zhe Zhou

...

Yong Jae Lee

Yan Yan

Beidi Chen

Guangyu Sun

Kurt Keutzer

689

168

26 Feb 2024

Transformers Can Achieve Length Generalization But Not Robustly

361

14 Feb 2024

Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes

303

02 Feb 2024

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

Hongxia Yang

256

29 Jan 2024

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

Haizhou Li

289

18 Jan 2024

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

Computer Vision and Pattern Recognition (CVPR), 2024

585

04 Jan 2024

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

Xiang Cheng

Yuxin Chen

S. Sra

696

11 Dec 2023

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

Tianyi Chen

504

01 Dec 2023

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

...

479

114

21 Nov 2023

Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis

359

21 Nov 2023

Addressing the Length Bias Problem in Document-Level Neural Machine Translation

264

20 Nov 2023

Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation

Ta-Chung Chi

Ting-Han Fan

Alexander I. Rudnicky

181

01 Nov 2023

CLEX: Continuous Length Extrapolation for Large Language Models

International Conference on Learning Representations (ICLR), 2023

Xin Li

360

25 Oct 2023

Extending Input Contexts of Language Models through Training on Segmented Sequences

Petros Karypis

Julian McAuley

George Karypis

329

23 Oct 2023

From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers

Shaoxiong Duan

Yining Shi

Wei Xu

349

18 Oct 2023

CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

186

15 Sep 2023

Exploring Transformer Extrapolation

AAAI Conference on Artificial Intelligence (AAAI), 2023

Zhen Qin

Yiran Zhong

Huiyuan Deng

178

19 Jul 2023