Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

29 January 2024
Zhenyu He
Guhao Feng
Shengjie Luo
Kai-Bo Yang
Liwei Wang
Jingjing Xu
Zhi Zhang
Hongxia Yang
Di He

Papers citing "Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation"

15 / 15 papers shown
On the Limitations and Capabilities of Position Embeddings for Length Generalization
Yang Chen
Yitao Liang
Zhouchen Lin
145
0
0
05 Oct 2025
HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models
Chang Dai
Hongyu Shan
Mingyang Song
Di Liang
198
3
0
05 Sep 2025
Position Bias Mitigates Position Bias: Mitigate Position Bias Through Inter-Position Knowledge Distillation
Yifei Wang
Feng Xiong
Yong Wang
L. Li
Xiangxiang Chu
D. Zeng
294
15
0
21 Aug 2025
Beyond Isolated Capabilities: Bridging Long CoT Reasoning and Long-Context Understanding
Yifei Wang
LRM
166
0
0
20 Jul 2025
SAS: Simulated Attention Score
Chuanyang Zheng
J. Sun
Yihang Gao
Yuehao Wang
Peihao Wang
...
Atlas Wang
Mac Schwager
Anderson Schneider
Xiaodong Liu
Jianfeng Gao
AI4TS
302
3
0
10 Jul 2025
Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Baihe Huang
Shanda Li
Tianhao Wu
Yiming Yang
Ameet Talwalkar
Kannan Ramchandran
Michael I. Jordan
Jiantao Jiao
LRM
393
4
0
05 Jun 2025
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Yana Veitsman
Mayank Jobanputra
Yash Sarrof
Aleksandra Bakalova
Vera Demberg
Ellie Pavlick
Michael Hahn
519
2
0
27 May 2025
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Huashan Sun
Shengyi Liao
Yansen Han
Yu Bai
Yang Gao
...
Weizhou Shen
Fanqi Wan
Ming Yan
J.N. Zhang
Fei Huang
653
4
0
16 May 2025
Context-aware Biases for Length Extrapolation
Ali Veisi
Hamidreza Amirzadeh
Amir Mansourian
637
2
0
11 Mar 2025
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
Jiajun Zhu
Peihao Wang
Ruisi Cai
Jason D. Lee
Pan Li
Liang Luo
KELM
410
5
0
01 Jan 2025
Two are better than one: Context window extension with multi-grained self-injection
Wei Han
Pan Zhou
Soujanya Poria
Shuicheng Yan
243
2
0
25 Oct 2024
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Chuanyang Zheng
Yihang Gao
Han Shi
Jing Xiong
Jiankai Sun
...
Xiaozhe Ren
Michael Ng
Xin Jiang
Zhenguo Li
Yu Li
412
12
0
07 Oct 2024
Let the Code LLM Edit Itself When You Edit the Code
Zhenyu He
Jun Zhang
Shengjie Luo
Jingjing Xu
Zongzhang Zhang
Di He
KELM
314
3
0
03 Jul 2024
CAPE: Context-Adaptive Positional Encoding for Length Extrapolation
Neural Information Processing Systems (NeurIPS), 2024
Chuanyang Zheng
Yihang Gao
Han Shi
Minbin Huang
Jingyao Li
...
Xiaozhe Ren
Michael K. Ng
Xin Jiang
Zhenguo Li
Yu Li
136
0
0
23 May 2024
Training-Free Long-Context Scaling of Large Language Models
Chen An
Fei Huang
Jun Zhang
Shansan Gong
Xipeng Qiu
Chang Zhou
Lingpeng Kong
ALM
LRM
352
65
0
27 Feb 2024