Base of RoPE Bounds Context Length
arXiv: 2405.14591 · 23 May 2024
Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen

Papers citing "Base of RoPE Bounds Context Length" (18 papers):

Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation
Linda He, Jue Wang, Maurice Weber, Shang Zhu, Ben Athiwaratkun, Ce Zhang
Tags: SyDa, LRM · 17 Apr 2025

Dewey Long Context Embedding Model: A Technical Report
Dun Zhang, Panxiang Zou, Yudong Zhou
Tags: RALM · 26 Mar 2025

SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu, Qian Liu, Haonan Wang, Shiqi Chen, Xiangming Gu, Tianyu Pang, Min-Yen Kan
19 Mar 2025

Context-aware Biases for Length Extrapolation
Ali Veisi, Amir Mansourian
11 Mar 2025

Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization
Ru Wang, Wei Huang, Selena Song, Haoyu Zhang, Yusuke Iwasawa, Y. Matsuo, Jiaxian Guo
Tags: OODD, LRM · 25 Feb 2025

Baichuan-M1: Pushing the Medical Capability of Large Language Models
B. Wang, Haizhou Zhao, Huozhi Zhou, Liang Song, Mingyu Xu, ..., Yan Zhang, Yifei Duan, Yuyan Zhou, Zhi-Ming Ma, Z. Wu
Tags: LM&MA, ELM, AI4MH · 18 Feb 2025

Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models
Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, ..., Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang, Di Zhang
10 Dec 2024

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Haonan Wang, Qian Liu, Chao Du, Tongyao Zhu, Cunxiao Du, Kenji Kawaguchi, Tianyu Pang
20 Nov 2024

On the Token Distance Modeling Ability of Higher RoPE Attention Dimension
Xiangyu Hong, Che Jiang, Biqing Qi, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou
11 Oct 2024

Round and Round We Go! What Makes Rotary Positional Encodings Useful?
Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu, Petar Velickovic
08 Oct 2024

ALR²: A Retrieve-then-Reason Framework for Long-context Question Answering
Huayang Li, Pat Verga, Priyanka Sen, Bowen Yang, Vijay Viswanathan, Patrick Lewis, Taro Watanabe, Yixuan Su
Tags: RALM, LRM · 04 Oct 2024

Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models
Junfeng Tian, Da Zheng, Yang Cheng, Rui-cang Wang, C. Zhang, Debing Zhang
07 Sep 2024

mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, ..., Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang
29 Jul 2024

Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective
M. Zhong, Chen Zhang, Yikun Lei, Xikai Liu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang
19 Jun 2024

What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling
Yutong Hu, Quzhe Huang, Kangcheng Luo, Yansong Feng
17 Jun 2024

Rotary Position Embedding for Vision Transformer
Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun
20 Mar 2024

SHAPE: Shifted Absolute Position Embedding for Transformers
Shun Kiyono, Sosuke Kobayashi, Jun Suzuki, Kentaro Inui
13 Sep 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Tags: MoE · 17 Sep 2019