A Length-Extrapolatable Transformer

20 December 2022
Yutao Sun
Li Dong
Barun Patra
Shuming Ma
Shaohan Huang
Alon Benhaim
Vishrav Chaudhary
Xia Song
Furu Wei

Papers citing "A Length-Extrapolatable Transformer"

50 / 77 papers shown
Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation
Linda He
Jue Wang
Maurice Weber
Shang Zhu
Ben Athiwaratkun
Ce Zhang
SyDa
LRM
42
0
0
17 Apr 2025
Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
Yongqiang Yao
Jingru Tan
Kaihuan Liang
Feizhao Zhang
Yazhe Niu
Jiahao Hu
Ruihao Gong
Dahua Lin
Ningyi Xu
57
0
0
10 Mar 2025
SAGE-Amine: Generative Amine Design with Multi-Property Optimization for Efficient CO2 Capture
Hocheol Lim
Hyein Cho
Jeonghoon Kim
62
0
0
04 Mar 2025
Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
Qifan Yu
Zhenyu He
Sijie Li
Xun Zhou
Jun Zhang
Jingjing Xu
Di He
OffRL
LRM
86
4
0
12 Feb 2025
Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian
Wayne Xin Zhao
Ji-Rong Wen
MQ
41
0
0
22 Jan 2025
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
Jiajun Zhu
Peihao Wang
Ruisi Cai
Jason D. Lee
Pan Li
Z. Wang
KELM
36
1
0
03 Jan 2025
Retentive Neural Quantum States: Efficient Ansätze for Ab Initio Quantum Chemistry
Oliver Knitter
Dan Zhao
J. Stokes
M. Ganahl
Stefan Leichenauer
S. Veerapaneni
37
1
0
06 Nov 2024
What is Wrong with Perplexity for Long-context Language Modeling?
Lizhe Fang
Yifei Wang
Zhaoyang Liu
Chenheng Zhang
Stefanie Jegelka
Jinyang Gao
Bolin Ding
Yisen Wang
58
4
0
31 Oct 2024
Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction
Nicholas Walker
19
0
0
23 Oct 2024
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model
ZiDong Wang
Zeyu Lu
Di Huang
Cai Zhou
Wanli Ouyang
Lei Bai
69
3
0
17 Oct 2024
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
Di Liang
Xiaofei Li
19
0
0
09 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
Jianguo Li
Weiyao Lin
VLM
36
1
0
09 Oct 2024
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Chuanyang Zheng
Yihang Gao
Han Shi
Jing Xiong
Jiankai Sun
...
Xiaozhe Ren
Michael Ng
Xin Jiang
Zhenguo Li
Yu Li
26
1
0
07 Oct 2024
Accelerating Inference of Networks in the Frequency Domain
Chenqiu Zhao
Guanfang Dong
Anup Basu
33
10
0
06 Oct 2024
RetCompletion:High-Speed Inference Image Completion with Retentive Network
Yueyang Cang
P. Hu
Xiaoteng Zhang
Xingtong Wang
Yuhang Liu
VLM
24
0
0
05 Oct 2024
On The Adaptation of Unlimiformer for Decoder-Only Transformers
Kian Ahrabian
Alon Benhaim
Barun Patra
Jay Pujara
Saksham Singhal
Xia Song
30
0
0
02 Oct 2024
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM
CLL
83
1
0
20 Sep 2024
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
Zihan Liao
Jun Wang
Hang Yu
Lingxiao Wei
Jianguo Li
Jun Wang
Wei Zhang
19
2
0
10 Sep 2024
On the Design Space Between Transformers and Recursive Neural Nets
Jishnu Ray Chowdhury
Cornelia Caragea
17
0
0
03 Sep 2024
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
Zhiyuan Hu
Yuliang Liu
Jinman Zhao
Suyuchen Wang
Yan Wang
...
Qing Gu
Anh Tuan Luu
See-Kiong Ng
Zhiwei Jiang
Bryan Hooi
50
11
0
31 Aug 2024
ReAttention: Training-Free Infinite Context with Finite Attention Scope
Xiaoran Liu
Ruixiao Li
Yuerong Song
Zhigeng Liu
Kai Lv
Hang Yan
Linlin Li
Qun Liu
Xipeng Qiu
LLMAG
25
1
0
21 Jul 2024
Toto: Time Series Optimized Transformer for Observability
Ben Cohen
E. Khwaja
Kan Wang
Charles Masson
Elise Ramé
Youssef Doubli
Othmane Abou-Amal
AI4TS
35
3
0
10 Jul 2024
Let the Code LLM Edit Itself When You Edit the Code
Zhenyu He
Jun Zhang
Shengjie Luo
Jingjing Xu
Z. Zhang
Di He
KELM
29
0
0
03 Jul 2024
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
Hongzhan Lin
Ang Lv
Yuhan Chen
Chen Zhu
Yang Song
Hengshu Zhu
Rui Yan
29
9
0
28 Jun 2024
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish
Itamar Zimerman
Shady Abu Hussein
Nadav Cohen
Amir Globerson
Lior Wolf
Raja Giryes
Mamba
67
13
0
20 Jun 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks
Chenxin Tao
Xizhou Zhu
Shiqian Su
Lewei Lu
Changyao Tian
...
Gao Huang
Hongsheng Li
Yu Qiao
Jie Zhou
Jifeng Dai
60
1
0
06 Jun 2024
LongSSM: On the Length Extension of State-space Models in Language Modelling
Shida Wang
22
0
0
04 Jun 2024
Base of RoPE Bounds Context Length
Xin Men
Mingyu Xu
Bingning Wang
Qingyu Zhang
Hongyu Lin
Xianpei Han
Weipeng Chen
29
18
0
23 May 2024
Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory
Tianji Cai
G. W. Merz
François Charton
Niklas Nolte
Matthias Wilhelm
K. Cranmer
Lance J. Dixon
22
15
0
09 May 2024
Tele-FLM Technical Report
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Chao Wang
...
Yequan Wang
Zhongjiang He
Zhongyuan Wang
Xuelong Li
Tiejun Huang
30
3
0
25 Apr 2024
A Theory for Length Generalization in Learning to Reason
Changnan Xiao
Bing Liu
LRM
29
8
0
31 Mar 2024
Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
Changsheng Quan
Xiaofei Li
39
23
0
12 Mar 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
37
77
0
26 Feb 2024
LVCHAT: Facilitating Long Video Comprehension
Yu-Xiang Wang
Zeyuan Zhang
Julian McAuley
Zexue He
VLM
26
4
0
19 Feb 2024
Data Engineering for Scaling Language Models to 128K Context
Yao Fu
Rameswar Panda
Xinyao Niu
Xiang Yue
Hanna Hajishirzi
Yoon Kim
Hao-Chun Peng
MoE
34
115
0
15 Feb 2024
Bidirectional Generative Pre-training for Improving Time Series Representation Learning
Ziyang Song
Qincheng Lu
He Zhu
Yue Li
AI4TS
14
3
0
14 Feb 2024
MEMORYLLM: Towards Self-Updatable Large Language Models
Yu-Xiang Wang
Yifan Gao
Xiusi Chen
Haoming Jiang
Shiyang Li
...
Zheng Li
Xian Li
Bing Yin
Jingbo Shang
Julian McAuley
KELM
27
16
0
07 Feb 2024
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
Chaojun Xiao
Pengle Zhang
Xu Han
Guangxuan Xiao
Yankai Lin
Zhengyan Zhang
Zhiyuan Liu
Maosong Sun
LLMAG
39
33
0
07 Feb 2024
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Xindi Wang
Mahsa Salmani
Parsa Omidi
Xiangyu Ren
Mehdi Rezagholizadeh
A. Eshaghi
LRM
29
35
0
03 Feb 2024
Bass Accompaniment Generation via Latent Diffusion
Marco Pasini
M. Grachten
Stefan Lattner
38
11
0
02 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
34
1
0
01 Feb 2024
Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
Zhenyu He
Guhao Feng
Shengjie Luo
Kai-Bo Yang
Liwei Wang
Jingjing Xu
Zhi Zhang
Hongxia Yang
Di He
19
13
0
29 Jan 2024
FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting
Tao Han
Song Guo
Fenghua Ling
Kang Chen
Junchao Gong
Jing-Jia Luo
Junxia Gu
Kan Dai
Wanli Ouyang
Lei Bai
AI4Cl
13
12
0
28 Jan 2024
Code-Based English Models Surprising Performance on Chinese QA Pair Extraction Task
Linghan Zheng
Hui Liu
Xiaojun Lin
Jiayuan Dong
Yue Sheng
Gang Shi
Zhiwei Liu
Hongwei Chen
17
0
0
16 Jan 2024
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Zirui Liu
Chia-Yuan Chang
Huiyuan Chen
Xia Hu
15
99
0
02 Jan 2024
FlashVideo: A Framework for Swift Inference in Text-to-Video Generation
Bin Lei
Le Chen
Caiwen Ding
VGen
17
1
0
30 Dec 2023
Holistic chemical evaluation reveals pitfalls in reaction prediction models
Victor Sabanza Gil
Andres M Bran
Malte Franke
Remi Schlama
J. Luterbacher
Philippe Schwaller
ELM
21
1
0
14 Dec 2023
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang
Bailin Wang
Yikang Shen
Rameswar Panda
Yoon Kim
40
138
0
11 Dec 2023
Mitigating Perspective Distortion-induced Shape Ambiguity in Image Crops
Aditya Prakash
Arjun Gupta
Saurabh Gupta
19
3
0
11 Dec 2023
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang
Jingwei Xu
Junyu Lai
Zixu Jiang
Taolue Chen
...
Xiaoxing Ma
Lijuan Yang
Zhou Xin
Shupeng Li
Penghao Zhao
LLMAG
KELM
28
53
0
21 Nov 2023