Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown
Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings
Anand Gopalakrishnan
Róbert Csordás
Jürgen Schmidhuber
M. C. Mozer
363
1
0
24 Dec 2025
HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition
Pham Thach Thanh Truc
Dang Hoai Nam
Huynh Tong Dang Khoa
Vo Nguyen Le Duy
63
0
0
04 Dec 2025
Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
Sumit Mamtani
Abhijeet Bhure
117
0
0
28 Nov 2025
Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression
Liangzu Peng
Aditya Chattopadhyay
Luca Zancato
Elvis Nunez
Wei Xia
Stefano Soatto
464
0
0
26 Nov 2025
Softmax Transformers are Turing-Complete
Hongjian Jiang
Michael Hahn
Georg Zetzsche
Anthony Widjaja Lin
LRM
168
0
0
25 Nov 2025
Block Cascading: Training Free Acceleration of Block-Causal Video Models
Hmrishav Bandyopadhyay
Nikhil Pinnaparaju
Rahim Entezari
Jim Scott
Yi-Zhe Song
Varun Jampani
VGen
106
1
0
25 Nov 2025
Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding
Duy-Tung Pham
A. Nguyen
Viet-Hoang Tran
Nhan-Phu Chung
Xin T. Tong
T. Nguyen
Thieu N. Vo
73
0
0
25 Nov 2025
Decoupling Complexity from Scale in Latent Diffusion Model
Tianxiong Zhong
Xingye Tian
X. Wang
Boyuan Jiang
Xin Tao
Pengfei Wan
DiffM
317
1
0
20 Nov 2025
ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum
Andrija Stanisic
Stefan Nastic
107
0
0
11 Nov 2025
A Unified Geometric Field Theory Framework for Transformers: From Manifold Embeddings to Kernel Modulation
Xianshuai Shi
Jianfeng Zhu
Leibo Liu
140
0
0
11 Nov 2025
Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning
Daniel De Dios Allegue
J. He
F. Oliehoek
OffRL
276
0
0
10 Nov 2025
Discourse Graph Guided Document Translation with Large Language Models
Viet-Thanh Pham
Minghan Wang
Hao-Han Liao
Thuy-Trang Vu
276
0
0
10 Nov 2025
Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding
Qian Ma
Ruoxiang Xu
Yongqiang Cai
92
0
0
09 Nov 2025
Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin
Lin Guan
Jia-Qi Yang
Zhishan Zhao
Beichuan Zhang
Bo Sun
...
Hangyu Wang
Qiwei Chen
Yi Cheng
Feng Zhang
Xiao Yang
OffRL
VLM
374
0
0
08 Nov 2025
BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models
Chandra Vamsi Krishna Alla
Harish Naidu Gaddam
Manohar Kommi
RALM
285
0
0
07 Nov 2025
Hydra: Dual Exponentiated Memory for Multivariate Time Series Analysis
Asal Meskin
Alireza Mirrokni
Ali Najar
Ali Behrouz
AI4TS
163
0
0
02 Nov 2025
InertialAR: Autoregressive 3D Molecule Generation with Inertial Frames
Haorui Li
Weitao Du
Yuqiang Li
Ziqiao Wang
Shengchao Liu
151
1
0
31 Oct 2025
Context Engineering 2.0: The Context of Context Engineering
Qishuo Hua
Lyumanshan Ye
Dayuan Fu
Yang Xiao
Xiaojie Cai
Yunze Wu
Jifan Lin
Junfei Wang
Pengfei Liu
390
4
0
30 Oct 2025
Bridging the Divide: End-to-End Sequence-Graph Learning
Yuen Chen
Yulun Wu
Samuel Sharpe
Igor Melnyk
Nam Nguyen
Furong Huang
C. Bayan Bruss
Rizal Fathony
118
0
0
29 Oct 2025
DRIP: Dynamic patch Reduction via Interpretable Pooling
Yusen Peng
Sachin Kumar
VLM
282
0
0
29 Oct 2025
Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows
Billy Dickson
Zoran Tiganj
CLL
124
1
0
25 Oct 2025
From Masks to Worlds: A Hitchhiker's Guide to World Models
Jinbin Bai
Yu Lei
H. Wu
Yuchen Zhu
Shufan Li
Yi Xin
Xiangtai Li
Molei Tao
Aditya Grover
Ming-Hsuan Yang
VGen
SyDa
185
2
0
23 Oct 2025
Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency
Renzhao Liang
Sizhe Xu
Chenggang Xie
Jingru Chen
Feiyang Ren
Shu Yang
Takahiro Yabe
AI4TS
157
0
0
22 Oct 2025
NeSyPr: Neurosymbolic Proceduralization For Efficient Embodied Reasoning
Wonje Choi
Jooyoung Kim
Honguk Woo
LRM
125
0
0
22 Oct 2025
Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
Gunshi Gupta
Karmesh Yadav
Z. Kira
Y. Gal
Rahaf Aljundi
OffRL
136
0
0
22 Oct 2025
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
Jiaqi Leng
Xiang Hu
Junxiong Wang
Jianguo Li
Wei Wu
Yucheng Lu
122
1
0
20 Oct 2025
All You Need is One: Capsule Prompt Tuning with a Single Vector
Yiyang Liu
James Chenhao Liang
Heng Fan
Wenhao Yang
Yiming Cui
Xiaotian Han
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
VLM
146
1
0
19 Oct 2025
RL makes MLLMs see better than SFT
Junha Song
Sangdoo Yun
Dongyoon Han
Jaegul Choo
Byeongho Heo
OffRL
193
0
0
18 Oct 2025
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
Yuatyong Chaichana
Pittawat Taveekitworachai
Warit Sirichotedumrong
Potsawee Manakul
Kunat Pipatanakul
AuLLM
155
0
0
17 Oct 2025
A New Perspective on Transformers in Online Reinforcement Learning for Continuous Control
Nikita Kachaev
Daniil Zelezetsky
Egor Cherepanov
Alexey K. Kovelev
Aleksandr I. Panov
OffRL
143
2
0
15 Oct 2025
Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation
Jiaye Li
Baoyou Chen
Hui Li
Zilong Dong
Jingdong Wang
Siyu Zhu
85
0
0
12 Oct 2025
Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling
Hehe Fan
Yi Yang
Mohan S. Kankanhalli
Fei Wu
ViT
94
0
0
11 Oct 2025
Towards Neurocognitive-Inspired Intelligence: From AI's Structural Mimicry to Human-Like Functional Cognition
Noorbakhsh Amiri Golilarz
Hassan S. Al Khatib
Shahram Rahimi
124
0
0
09 Oct 2025
SUBQRAG: Sub-Question Driven Dynamic Graph RAG
Jiaoyang Li
Junhao Ruan
Shengwei Tang
Saihan Chen
Kaiyan Chang
Yuan Ge
Tong Xiao
Jingbo Zhu
150
0
0
09 Oct 2025
Artificial Hippocampus Networks for Efficient Long-Context Modeling
Yunhao Fang
Weihao Yu
Shu Zhong
Qinghao Ye
Xuehan Xiong
Lai Wei
146
2
0
08 Oct 2025
Allocation of Parameters in Transformers
Ruoxi Yu
Haotian Jiang
Jingpu Cheng
Penghao Yu
Qianxiao Li
Zhong Li
MoE
160
0
0
04 Oct 2025
Platonic Transformers: A Solid Choice For Equivariance
Mohammad Mohaiminul Islam
Rishabh Anand
David R. Wessels
Friso de Kruiff
T. Kuipers
Rex Ying
C. Sánchez
Sharvaree P. Vadgama
Georg Bökman
Erik Bekkers
288
3
0
03 Oct 2025
POVQA: Preference-Optimized Video Question Answering with Rationales for Data Efficiency
Ashim Dahal
Ankit Ghimire
Saydul Akbar Murad
Nick Rahimi
147
0
0
01 Oct 2025
SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
Jiaye Tan
Haonan Luo
Linfeng Song
Shuaiqi Chen
Yishan Lyu
...
Haoran Zhang
Jiaming Bai
Haoran Cheng
Q. Vera Liao
Hao-Wen Dong
181
0
0
01 Oct 2025
3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation
International Conference on 3D Vision (3DV), 2025
Balamurugan Thambiraja
Malte Prinzler
S. Aliakbarian
Darren Cosker
Justus Thies
DiffM
VGen
156
1
0
30 Sep 2025
Accelerating Transformers in Online RL
Daniil Zelezetsky
A. Kovalev
Aleksandr I. Panov
OffRL
143
0
0
30 Sep 2025
DyMoDreamer: World Modeling with Dynamic Modulation
Boxuan Zhang
Runqing Wang
Wei Xiao
Weipu Zhang
Jian Sun
Gao Huang
Jie Chen
Gang Wang
145
0
0
29 Sep 2025
LocoFormer: Generalist Locomotion via Long-context Adaptation
Min Liu
Deepak Pathak
Ananye Agarwal
140
0
0
28 Sep 2025
PDE-Transformer: A Continuous Dynamical Systems Approach to Sequence Modeling
Yukun Zhang
Xueqing Zhou
AI4CE
156
0
0
27 Sep 2025
Hierarchical Resolution Transformers: A Wavelet-Inspired Architecture for Multi-Scale Language Understanding
Ayan Sar
Sampurna Roy
Kanav Gupta
Anurag Kaushish
Tanupriya Choudhury
Abhijit Kumar
106
0
0
24 Sep 2025
Memory in Large Language Models: Mechanisms, Evaluation and Evolution
D. Zhang
Wendong Li
Kani Song
Jiaye Lu
Gang Li
Liuchun Yang
Sheng Li
KELM
214
1
0
23 Sep 2025
ShadowServe: Interference-Free KV Cache Fetching for Distributed Prefix Caching
Xingyu Xiang
Raj Joshi
Yuhan Liu
Jiayi Yao
Chenxingyu Zhao
Junchen Jiang
Yang Zhou
Eddie Kohler
Minlan Yu
160
0
0
21 Sep 2025
Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features
Kaichen Xu
Yihang Du
Mianpeng Liu
Zimu Yu
Xiaobo Sun
159
0
0
20 Sep 2025
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
Krati Saxena
Federico Jurado Ruiz
Guido Manzi
Dianbo Liu
Alex Lamb
192
0
0
19 Sep 2025
Long-context Reference-based MT Quality Estimation
Sami Ul Haq
Chinonso Osuji
Sheila Castilho
Brian Davis
124
1
0
17 Sep 2025