1901.02860
v3 (latest)
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"
50 / 2,022 papers shown
Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings
Anand Gopalakrishnan
Róbert Csordás
Jürgen Schmidhuber
M. C. Mozer
363
1
0
24 Dec 2025
HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition
Pham Thach Thanh Truc
Dang Hoai Nam
Huynh Tong Dang Khoa
Vo Nguyen Le Duy
63
0
0
04 Dec 2025
Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
Sumit Mamtani
Abhijeet Bhure
117
0
0
28 Nov 2025
Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression
Liangzu Peng
Aditya Chattopadhyay
Luca Zancato
Elvis Nunez
Wei Xia
Stefano Soatto
464
0
0
26 Nov 2025
Softmax Transformers are Turing-Complete
Hongjian Jiang
Michael Hahn
Georg Zetzsche
Anthony Widjaja Lin
LRM
168
0
0
25 Nov 2025
Block Cascading: Training Free Acceleration of Block-Causal Video Models
Hmrishav Bandyopadhyay
Nikhil Pinnaparaju
Rahim Entezari
Jim Scott
Yi-Zhe Song
Varun Jampani
VGen
106
1
0
25 Nov 2025
Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding
Duy-Tung Pham
A. Nguyen
Viet-Hoang Tran
Nhan-Phu Chung
Xin T. Tong
T. Nguyen
Thieu N. Vo
73
0
0
25 Nov 2025
Decoupling Complexity from Scale in Latent Diffusion Model
Tianxiong Zhong
Xingye Tian
X. Wang
Boyuan Jiang
Xin Tao
Pengfei Wan
DiffM
317
1
0
20 Nov 2025
ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum
Andrija Stanisic
Stefan Nastic
107
0
0
11 Nov 2025
A Unified Geometric Field Theory Framework for Transformers: From Manifold Embeddings to Kernel Modulation
Xianshuai Shi
Jianfeng Zhu
Leibo Liu
140
0
0
11 Nov 2025
Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning
Daniel De Dios Allegue
J. He
F. Oliehoek
OffRL
276
0
0
10 Nov 2025
Discourse Graph Guided Document Translation with Large Language Models
Viet-Thanh Pham
Minghan Wang
Hao-Han Liao
Thuy-Trang Vu
276
0
0
10 Nov 2025
Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding
Qian Ma
Ruoxiang Xu
Yongqiang Cai
92
0
0
09 Nov 2025
Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin
Lin Guan
Jia-Qi Yang
Zhishan Zhao
Beichuan Zhang
Bo Sun
...
Hangyu Wang
Qiwei Chen
Yi Cheng
Feng Zhang
Xiao Yang
OffRL
VLM
374
0
0
08 Nov 2025
BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models
Chandra Vamsi Krishna Alla
Harish Naidu Gaddam
Manohar Kommi
RALM
285
0
0
07 Nov 2025
Hydra: Dual Exponentiated Memory for Multivariate Time Series Analysis
Asal Meskin
Alireza Mirrokni
Ali Najar
Ali Behrouz
AI4TS
163
0
0
02 Nov 2025
InertialAR: Autoregressive 3D Molecule Generation with Inertial Frames
Haorui Li
Weitao Du
Yuqiang Li
Ziqiao Wang
Shengchao Liu
151
1
0
31 Oct 2025
Context Engineering 2.0: The Context of Context Engineering
Qishuo Hua
Lyumanshan Ye
Dayuan Fu
Yang Xiao
Xiaojie Cai
Yunze Wu
Jifan Lin
Junfei Wang
Pengfei Liu
390
4
0
30 Oct 2025
Bridging the Divide: End-to-End Sequence-Graph Learning
Yuen Chen
Yulun Wu
Samuel Sharpe
Igor Melnyk
Nam Nguyen
Furong Huang
C. Bayan Bruss
Rizal Fathony
118
0
0
29 Oct 2025
DRIP: Dynamic patch Reduction via Interpretable Pooling
Yusen Peng
Sachin Kumar
VLM
282
0
0
29 Oct 2025
Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows
Billy Dickson
Zoran Tiganj
CLL
124
1
0
25 Oct 2025
From Masks to Worlds: A Hitchhiker's Guide to World Models
Jinbin Bai
Yu Lei
H. Wu
Yuchen Zhu
Shufan Li
Yi Xin
Xiangtai Li
Molei Tao
Aditya Grover
Ming-Hsuan Yang
VGen
SyDa
185
2
0
23 Oct 2025
Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency
Renzhao Liang
Sizhe Xu
Chenggang Xie
Jingru Chen
Feiyang Ren
Shu Yang
Takahiro Yabe
AI4TS
157
0
0
22 Oct 2025
NeSyPr: Neurosymbolic Proceduralization For Efficient Embodied Reasoning
Wonje Choi
Jooyoung Kim
Honguk Woo
LRM
125
0
0
22 Oct 2025
Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
Gunshi Gupta
Karmesh Yadav
Z. Kira
Y. Gal
Rahaf Aljundi
OffRL
136
0
0
22 Oct 2025
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
Jiaqi Leng
Xiang Hu
Junxiong Wang
Jianguo Li
Wei Wu
Yucheng Lu
122
1
0
20 Oct 2025
All You Need is One: Capsule Prompt Tuning with a Single Vector
Yiyang Liu
James Chenhao Liang
Heng Fan
Wenhao Yang
Yiming Cui
Xiaotian Han
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
VLM
146
1
0
19 Oct 2025
RL makes MLLMs see better than SFT
Junha Song
Sangdoo Yun
Dongyoon Han
Jaegul Choo
Byeongho Heo
OffRL
193
0
0
18 Oct 2025
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
Yuatyong Chaichana
Pittawat Taveekitworachai
Warit Sirichotedumrong
Potsawee Manakul
Kunat Pipatanakul
AuLLM
155
0
0
17 Oct 2025
A New Perspective on Transformers in Online Reinforcement Learning for Continuous Control
Nikita Kachaev
Daniil Zelezetsky
Egor Cherepanov
Alexey K. Kovalev
Aleksandr I. Panov
OffRL
143
2
0
15 Oct 2025
Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation
Jiaye Li
Baoyou Chen
Hui Li
Zilong Dong
Jingdong Wang
Siyu Zhu
85
0
0
12 Oct 2025
Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling
Hehe Fan
Yi Yang
Mohan S. Kankanhalli
Fei Wu
ViT
94
0
0
11 Oct 2025
Towards Neurocognitive-Inspired Intelligence: From AI's Structural Mimicry to Human-Like Functional Cognition
Noorbakhsh Amiri Golilarz
Hassan S. Al Khatib
Shahram Rahimi
124
0
0
09 Oct 2025
SUBQRAG: Sub-Question Driven Dynamic Graph RAG
Jiaoyang Li
Junhao Ruan
Shengwei Tang
Saihan Chen
Kaiyan Chang
Yuan Ge
Tong Xiao
Jingbo Zhu
150
0
0
09 Oct 2025
Artificial Hippocampus Networks for Efficient Long-Context Modeling
Yunhao Fang
Weihao Yu
Shu Zhong
Qinghao Ye
Xuehan Xiong
Lai Wei
146
2
0
08 Oct 2025
Allocation of Parameters in Transformers
Ruoxi Yu
Haotian Jiang
Jingpu Cheng
Penghao Yu
Qianxiao Li
Zhong Li
MoE
160
0
0
04 Oct 2025
Platonic Transformers: A Solid Choice For Equivariance
Mohammad Mohaiminul Islam
Rishabh Anand
David R. Wessels
Friso de Kruiff
T. Kuipers
Rex Ying
C. Sánchez
Sharvaree P. Vadgama
Georg Bökman
Erik Bekkers
288
3
0
03 Oct 2025
POVQA: Preference-Optimized Video Question Answering with Rationales for Data Efficiency
Ashim Dahal
Ankit Ghimire
Saydul Akbar Murad
Nick Rahimi
147
0
0
01 Oct 2025
SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
Jiaye Tan
Haonan Luo
Linfeng Song
Shuaiqi Chen
Yishan Lyu
...
Haoran Zhang
Jiaming Bai
Haoran Cheng
Q. Vera Liao
Hao-Wen Dong
181
0
0
01 Oct 2025
3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation
International Conference on 3D Vision (3DV), 2025
Balamurugan Thambiraja
Malte Prinzler
S. Aliakbarian
Darren Cosker
Justus Thies
DiffM
VGen
156
1
0
30 Sep 2025
Accelerating Transformers in Online RL
Daniil Zelezetsky
A. Kovalev
Aleksandr I. Panov
OffRL
143
0
0
30 Sep 2025
DyMoDreamer: World Modeling with Dynamic Modulation
Boxuan Zhang
Runqing Wang
Wei Xiao
Weipu Zhang
Jian Sun
Gao Huang
Jie Chen
Gang Wang
145
0
0
29 Sep 2025
LocoFormer: Generalist Locomotion via Long-context Adaptation
Min Liu
Deepak Pathak
Ananye Agarwal
140
0
0
28 Sep 2025
PDE-Transformer: A Continuous Dynamical Systems Approach to Sequence Modeling
Yukun Zhang
Xueqing Zhou
AI4CE
156
0
0
27 Sep 2025
Hierarchical Resolution Transformers: A Wavelet-Inspired Architecture for Multi-Scale Language Understanding
Ayan Sar
Sampurna Roy
Kanav Gupta
Anurag Kaushish
Tanupriya Choudhury
Abhijit Kumar
106
0
0
24 Sep 2025
Memory in Large Language Models: Mechanisms, Evaluation and Evolution
D. Zhang
Wendong Li
Kani Song
Jiaye Lu
Gang Li
Liuchun Yang
Sheng Li
KELM
214
1
0
23 Sep 2025
ShadowServe: Interference-Free KV Cache Fetching for Distributed Prefix Caching
Xingyu Xiang
Raj Joshi
Yuhan Liu
Jiayi Yao
Chenxingyu Zhao
Junchen Jiang
Yang Zhou
Eddie Kohler
Minlan Yu
160
0
0
21 Sep 2025
Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features
Kaichen Xu
Yihang Du
Mianpeng Liu
Zimu Yu
Xiaobo Sun
159
0
0
20 Sep 2025
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
Krati Saxena
Federico Jurado Ruiz
Guido Manzi
Dianbo Liu
Alex Lamb
192
0
0
19 Sep 2025
Long-context Reference-based MT Quality Estimation
Sami Ul Haq
Chinonso Osuji
Sheila Castilho
Brian Davis
124
1
0
17 Sep 2025