Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

arXiv:1901.02860 (latest version: v3)

9 January 2019

Ruslan Salakhutdinov

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown

Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings

Anand Gopalakrishnan

Róbert Csordás

Jürgen Schmidhuber

367

1

0

24 Dec 2025

HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition

Pham Thach Thanh Truc

Huynh Tong Dang Khoa

Vo Nguyen Le Duy

70

0

0

04 Dec 2025

Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification

119

0

0

28 Nov 2025

Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression

Aditya Chattopadhyay

469

0

0

26 Nov 2025

Softmax Transformers are Turing-Complete

Anthony Widjaja Lin

170

1

0

25 Nov 2025

Block Cascading: Training Free Acceleration of Block-Causal Video Models

Hmrishav Bandyopadhyay

Nikhil Pinnaparaju

106

1

0

25 Nov 2025

Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding

Viet-Hoang Tran

76

0

0

25 Nov 2025

Decoupling Complexity from Scale in Latent Diffusion Model

Tianxiong Zhong

318

1

0

20 Nov 2025

ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum

Andrija Stanisic

110

0

0

11 Nov 2025

A Unified Geometric Field Theory Framework for Transformers: From Manifold Embeddings to Kernel Modulation

148

0

0

11 Nov 2025

Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning

Daniel De Dios Allegue

279

0

0

10 Nov 2025

Discourse Graph Guided Document Translation with Large Language Models

Viet-Thanh Pham

277

0

0

10 Nov 2025

Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding

92

0

0

09 Nov 2025

Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin

...

377

0

0

08 Nov 2025

BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models

Chandra Vamsi Krishna Alla

Harish Naidu Gaddam

288

0

0

07 Nov 2025

Hydra: Dual Exponentiated Memory for Multivariate Time Series Analysis

Alireza Mirrokni

163

0

0

02 Nov 2025

InertialAR: Autoregressive 3D Molecule Generation with Inertial Frames

163

1

0

31 Oct 2025

Context Engineering 2.0: The Context of Context Engineering

397

4

0

30 Oct 2025

Bridging the Divide: End-to-End Sequence-Graph Learning

122

0

0

29 Oct 2025

DRIP: Dynamic patch Reduction via Interpretable Pooling

284

0

0

29 Oct 2025

Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows

131

1

0

25 Oct 2025

From Masks to Worlds: A Hitchhiker's Guide to World Models

Ming-Hsuan Yang

185

2

0

23 Oct 2025

Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency

157

0

0

22 Oct 2025

NeSyPr: Neurosymbolic Proceduralization For Efficient Embodied Reasoning

125

0

0

22 Oct 2025

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

140

0

0

22 Oct 2025

Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models

125

1

0

20 Oct 2025

All You Need is One: Capsule Prompt Tuning with a Single Vector

James Chenhao Liang

146

1

0

19 Oct 2025

RL makes MLLMs see better than SFT

193

0

0

18 Oct 2025

Extending Audio Context for Long-Form Understanding in Large Audio-Language Models

Yuatyong Chaichana

Pittawat Taveekitworachai

Warit Sirichotedumrong

Potsawee Manakul

Kunat Pipatanakul

160

0

0

17 Oct 2025

A New Perspective on Transformers in Online Reinforcement Learning for Continuous Control

Daniil Zelezetsky

Egor Cherepanov

Alexey K. Kovalev

Aleksandr I. Panov

144

2

0

15 Oct 2025

Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation

85

0

0

12 Oct 2025

Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling

Mohan S. Kankanhalli

98

0

0

11 Oct 2025

Towards Neurocognitive-Inspired Intelligence: From AI's Structural Mimicry to Human-Like Functional Cognition

Noorbakhsh Amiri Golilarz

Hassan S. Al Khatib

124

0

0

09 Oct 2025

SUBQRAG: Sub-Question Driven Dynamic Graph RAG

156

0

0

09 Oct 2025

Artificial Hippocampus Networks for Efficient Long-Context Modeling

146

2

0

08 Oct 2025

Allocation of Parameters in Transformers

161

0

0

04 Oct 2025

Platonic Transformers: A Solid Choice For Equivariance

Mohammad Mohaiminul Islam

David R. Wessels

Friso de Kruiff

Sharvaree P. Vadgama

288

3

0

03 Oct 2025

POVQA: Preference-Optimized Video Question Answering with Rationales for Data Efficiency

Saydul Akbar Murad

147

0

0

01 Oct 2025

SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing

...

184

0

0

01 Oct 2025

3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation

International Conference on 3D Vision (3DV), 2025

Balamurugan Thambiraja

160

1

0

30 Sep 2025

Accelerating Transformers in Online RL

Daniil Zelezetsky

Aleksandr I. Panov

143

0

0

30 Sep 2025

DyMoDreamer: World Modeling with Dynamic Modulation

147

0

0

29 Sep 2025

LocoFormer: Generalist Locomotion via Long-context Adaptation

144

0

0

28 Sep 2025

PDE-Transformer: A Continuous Dynamical Systems Approach to Sequence Modeling

157

0

0

27 Sep 2025

Hierarchical Resolution Transformers: A Wavelet-Inspired Architecture for Multi-Scale Language Understanding

Anurag Kaushish

Tanupriya Choudhury

113

0

0

24 Sep 2025

Memory in Large Language Models: Mechanisms, Evaluation and Evolution

217

1

0

23 Sep 2025

ShadowServe: Interference-Free KV Cache Fetching for Distributed Prefix Caching

Chenxingyu Zhao

160

0

0

21 Sep 2025

Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features

169

0

0

20 Sep 2025

Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers

Federico Jurado Ruiz

193

0

0

19 Sep 2025

Long-context Reference-based MT Quality Estimation

Sheila Castilho

125

1

0

17 Sep 2025

Page 1 of 41