Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
v1, v2, v3 (latest)

9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown
Temporal Chunking Enhances Recognition of Implicit Sequential Patterns
Jayanta Dey
Nicholas Soures
Miranda Gonzales
Itamar Lerner
Christopher Kanan
Dhireesha Kudithipudi
273
0
0
31 May 2025
ContextQFormer: A New Context Modeling Method for Multi-Turn Multi-Modal Conversations
Yiming Lei
Zhizheng Yang
Zeming Liu
Haitao Leng
Shaoguo Liu
Tingting Gao
Qingjie Liu
Yunhong Wang
272
0
0
29 May 2025
A New Deep-learning-Based Approach For mRNA Optimization: High Fidelity, Computation Efficiency, and Multiple Optimization Factors
Zheng Gong
Ziyi Jiang
Weihao Gao
Deng Zhuo
Lan Ma
136
2
0
29 May 2025
Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems
Christopher Ormerod
246
0
0
28 May 2025
Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers
Yukun Zhang
Xueqing Zhou
AI4TS
157
1
0
27 May 2025
PIPE: Physics-Informed Position Encoding for Alignment of Satellite Images and Time Series
Haobo Li
Eunseo Jung
Zixin Chen
Zhaowei Wang
Yueya Wang
Huamin Qu
Alexis Kai Hon Lau
193
0
0
27 May 2025
ID-Align: RoPE-Conscious Position Remapping for Dynamic High-Resolution Adaptation in Vision-Language Models
Bozhou Li
Wentao Zhang
VLM
180
1
0
27 May 2025
Transformers in Protein: A Survey
Xiaowen Ling
Zhiqiang Li
Yanbin Wang
Zhuhong You
ViT
MedIm
AI4CE
338
0
0
26 May 2025
MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhongzhan Huang
Guoming Ling
Shanshan Zhong
Hefeng Wu
Liang Lin
292
0
0
26 May 2025
Anchored Diffusion Language Model
Litu Rout
Constantine Caramanis
Sanjay Shakkottai
362
4
0
24 May 2025
LatentLLM: Attention-Aware Joint Tensor Compression
T. Koike-Akino
Xiangyu Chen
Jing Liu
Ye Wang
Wang
Matthew Brand
231
3
0
23 May 2025
Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
Tianyu Xie
Shuchen Xue
Zijin Feng
Tianyang Hu
Jiacheng Sun
Zhenguo Li
Cheng Zhang
DiffM
1.0K
1
0
23 May 2025
Training Long-Context LLMs Efficiently via Chunk-wise Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wenhao Li
Yuxin Zhang
Gen Luo
Daohai Yu
Jiayi Ji
98
3
0
22 May 2025
SELF: Self-Extend the Context Length With Logistic Growth Function
Phat Thanh Dang
Saahil Thoppay
Wang Yang
Qifan Wang
Vipin Chaudhary
Xiaotian Han
271
0
0
22 May 2025
LLM-Based Emulation of the Radio Resource Control Layer: Towards AI-Native RAN Protocols
Ziming Liu
Bryan Liu
Alvaro Valcarce
Xiaoli Chu
365
3
0
22 May 2025
Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes
Zixun Guo
Simon Dixon
248
4
0
21 May 2025
dKV-Cache: The Cache for Diffusion Language Models
Xinyin Ma
Runpeng Yu
Gongfan Fang
Xinchao Wang
DiffM
421
64
0
21 May 2025
Directional Non-Commutative Monoidal Structures for Compositional Embeddings in Machine Learning
Mahesh Godavarti
CoGe
241
3
0
21 May 2025
NovelHopQA: Diagnosing Multi-Hop Reasoning Failures in Long Narrative Contexts
Abhay Gupta
Michael Lu
Kevin Zhu
Sean O'Brien
LRM
312
0
0
20 May 2025
CoRank: LLM-Based Compact Reranking with Document Features for Scientific Retrieval
Runchu Tian
Xueqiang Xu
Sara Szymkuć
SeongKu Kang
Jiawei Han
345
4
0
19 May 2025
Bishop: Sparsified Bundling Spiking Transformers on Heterogeneous Cores with Error-Constrained Pruning
International Symposium on Computer Architecture (ISCA), 2025
Boxun Xu
Yuxuan Yin
Vikram Iyer
Peng Li
MoE
265
2
0
18 May 2025
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency
Kelvin Kan
Xingjian Li
Benjamin J. Zhang
Tuhin Sahai
Stanley Osher
Markos A. Katsoulakis
242
0
0
16 May 2025
Bi-directional Recurrence Improves Transformer in Partially Observable Markov Decision Processes
Ashok Arora
Neetesh Kumar
236
0
0
16 May 2025
ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention
Jintian Shao
Hongyi Huang
Beiwen Zhang
ZhiYu Wu
You Shan
MingKai Zheng
319
0
0
15 May 2025
Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons
Andrew Kiruluta
Preethi Raju
Priscilla Burity
110
1
0
09 May 2025
Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait
Feng Liu
Nicholas Chimitt
Lanqing Guo
Jitesh Jain
Aditya Kane
...
Arun Ross
Humphrey Shi
Zinan Lin
A. Jain
Xiaoming Liu
CVBM
236
5
0
07 May 2025
A Character-based Diffusion Embedding Algorithm for Enhancing the Generation Quality of Generative Linguistic Steganographic Texts
Yingquan Chen
Qianmu Li
Xiaocong Wu
Huifeng Li
Qing Chang
DiffM
343
1
0
02 May 2025
Compact Recurrent Transformer with Persistent Memory
Edison Mucllari
Z. Daniels
David C. Zhang
Qiang Ye
CLL
VLM
346
1
0
02 May 2025
Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures
Heng-Sheng Chang
P. Mehta
291
2
0
01 May 2025
Polysemy of Synthetic Neurons Towards a New Type of Explanatory Categorical Vector Spaces
Michael Pichat
William Pogrund
Paloma Pichat
Judicael Poumay
Armanouche Gasparian
Samuel Demarchi
Martin Corbet
Alois Georgeon
Michael Veillet-Guillem
MILM
290
0
0
30 Apr 2025
From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models
Andrew Kiruluta
224
1
0
29 Apr 2025
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Yi Lu
Wanxu Zhao
Xin Zhou
Chenxin An
Cong Wang
...
Jun Zhao
Changzhi Sun
Tao Gui
Qi Zhang
239
0
0
26 Apr 2025
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
Yinmin Zhong
Zili Zhang
Xiaoniu Song
Hanpeng Hu
Chao Jin
...
Changyi Wan
Hongyu Zhou
Yimin Jiang
Yibo Zhu
Daxin Jiang
OffRL
AI4TS
370
21
0
22 Apr 2025
SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training
Zheng Li
Wenshu Fan
Wei Zhang
Tailing Yuan
Bin Chen
Chengru Song
Chen Zhang
212
3
0
20 Apr 2025
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers
M. Chowdhury
Md Rifat Ur Rahman
Akil Ahmad Taki
225
0
0
19 Apr 2025
CacheFormer: High Attention-Based Segment Caching
Applied Informatics (AI), 2025
Sushant Singh
A. Mahmood
224
1
0
18 Apr 2025
KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
Yuxuan Tian
Zihan Wang
Yebo Peng
Aomufei Yuan
Zhaoxiang Wang
Bairen Yi
Xin Liu
Yong Cui
Tong Yang
374
0
0
14 Apr 2025
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Yoshihiro Yamada
ViT
316
0
0
09 Apr 2025
Feedback-Enhanced Hallucination-Resistant Vision-Language Model for Real-Time Scene Understanding
Zahir Alsulaimawi
140
1
0
07 Apr 2025
On Vanishing Variance in Transformer Length Generalization
Ruining Li
Gabrijel Boduljak
Jensen Zhou
258
3
0
03 Apr 2025
Semantic Adapter for Universal Text Embeddings: Diagnosing and Mitigating Negation Blindness to Enhance Universality
Hongliu Cao
420
1
0
01 Apr 2025
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
Yuxuan Zhu
Ali Falahati
David H. Yang
Mohammad Mohammadi Amiri
315
1
0
01 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Jianchao Tan
MGen
VGen
564
3
0
01 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
547
1
0
29 Mar 2025
SocialGen: Modeling Multi-Human Social Interaction with Language Models
Heng Yu
Juze Zhang
Changan Chen
Tiange Xiang
Yusu Fang
Juan Carlos Niebles
Ehsan Adeli
VGen
270
5
0
28 Mar 2025
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Xinyu Wang
Linrui Ma
Jerry Huang
Peng Lu
Prasanna Parthasarathi
Xiao-Wen Chang
Boxing Chen
Yufei Cui
KELM
439
3
0
28 Mar 2025
Semi-supervised Node Importance Estimation with Informative Distribution Modeling for Uncertainty Regularization
The Web Conference (WWW), 2025
Yankai Chen
Taotao Wang
Yixiang Fang
Yunyu Xiao
BDL
486
4
0
26 Mar 2025
Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing
Vishnu Asutosh Dasu
Md Rafi Ur Rashid
Vipul Gupta
Saeid Tizpaz-Niari
Gang Tan
AAML
452
2
0
20 Mar 2025
Intra-neuronal attention within language models Relationships between activation and semantics
Michael Pichat
William Pogrund
Paloma Pichat
Armanouche Gasparian
Samuel Demarchi
Corbet Alois Georgeon
Michael Veillet-Guillem
MILM
259
0
0
17 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
520
11
0
17 Mar 2025