1901.02860
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"
50 / 2,021 papers shown
Title
Large Language Models Imitate Logical Reasoning, but at what Cost?
Lachlan McGinness
Peter Baumgartner
ReLM
LRM
ELM
AI4CE
192
2
0
16 Sep 2025
Positional Encoding via Token-Aware Phase Attention
Wang
Sheng Shen
Rémi Munos
Hongyuan Zhan
Yuandong Tian
174
0
0
16 Sep 2025
TFANet: Three-Stage Image-Text Feature Alignment Network for Robust Referring Image Segmentation
Qianqi Lu
Yuxiang Xie
Jing Zhang
Shiwei Zou
Yan Chen
Xidao Luan
130
0
0
16 Sep 2025
Reversible Deep Equilibrium Models
Sam McCallum
Kamran Arora
James Foster
199
2
0
16 Sep 2025
Context-Aware Language Models for Forecasting Market Impact from Sequences of Financial News
Ross Koval
Nicholas Andrews
Xifeng Yan
AIFin
136
0
0
15 Sep 2025
FinGEAR: Financial Mapping-Guided Enhanced Answer Retrieval
Ying Li
Mengyu Wang
Miguel de Carvalho
Sotirios Sabanis
Tiejun Ma
104
1
0
15 Sep 2025
OpenHA: A Series of Open-Source Hierarchical Agentic Models in Minecraft
Zihao Wang
Muyao Li
K. He
Xiangyu Wang
Zhancun Mu
Anji Liu
Yitao Liang
LM&Ro
158
2
0
13 Sep 2025
Long Context Automated Essay Scoring with Language Models
Christopher Ormerod
Gitit Kehat
111
0
0
12 Sep 2025
SAC-MIL: Spatial-Aware Correlated Multiple Instance Learning for Histopathology Whole Slide Image Classification
Yu Bai
Zitong Yu
Haowen Tian
X. Wang
Shuo Yan
...
Zheng Zhang
Wufan Wang
Hui Gao
Xiangyang Gong
Wendong Wang
100
0
0
04 Sep 2025
Joint Modeling of Entities and Discourse Relations for Coherence Assessment
Wei Liu
Michael Strube
140
1
0
04 Sep 2025
KRAFT: A Knowledge Graph-Based Framework for Automated Map Conflation
Farnoosh Hashemi
Laks V.S. Lakshmanan
76
0
0
04 Sep 2025
DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off
Jusheng Zhang
Yijia Fan
Kaitong Cai
Zimeng Huang
Xiaofei Sun
Jian Wang
Chengpei Tang
Keze Wang
DiffM
136
24
0
02 Sep 2025
Do LLM Modules Generalize? A Study on Motion Generation for Autonomous Driving
Mingyi Wang
Jingke Wang
Tengju Ye
Junbo Chen
Kaicheng Yu
AILaw
164
1
0
02 Sep 2025
Amadeus: Autoregressive Model with Bidirectional Attribute Modelling for Symbolic Music
Hongju Su
Ke Li
Lan Yang
Honggang Zhang
Yi-Zhe Song
62
0
0
28 Aug 2025
What can we learn from signals and systems in a transformer? Insights for probabilistic modeling and inference architecture
Heng-Sheng Chang
P. Mehta
AI4TS
92
0
0
27 Aug 2025
Orchid: Orchestrating Context Across Creative Workflows with Generative AI
Srishti Palani
Gonzalo Ramos
119
0
0
27 Aug 2025
Limitations of Normalization in Attention Mechanism
Timur Mudarisov
Mikhail Burtsev
Tatiana Petrova
Radu State
90
2
0
25 Aug 2025
CoPE: A Lightweight Complex Positional Encoding
Avinash Amballa
43
0
0
23 Aug 2025
Vision encoders should be image size agnostic and task driven
Nedyalko Prisadnikov
Danda Pani Paudel
Yuqian Fu
Luc Van Gool
84
1
0
22 Aug 2025
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
Ivan Rodkin
Daniil Orel
Konstantin Smirnov
Arman Bolatov
Bilal Elbouardi
...
Aydar Bulatov
Preslav Nakov
Timothy Baldwin
Artem Shelmanov
Mikhail Burtsev
LRM
221
0
0
22 Aug 2025
Assessing Consciousness-Related Behaviors in Large Language Models Using the Maze Test
Rui A. Pimenta
Tim Schlippe
Kristina Schaaff
LRM
120
0
0
22 Aug 2025
AdaptJobRec: Enhancing Conversational Career Recommendation through an LLM-Powered Agentic System
Qixin Wang
Dawei Wang
Kun Chen
Yaowei Hu
Puneet Girdhar
...
Shangwen Huang
Bachir Aoun
Greg Hayworth
Han Li
Xintao Wu
101
0
0
19 Aug 2025
Wavy Transformer
Satoshi Noguchi
Yoshinobu Kawahara
110
0
0
18 Aug 2025
The Yokai Learning Environment: Tracking Beliefs Over Space and Time
Constantin Ruhdorfer
Matteo Bortoletto
Andreas Bulling
164
1
0
17 Aug 2025
Memory-Augmented Transformers: A Systematic Review from Neuroscience Principles to Enhanced Model Architectures
Parsa Omidi
Xingshuai Huang
Axel Laborieux
Bahareh Nikpour
Tianyu Shi
A. Eshaghi
152
4
0
14 Aug 2025
Advances in Speech Separation: Techniques, Challenges, and Future Trends
Kai Li
Guo Chen
Wendi Sang
Yi Luo
Zhuo Chen
...
Shulin He
Zhong-Qiu Wang
Andong Li
Z. Wu
Xiaolin Hu
AI4TS
104
4
0
14 Aug 2025
FuXi-β: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model
Yufei Ye
Wei Guo
Hao Wang
Hong Zhu
Yuyang Ye
Yong Liu
Huifeng Guo
Ruiming Tang
Defu Lian
Tong Xu
118
2
0
14 Aug 2025
A Survey on Diffusion Language Models
Tianyi Li
Mingda Chen
Bowei Guo
Zhiqiang Shen
269
28
0
14 Aug 2025
Fast weight programming and linear transformers: from machine learning to neurobiology
Kazuki Irie
Samuel J. Gershman
132
0
0
11 Aug 2025
ADT4Coupons: An Innovative Framework for Sequential Coupon Distribution in E-commerce
Li Kong
Bingzhe Wang
Zhou Chen
Suhan Hu
Yuchao Ma
Qi Qi
Suoyuan Song
Bicheng Jin
OffRL
94
0
0
08 Aug 2025
A Reproducible, Scalable Pipeline for Synthesizing Autoregressive Model Literature
Faruk Alpay
Bugra Kilictas
Hamdi Alakkad
52
0
0
06 Aug 2025
AttZoom: Attention Zoom for Better Visual Features
Daniel DeAlcala
Aythami Morales
Julian Fierrez
Ruben Tolosana
162
1
0
05 Aug 2025
Trainable Dynamic Mask Sparse Attention
Jingze Shi
Yifan Wu
Yiran Peng
Liangdong Wang
Guang Liu
Yuyu Luo
312
2
0
04 Aug 2025
SequenceLayers: Sequence Processing and Streaming Neural Networks Made Easy
RJ Skerry-Ryan
Julián Salazar
Soroosh Mariooryad
David Kao
Daisy Stanton
...
Matt Shannon
Ron J. Weiss
Robin Scheibler
Jonas Rothfuss
Tom Bagby
AI4TS
65
0
0
31 Jul 2025
Goal-Based Vision-Language Driving
Santosh Patapati
Trisanth Srinivasan
145
0
0
30 Jul 2025
Exploring the Stratified Space Structure of an RL Game with the Volume Growth Transform
Justin Curry
Brennan Lagasse
Ngoc B. Lam
Gregory Cox
David Rosenbluth
Alberto Speranzon
OffRL
176
0
0
29 Jul 2025
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
Yuchen Liu
Yaoming Wang
Bowen Shi
Xiaopeng Zhang
Wenrui Dai
Chenglin Li
Hongkai Xiong
Qi Tian
140
1
0
28 Jul 2025
Flora: Effortless Context Construction to Arbitrary Length and Scale
Tianxiang Chen
Zhentao Tan
Xiaofan Bo
Yue Wu
Tao Gong
Qi Chu
Jieping Ye
Nenghai Yu
CLL
LRM
227
1
0
26 Jul 2025
Morphlux: Transforming Torus Fabrics for Efficient Multi-tenant ML
Abhishek Vijaya Kumar
Eric Ding
Arjun Devraj
Darius Bunandar
Rachee Singh
124
0
0
20 Jul 2025
Evaluation of Coding Schemes for Transformer-based Gene Sequence Modeling
Chenlei Gong
Yuanhe Tian
Lei Mao
Yan Song
69
1
0
20 Jul 2025
Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers
Harsh Nilesh Pathak
Randy Paffenroth
153
1
0
18 Jul 2025
White-Basilisk: A Hybrid Model for Code Vulnerability Detection
Ioannis Lamprou
Alexander Shevtsov
Ioannis Arapakis
Sotiris Ioannidis
243
0
0
11 Jul 2025
RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Xiuying Wei
Anunay Yadav
Razvan Pascanu
Çağlar Gülçehre
AI4TS
229
0
0
06 Jul 2025
Intrinsic and Extrinsic Organized Attention: Softmax Invariance and Network Sparsity
Oluwadamilola Fasina
Ruben V.C. Pohle
Pei-Chun Su
Ronald R. Coifman
133
0
0
18 Jun 2025
Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription
Anna Hamberger
Sebastian Murgul
Jochen Schmidt
Michael Heizmann
163
2
0
17 Jun 2025
StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns
Luanbo Wan
Weizhi Ma
LLMAG
KELM
203
1
0
16 Jun 2025
JoFormer (Journey-based Transformer): Theory and Empirical Analysis on the Tiny Shakespeare Dataset
Mahesh Godavarti
109
0
0
10 Jun 2025
Beyond Benchmarks: A Novel Framework for Domain-Specific LLM Evaluation and Knowledge Mapping
Nitin Sharma
Thomas Wolfers
Çağatay Yıldız
ALM
149
0
0
09 Jun 2025
FaCTR: Factorized Channel-Temporal Representation Transformers for Efficient Time Series Forecasting
Yash Vijay
Harini Subramanyan
AI4TS
144
0
0
05 Jun 2025
Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers
Woomin Song
Sai Muralidhar Jayanthi
S. Ronanki
Kanthashree Mysore Sathyendra
Jinwoo Shin
Aram Galstyan
Shubham Katiyar
S. Bodapati
VLM
331
0
0
01 Jun 2025