ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:1901.02860
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Versions: v1, v2, v3 (latest)

9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,017 papers shown
Needle in the Haystack for Memory Based Large Language Models
Elliot Nelson
Georgios Kollias
Payel Das
Subhajit Chaudhury
Soham Dan
KELM, RALM
263
24
0
01 Jul 2024
$\text{Memory}^3$: Language Modeling with Explicit Memory
Hongkang Yang
Zehao Lin
Wenjin Wang
Hao Wu
Zhiyu Li
...
Yu Yu
Kai Chen
Feiyu Xiong
Linpeng Tang
Weinan E
171
32
0
01 Jul 2024
Building Understandable Messaging for Policy and Evidence Review (BUMPER) with AI
Katherine A. Rosenfeld
Maike Sonnewald
Sonia J. Jindal
Kevin A. McCarthy
Joshua L. Proctor
160
1
0
27 Jun 2024
Unveiling and Controlling Anomalous Attention Distribution in Transformers
Ruiqing Yan
Xingbo Du
Haoyu Deng
Linghan Zheng
Qiuzhuang Sun
Jifang Hu
Yuhang Shao
Penghao Jiang
Jinrong Jiang
Lian Zhao
145
1
0
26 Jun 2024
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation
Abe Bohan Hou
Orion Weller
Guanghui Qin
Eugene Yang
Dawn J Lawrie
Nils Holzenberger
Andrew Blair-Stanek
Benjamin Van Durme
AILaw, ELM
277
18
0
24 Jun 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
174
45
0
24 Jun 2024
SimSMoE: Solving Representational Collapse via Similarity Measure
Giang Do
Hung Le
T. Tran
MoE
236
3
0
22 Jun 2024
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
Shilong Li
Yancheng He
Hangyu Guo
Xingyuan Bu
Ge Bai
...
Xingwei Qu
Yangguang Li
Wanli Ouyang
Yuchi Xu
Bo Zheng
RALM, LLMAG
205
29
0
20 Jun 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
ViT
151
16
0
19 Jun 2024
Elliptical Attention
Stefan K. Nielsen
Laziz U. Abdullaev
R. Teo
Tan M. Nguyen
247
5
0
19 Jun 2024
In-Context Former: Lightning-fast Compressing Context for Large Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xiangfeng Wang
Zaiyi Chen
Zheyong Xie
Tong Xu
Yongyi He
Enhong Chen
146
9
0
19 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM, VLM
279
47
0
18 Jun 2024
CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling
Yu Bai
Xiyuan Zou
Heyan Huang
Sanxing Chen
Marc-Antoine Rondeau
Yang Gao
Jackie Chi Kit Cheung
191
7
0
17 Jun 2024
Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens
Weiyao Luo
Suncong Zheng
Heming Xia
Weikang Wang
Yan Lei
Tianyu Liu
Shuang Chen
Zhifang Sui
138
2
0
16 Jun 2024
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Yuan Pu
Yazhe Niu
Jiyuan Ren
Zhenjie Yang
Hongsheng Li
Yu Liu
OffRL
429
9
0
15 Jun 2024
Hierarchical Compression of Text-Rich Graphs via Large Language Models
Shichang Zhang
Da Zheng
Jiani Zhang
Qi Zhu
Xiang Song
Soji Adeshina
Christos Faloutsos
George Karypis
Yizhou Sun
VLM
232
2
0
13 Jun 2024
SPAN: Unlocking Pyramid Representations for Gigapixel Histopathological Images
Weiyi Wu
Xingjian Diao
Chongyang Gao
Xinwen Xu
Siting Li
Jiang Gui
204
0
0
13 Jun 2024
QuickLLaMA: Query-aware Inference Acceleration for Large Language Models
Jingyao Li
Han Shi
Xin Jiang
Zhenguo Li
Hong Xu
Jiaya Jia
LRM
155
4
0
11 Jun 2024
Simple and Effective Masked Diffusion Language Models
Subham Sekhar Sahoo
Marianne Arriola
Yair Schiff
Aaron Gokaslan
Edgar Marroquin
Justin T Chiu
Alexander M. Rush
Volodymyr Kuleshov
DiffM
188
322
0
11 Jun 2024
Visual Representation Learning with Stochastic Frame Prediction
Huiwon Jang
Dongyoung Kim
Junsu Kim
Jinwoo Shin
Pieter Abbeel
Younggyo Seo
301
7
0
11 Jun 2024
CTC-based Non-autoregressive Textless Speech-to-Speech Translation
Qingkai Fang
Zhengrui Ma
Yan Zhou
Min Zhang
Yang Feng
219
3
0
11 Jun 2024
LoCoCo: Dropping In Convolutions for Long Context Compression
International Conference on Machine Learning (ICML), 2024
Ruisi Cai
Yuandong Tian
Zhangyang Wang
Beidi Chen
152
14
0
08 Jun 2024
FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models
Xihang Yue
Linchao Zhu
Yi Yang
KELM
179
0
0
05 Jun 2024
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Brian K Chen
Tianyang Hu
Hui Jin
Hwee Kuan Lee
Kenji Kawaguchi
171
4
0
05 Jun 2024
Extended Mind Transformers
Phoebe Klett
Thomas Ahle
RALM
74
0
0
04 Jun 2024
Learning to Play Atari in a World of Tokens
Pranav Agarwal
Sheldon Andrews
Samira Ebrahimi Kahou
OffRL
209
5
0
03 Jun 2024
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
Yilong Chen
Linhao Zhang
Junyuan Shang
Ying Tai
Tingwen Liu
Shuohuan Wang
Yu Sun
152
5
0
03 Jun 2024
On the Nonlinearity of Layer Normalization
Yunhao Ni
Yuxin Guo
Junlong Jia
Lei Huang
276
7
0
03 Jun 2024
Attention-based Iterative Decomposition for Tensor Product Representation
Taewon Park
Inchul Choi
Minho Lee
185
1
0
03 Jun 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM, MedIm
259
8
0
31 May 2024
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Feiyu Xiong
Zhiyu Li
E. Weinan
Lei Wu
213
10
0
31 May 2024
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
Avelina Asada Hadji-Kyriacou
Ognjen Arandjelović
109
3
0
30 May 2024
Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning
Hengkai Tan
Songming Liu
Kai Ma
Chengyang Ying
Xingxing Zhang
Hang Su
Jun Zhu
261
3
0
30 May 2024
Streaming Video Diffusion: Online Video Editing with Diffusion Models
Feng Chen
Zhen Yang
Bohan Zhuang
Qi Wu
DiffM
169
8
0
30 May 2024
X-VILA: Cross-Modality Alignment for Large Language Model
Hanrong Ye
De-An Huang
Yao Lu
Zhiding Yu
Ming-Yu Liu
...
Jan Kautz
Song Han
Dan Xu
Pavlo Molchanov
Hongxu Yin
MLLM, VLM
236
43
0
29 May 2024
Contextual Position Encoding: Learning to Count What's Important
O. Yu. Golovneva
Tianlu Wang
Jason Weston
Sainbayar Sukhbaatar
211
48
0
29 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
275
8
0
28 May 2024
Rethinking Transformers in Solving POMDPs
Chenhao Lu
Ruizhe Shi
Yuyao Liu
Kaizhe Hu
Simon S. Du
Huazhe Xu
AI4CE
330
8
0
27 May 2024
SelfCP: Compressing Over-Limit Prompt via the Frozen Large Language Model Itself
Jun Gao
Ziqiang Cao
Wenjie Li
236
9
0
27 May 2024
Categorical Flow Matching on Statistical Manifolds
Neural Information Processing Systems (NeurIPS), 2024
Chaoran Cheng
Jiahan Li
Jian-wei Peng
Ge Liu
394
20
0
26 May 2024
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubić
Federico Soldá
Aurelio Sulser
Davide Scaramuzza
LRM, BDL
311
15
0
26 May 2024
Transformer-XL for Long Sequence Tasks in Robotic Learning from Demonstration
Tianci Gao
136
2
0
24 May 2024
Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification
Shang Liu
Zhongze Cai
Guanting Chen
Xiaocheng Li
UQCV
181
2
0
24 May 2024
Activator: GLU Activation Function as the Core Component of a Vision Transformer
Abdullah Nazhat Abdullah
Tarkan Aydin
ViT
232
0
0
24 May 2024
Efficient Point Transformer with Dynamic Token Aggregating for Point Cloud Processing
Dening Lu
Jun Zhou
Kyle K. Gao
Linlin Xu
Jonathan Li
156
0
0
23 May 2024
A Structure-Aware Framework for Learning Device Placements on Computation Graphs
Neural Information Processing Systems (NeurIPS), 2024
Shukai Duan
Heng Ping
Nikos Kanakaris
Xiongye Xiao
Panagiotis Kyriakis
...
Guixiang Ma
Mihai Capota
Shahin Nazarian
Theodore L. Willke
Paul Bogdan
245
5
0
23 May 2024
Transformers for Image-Goal Navigation
Nikhilanj Pelluri
ViT
261
2
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
752
152
0
23 May 2024
Multi-Agent Reinforcement Learning with Hierarchical Coordination for Emergency Responder Stationing
Amutheezan Sivagnanam
Ava Pettet
Hunter Lee
Ayan Mukhopadhyay
Abhishek Dubey
Aron Laszka
370
6
0
21 May 2024
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
William Brandon
Mayank Mishra
Aniruddha Nrusimha
Yikang Shen
Jonathan Ragan-Kelley
MQ
215
83
0
21 May 2024