Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,017 papers shown
Improving Token-Based World Models with Parallel Observation Prediction
Lior Cohen
Kaixin Wang
Bingyi Kang
Shie Mannor
266
10
0
08 Feb 2024
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
Chaojun Xiao
Pengle Zhang
Xu Han
Guangxuan Xiao
Yankai Lin
Zhengyan Zhang
Zhiyuan Liu
Maosong Sun
LLMAG
294
101
0
07 Feb 2024
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang
Kush S. Bhatia
Hermann Kumbong
Christopher Ré
185
81
0
06 Feb 2024
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
Tao Yuan
Xuefei Ning
Dong Zhou
Zhijie Yang
Shiyao Li
...
Dahua Lin
Boxun Li
Guohao Dai
Shengen Yan
Yu Wang
ALM
265
55
0
06 Feb 2024
UniMem: Towards a Unified View of Long-Context Large Language Models
Junjie Fang
Likai Tang
Hongzhe Bi
Yujia Qin
Si Sun
...
Xiaodong Shi
Sen Song
Yankai Lin
Zhiyuan Liu
Maosong Sun
153
3
0
05 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
378
60
0
05 Feb 2024
Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate
Can Jin
Tong Che
Hongwu Peng
Yiyuan Li
Dimitris N. Metaxas
Marco Pavone
296
55
0
05 Feb 2024
Sequence Shortening for Context-Aware Machine Translation
Paweł Mąka
Yusuf Can Semerci
Jan Scholtes
Gerasimos Spanakis
122
3
0
02 Feb 2024
Streaming Sequence Transduction through Dynamic Compression
Weiting Tan
Yunmo Chen
Tongfei Chen
Guanghui Qin
Haoran Xu
Heidi C. Zhang
Benjamin Van Durme
Philipp Koehn
422
2
0
02 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
438
3
0
01 Feb 2024
Evaluating Large Language Models for Generalization and Robustness via Data Compression
Yucheng Li
Yunhao Guo
Frank Guerin
Chenghua Lin
ELM
150
10
0
01 Feb 2024
Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary
Takashi Morita
392
7
0
31 Jan 2024
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Parth Sarthi
Salman Abdullah
Aditi Tuli
Shubh Khanna
Anna Goldie
Christopher D. Manning
RALM
321
264
0
31 Jan 2024
Fine-tuning Transformer-based Encoder for Turkish Language Understanding Tasks
Savas Yildirim
93
14
0
30 Jan 2024
BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining
International Conference on Neural Information Processing (ICONIP), 2024
Wen-Chieh Liang
Youzhi Liang
OffRL
110
2
0
29 Jan 2024
Locality enhanced dynamic biasing and sampling strategies for contextual ASR
Automatic Speech Recognition & Understanding (ASRU), 2023
Md. Asif Jalal
Pablo Peso Parada
George Pavlidis
Vasileios Moschopoulos
Karthikeyan P. Saravanan
...
Jisi Zhang
Anastasios Drosou
Gil Ho Lee
Jungin Lee
Seokyeong Jung
180
4
0
23 Jan 2024
MoodLoopGP: Generating Emotion-Conditioned Loop Tablature Music with Multi-Granular Features
Wenqian Cui
Pedro Sarmento
Mathieu Barthet
207
4
0
23 Jan 2024
Freely Long-Thinking Transformer (FraiLT)
Akbay Tabak
86
0
0
21 Jan 2024
When Large Language Models Meet Evolutionary Algorithms: Potential Enhancements and Challenges
Wang Chao
Jiaxuan Zhao
Licheng Jiao
Lingling Li
Fang Liu
Shuyuan Yang
351
18
0
19 Jan 2024
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Saurav Pawar
S.M. Towhidul Islam Tonmoy
S. M. M. Zaman
Vinija Jain
Vasu Sharma
Amitava Das
144
39
0
15 Jan 2024
Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy
Chengli Tan
Jiangshe Zhang
Junmin Liu
Yicheng Wang
Yunda Hao
AAML
263
5
0
14 Jan 2024
Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments
International Conference on Machine Learning (ICML), 2024
Antoine Dedieu
Wolfgang Lehrach
Guangyao Zhou
Dileep George
Miguel Lazaro-Gredilla
156
6
0
11 Jan 2024
CrisisKAN: Knowledge-infused and Explainable Multimodal Attention Network for Crisis Event Classification
European Conference on Information Retrieval (ECIR), 2024
Shubham Gupta
Nandini Saini
Suman Kundu
Debasis Das
162
10
0
11 Jan 2024
Hierarchical Knowledge Distillation on Text Graph for Data-limited Attribute Inference
Quan Li
Shixiong Jing
Lingwei Chen
144
0
0
10 Jan 2024
Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing
Zi Yang
Nan Hua
RALM
175
4
0
10 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
173
21
0
07 Jan 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM ALM
317
582
0
05 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xiaoyan Cai
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
366
120
0
04 Jan 2024
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
International Conference on Machine Learning (ICML), 2024
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Zirui Liu
Chia-Yuan Chang
Huiyuan Chen
Helen Zhou
365
148
0
02 Jan 2024
Hyperspectral Image Denoising via Spatial-Spectral Recurrent Transformer
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023
Guanyiman Fu
Fengchao Xiong
Jianfeng Lu
Jun Zhou
Jiantao Zhou
Yuntao Qian
ViT
205
22
0
31 Dec 2023
Non-Vacuous Generalization Bounds for Large Language Models
Sanae Lotfi
Marc Finzi
Yilun Kuang
Tim G. J. Rudner
Micah Goldblum
Andrew Gordon Wilson
429
37
0
28 Dec 2023
BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer
Chih-Cheng Chang
Li Su
ViT
206
5
0
28 Dec 2023
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
...
Qinghua Xu
Qun Liu
Jun Yao
Chao Xu
Dacheng Tao
233
23
0
27 Dec 2023
Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns
Xin Liu
Zheng Li
Yifan Gao
Jingfeng Yang
Tianyu Cao
Zhengyang Wang
Bing Yin
Yangqiu Song
AI4TS
231
23
0
23 Dec 2023
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
Alicia Golden
Samuel Hsia
Fei Sun
Bilge Acun
Basil Hosmer
...
Zachary DeVito
Jeff Johnson
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
VLM DiffM
218
11
0
22 Dec 2023
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Zhaoyang Zhang
Wenqi Shao
Yixiao Ge
Xiaogang Wang
Liang Feng
Ping Luo
150
4
0
20 Dec 2023
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Mingbin Xu
Alex Jin
Sicheng Wang
Mu Su
Tim Ng
...
Shiyi Han
Zhihong Lei
Yaqiao Deng
Zhen Huang
Mahesh Krishnamoorthy
177
8
0
16 Dec 2023
Extending Context Window of Large Language Models via Semantic Compression
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Weizhi Fei
Xueyan Niu
Pingyi Zhou
Lu Hou
Bo Bai
Lei Deng
Wei Han
164
39
0
15 Dec 2023
Weight subcloning: direct initialization of transformers using larger pretrained ones
Mohammad Samragh
Mehrdad Farajtabar
Sachin Mehta
Raviteja Vemulapalli
Fartash Faghri
Devang Naik
Oncel Tuzel
Mohammad Rastegari
273
32
0
14 Dec 2023
Learning Long Sequences in Spiking Neural Networks
Scientific Reports (Sci Rep), 2023
Matei Ioan Stan
Oliver Rhodes
148
25
0
14 Dec 2023
Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning
Avelina Asada Hadji-Kyriacou
Ognjen Arandjelović
117
2
0
14 Dec 2023
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Kaiqiang Song
Xiaoyang Wang
Sangwoo Cho
Xiaoman Pan
Dong Yu
190
7
0
14 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
377
13
0
13 Dec 2023
Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4
IEEE Transactions on Dependable and Secure Computing (IEEE TDSC), 2023
Pei Yan
Shunquan Tan
Miaohui Wang
Jiwu Huang
153
9
0
13 Dec 2023
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Neural Information Processing Systems (NeurIPS), 2023
Róbert Csordás
Piotr Piekos
Kazuki Irie
Jürgen Schmidhuber
MoE
159
27
0
13 Dec 2023
Foundation Models in Robotics: Applications, Challenges, and the Future
Roya Firoozi
Johnathan Tucker
Stephen Tian
Anirudha Majumdar
Jiankai Sun
...
Brian Ichter
Danny Driess
Jiajun Wu
Cewu Lu
Mac Schwager
LM&Ro AI4CE LRM VLM
208
258
0
13 Dec 2023
VILA: On Pre-training for Visual Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Ji Lin
Hongxu Yin
Ming-Yu Liu
Yao Lu
Pavlo Molchanov
Andrew Tao
Huizi Mao
Jan Kautz
Mohammad Shoeybi
Song Han
MLLM VLM
475
644
0
12 Dec 2023
Why "classic" Transformers are shallow and how to make them go deep
Yueyao Yu
Yin Zhang
ViT
230
0
0
11 Dec 2023
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
Aleksandar Terzić
Michael Hersche
G. Karunaratne
Zixiao Huang
Abu Sebastian
Abbas Rahimi
AI4TS
147
1
0
09 Dec 2023
MIMIR: Masked Image Modeling for Mutual Information-based Adversarial Robustness
Xiaoyun Xu
Shujian Yu
Jingzheng Wu
S. Picek
AAML
410
6
0
08 Dec 2023