Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

Showing 50 of 2,022 citing papers
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
Interspeech (Interspeech), 2023
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
S. Bhattacharya
236
12
0
12 Jul 2023
Pluggable Neural Machine Translation Models via Memory-augmented Adapters
International Conference on Language Resources and Evaluation (LREC), 2023
Yuzhuang Xu
Shuo Wang
Peng Li
Xuebo Liu
Xiaolong Wang
Weidong Liu
Yang Liu
342
1
0
12 Jul 2023
Transformers in Reinforcement Learning: A Survey
Pranav Agarwal
A. Rahman
P. St-Charles
Simon J. D. Prince
Samira Ebrahimi Kahou
OffRL
252
26
0
12 Jul 2023
ProgGP: From GuitarPro Tablature Neural Generation To Progressive Metal Production
John H. Loth
Pedro Sarmento
CJ Carr
Zack Zukowski
M. Barthet
158
10
0
11 Jul 2023
ShredGP: Guitarist Style-Conditioned Tablature Generation
Pedro Sarmento
Adarsh Kumar
Dekun Xie
CJ Carr
Zack Zukowski
M. Barthet
183
9
0
11 Jul 2023
Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer
International Conference on Learning Representations (ICLR), 2023
Zhun Yang
Adam Ishay
Joohyung Lee
341
17
0
10 Jul 2023
Lost in the Middle: How Language Models Use Long Contexts
Transactions of the Association for Computational Linguistics (TACL), 2023
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Abigail Z. Jacobs
RALM
557
2,692
0
06 Jul 2023
Focused Transformer: Contrastive Training for Context Scaling
Neural Information Processing Systems (NeurIPS), 2023
Szymon Tworkowski
Konrad Staniszewski
Mikolaj Pacek
Yuhuai Wu
Henryk Michalewski
Piotr Miłoś
235
165
0
06 Jul 2023
LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias
Knowledge Discovery and Data Mining (KDD), 2023
Mario Almagro
Emilio Almazán
Diego Ortego
David Jiménez
255
5
0
06 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
473
221
0
05 Jul 2023
Facing Off World Model Backbones: RNNs, Transformers, and S4
Neural Information Processing Systems (NeurIPS), 2023
Fei Deng
Junyeong Park
Sungjin Ahn
340
43
0
05 Jul 2023
Improving Automatic Parallel Training via Balanced Memory Workload Optimization
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023
Yujie Wang
Youhe Jiang
Xupeng Miao
Fangcheng Fu
Shenhan Zhu
Xiaonan Nie
Yaofeng Tu
Tengjiao Wang
313
21
0
05 Jul 2023
Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure
Yikang Wang
Hiromitsu Nishizaki
Ming Li
215
1
0
04 Jul 2023
Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Matthew Raffel
Lizhong Chen
158
5
0
03 Jul 2023
Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation
International Conference on Machine Learning (ICML), 2023
Matthew Raffel
Drew Penney
Lizhong Chen
162
4
0
03 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Interspeech (Interspeech), 2023
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
183
9
0
03 Jul 2023
MeLM, a generative pretrained language modeling framework that solves forward and inverse mechanics problems
Journal of the Mechanics and Physics of Solids (JMPS), 2023
Markus J. Buehler
AI4CE
223
59
0
30 Jun 2023
Knowledge Base Completion for Long-Tail Entities
Lihu Chen
Simon Razniewski
Gerhard Weikum
KELM
223
9
0
30 Jun 2023
Leveraging Cross-Utterance Context For ASR Decoding
Interspeech (Interspeech), 2023
Robert Flynn
Anton Ragni
194
1
0
29 Jun 2023
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuang Li
Yu-Huan Wu
Jinyu Li
Shujie Liu
256
62
0
28 Jun 2023
Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio
Conference on Algebraic Informatics (CAI), 2023
Allen Roush
Sanjay Basu
Akshay Moorthy
Dmitry Dubovoy
123
12
0
28 Jun 2023
Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen
Sherman Wong
Liangjian Chen
Yuandong Tian
436
684
0
27 Jun 2023
Length Generalization in Arithmetic Transformers
Samy Jelassi
Stéphane d'Ascoli
Carles Domingo-Enrich
Yuhuai Wu
Yuan-Fang Li
François Charton
258
52
0
27 Jun 2023
MotionGPT: Human Motion as a Foreign Language
Neural Information Processing Systems (NeurIPS), 2023
Biao Jiang
Xin Chen
Wen Liu
Jingyi Yu
Gang Yu
Tao Chen
MLLM
292
450
0
26 Jun 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
442
11
0
26 Jun 2023
Swin-Free: Achieving Better Cross-Window Attention and Efficiency with Size-varying Window
Jinkyu Koo
John Yang
Le An
Gwenaelle Cunha Sergio
Su Inn Park
ViT
106
0
0
23 Jun 2023
Efficient Online Processing with Deep Neural Networks
Lukas Hedegaard
209
0
0
23 Jun 2023
Long-range Language Modeling with Self-retrieval
Transactions of the Association for Computational Linguistics (TACL), 2023
Ohad Rubin
Jonathan Berant
RALM, KELM
229
32
0
23 Jun 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Interspeech (Interspeech), 2023
Mingyu Cui
Jiawen Kang
Jiajun Deng
Xiaoyue Yin
Yutao Xie
Xie Chen
Xunying Liu
174
8
0
23 Jun 2023
Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models
Neural Information Processing Systems (NeurIPS), 2023
Leonardo Galli
Holger Rauhut
Mark Schmidt
211
17
0
22 Jun 2023
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
169
6
0
21 Jun 2023
Sparse Modular Activation for Efficient Sequence Modeling
Neural Information Processing Systems (NeurIPS), 2023
Liliang Ren
Yang Liu
Shuohang Wang
Yichong Xu
Chenguang Zhu
Chengxiang Zhai
278
17
0
19 Jun 2023
NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning
Neural Information Processing Systems (NeurIPS), 2023
Yun Yi
Haokui Zhang
Rong Xiao
Nan Wang
Xiaoyu Wang
GNN
322
6
0
19 Jun 2023
Trained Transformers Learn Linear Models In-Context
Journal of Machine Learning Research (JMLR), 2023
Ruiqi Zhang
Spencer Frei
Peter L. Bartlett
413
281
0
16 Jun 2023
Pushing the Limits of ChatGPT on NLP Tasks
Xiaofei Sun
Linfeng Dong
Xiaoya Li
Zhen Wan
Shuhe Wang
...
Jiwei Li
Fei Cheng
Lingjuan Lyu
Leilei Gan
Guoyin Wang
AI4MH, LRM
294
37
0
16 Jun 2023
TSNet-SAC: Leveraging Transformers for Efficient Task Scheduling
Ke Deng
Zhiyuan He
Haotong Zhang
Hao-Wei Lin
Desheng Wang
75
0
0
16 Jun 2023
Block-State Transformers
Neural Information Processing Systems (NeurIPS), 2023
Mahan Fathi
Jonathan Pilault
Orhan Firat
C. Pal
Pierre-Luc Bacon
Ross Goroshin
253
25
0
15 Jun 2023
Recurrent Action Transformer with Memory
A. Staroverov
A. Bessonov
Dmitry A. Yudin
A. Kovalev
Aleksandr I. Panov
OffRL
397
13
0
15 Jun 2023
Semantic HELM: A Human-Readable Memory for Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Fabian Paischer
Thomas Adler
M. Hofmarcher
Sepp Hochreiter
300
18
0
15 Jun 2023
Iterative self-transfer learning: A general methodology for response time-history prediction based on small dataset
Journal of Computational Design and Engineering (JCDE), 2023
Yongjia Xu
Xinzheng Lu
Yifan Fei
Yuli Huang
AI4TS, AI4CE
166
19
0
14 Jun 2023
Warpformer: A Multi-scale Modeling Approach for Irregular Clinical Time Series
Knowledge Discovery and Data Mining (KDD), 2023
Jiawen Zhang
Shun Zheng
Wei Cao
Jiang Bian
Jia Li
AI4TS
158
47
0
14 Jun 2023
Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure
Weidong Ji
Shijie Zan
Guohui Zhou
Xu Wang
SyDa
189
1
0
14 Jun 2023
Augmenting Language Models with Long-Term Memory
Neural Information Processing Systems (NeurIPS), 2023
Weizhi Wang
Li Dong
Hao Cheng
Xiaodong Liu
Xifeng Yan
Jianfeng Gao
Furu Wei
KELM, RALM
241
142
0
12 Jun 2023
Recurrent Attention Networks for Long-text Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xianming Li
Zongxi Li
Xiaotian Luo
Haoran Xie
Xing Lee
Yingbin Zhao
Fu Lee Wang
Qing Li
RALM
217
21
0
12 Jun 2023
A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text
Natural Language Processing Journal (JNLP), 2023
Jessica Nayeli López Espejel
Mahaman Sanoussi Yahaya Alassan
El Mehdi Chouham
Walid Dahhane
E. Ettifouri
269
15
0
10 Jun 2023
FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow
Zhaoyang Huang
Xiaoyu Shi
Chao Zhang
Qiang Wang
Yijin Li
Hongwei Qin
Jifeng Dai
Xiaogang Wang
Jiaming Song
338
4
0
08 Jun 2023
ModuleFormer: Modularity Emerges from Mixture-of-Experts
Songlin Yang
Zheyu Zhang
Tianyou Cao
Shawn Tan
Zhenfang Chen
Chuang Gan
KELM, MoE
202
13
0
07 Jun 2023
Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency
Yuxuan Wang
Hong Lyu
147
4
0
05 Jun 2023
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Knowledge Discovery and Data Mining (KDD), 2023
Md Shamim Hussain
Mohammed J Zaki
D. Subramanian
325
4
0
02 Jun 2023
Data-Efficient French Language Modeling with CamemBERTa
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Wissam Antoun
Benoît Sagot
Djamé Seddah
152
9
0
02 Jun 2023