Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown
Applying Plain Transformers to Real-World Point Clouds
Lanxiao Li
M. Heizmann
3DPC, ViT
373
3
0
28 Feb 2023
Sampled Transformer for Point Sets
Shidi Li
Christian J. Walder
Alexander Soen
Lexing Xie
Miaomiao Liu
3DPC
180
1
0
28 Feb 2023
Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xiaoming Ren
Chao Li
Shenjian Wang
Biao Li
137
0
0
28 Feb 2023
A Survey on Long Text Modeling with Transformers
Zican Dong
Tianyi Tang
Lunyi Li
Wayne Xin Zhao
VLM
404
69
0
28 Feb 2023
Diagonal State Space Augmented Transformers for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
G. Saon
Ankit Gupta
Xiaodong Cui
AI4TS
151
38
0
27 Feb 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS, VLM
519
326
0
27 Feb 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
291
151
0
27 Feb 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
8.7K
18,046
0
27 Feb 2023
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Rui-Jie Zhu
Qihang Zhao
Guoqi Li
Nhan Duy Truong
BDL, VLM
469
119
0
27 Feb 2023
An algorithmic framework for the optimization of deep neural networks architectures and hyperparameters
Journal of machine learning research (JMLR), 2023
Julie Keisler
El-Ghazali Talbi
Sandra Claudel
Gilles Cabriel
259
9
0
27 Feb 2023
Modelling Temporal Document Sequences for Clinical ICD Coding
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Clarence Boon Liang Ng
Diogo Santos
Marek Rei
185
11
0
24 Feb 2023
Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation
International Conference on Learning Representations (ICLR), 2023
Bobby He
James Martens
Guodong Zhang
Aleksandar Botev
Andy Brock
Samuel L. Smith
Yee Whye Teh
235
41
0
20 Feb 2023
Neural Attention Memory
Hyoungwook Nam
S. Seo
HAI
152
2
0
18 Feb 2023
MorphGANFormer: Transformer-based Face Morphing and De-Morphing
Naifeng Zhang
Xudong Liu
Xuzhao Li
Guo-Jun Qi
CVBM
165
8
0
18 Feb 2023
Enhancing Multivariate Time Series Classifiers through Self-Attention and Relative Positioning Infusion
IEEE Access (IEEE Access), 2023
Mehryar Abbasi
Parvaneh Saeedi
AI4TS
239
10
0
13 Feb 2023
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
International Conference on Machine Learning (ICML), 2023
Daniel Y. Fu
Elliot L. Epstein
Eric N. D. Nguyen
A. Thomas
Michael Zhang
Tri Dao
Atri Rudra
Christopher Ré
207
66
0
13 Feb 2023
A Study on ReLU and Softmax in Transformer
Kai Shen
Junliang Guo
Xuejiao Tan
Siliang Tang
Rui Wang
Jiang Bian
235
72
0
13 Feb 2023
Transformer models: an introduction and catalog
X. Amatriain
Ananth Sankar
Jie Bing
Praveen Kumar Bodigutla
Timothy J. Hazen
Michaeel Kazi
503
73
0
12 Feb 2023
GTR-CTRL: Instrument and Genre Conditioning for Guitar-Focused Music Generation with Transformers
Pedro Sarmento
Adarsh Kumar
Yu-Hua Chen
CJ Carr
Zack Zukowski
M. Barthet
212
24
0
10 Feb 2023
Cut your Losses with Squentropy
International Conference on Machine Learning (ICML), 2023
Like Hui
M. Belkin
S. Wright
UQCV
135
9
0
08 Feb 2023
EvoText: Enhancing Natural Language Generation Models via Self-Escalation Learning for Up-to-Date Knowledge and Improved Performance
Applied Sciences (Appl. Sci.), 2023
Zheng Yuan
HU Xue
Chuxu Zhang
Yongming Liu
VLM
257
1
0
08 Feb 2023
Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis
Findings (Findings), 2023
Akshita Jha
Adithya Samavedhi
Vineeth Rakesh
J. Chandrashekar
Chandan K. Reddy
158
1
0
07 Feb 2023
Memory-Based Meta-Learning on Non-Stationary Distributions
International Conference on Machine Learning (ICML), 2023
Tim Genewein
Grégoire Delétang
Anian Ruoss
L. Wenliang
Elliot Catt
Vincent Dutordoir
Jordi Grau-Moya
Laurent Orseau
Marcus Hutter
J. Veness
BDL
257
15
0
06 Feb 2023
Computation vs. Communication Scaling for Future Transformers on Future Hardware
Suchita Pati
Shaizeen Aga
Mahzabeen Islam
Nuwan Jayasena
Matthew D. Sinclair
267
14
0
06 Feb 2023
Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle
Vanessa Mehlin
Sigurd Schacht
Carsten Lanquillon
HAI, MedIm
245
26
0
05 Feb 2023
Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
K. Choromanski
Shanda Li
Valerii Likhosherstov
Kumar Avinava Dubey
Shengjie Luo
Di He
Yiming Yang
Tamás Sarlós
Thomas Weingarten
Adrian Weller
327
10
0
03 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
International Conference on Machine Learning (ICML), 2023
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
448
151
0
31 Jan 2023
An Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation
Yuqiang Li
Shengchen Li
Georgy Fazekas
192
2
0
31 Jan 2023
A Comparative Study of Pretrained Language Models for Long Clinical Text
Yikuan Li
R. M. Wehbe
F. Ahmad
Hanyin Wang
Yuan Luo
LM&MA, ELM, VLM, MedIm
261
113
0
27 Jan 2023
Robust Transformer with Locality Inductive Bias and Feature Normalization
Engineering Science and Technology, an International Journal (JEST), 2023
Omid Nejati Manzari
Hossein Kashiani
Hojat Asgarian Dehkordi
S. B. Shokouhi
ViT
184
20
0
27 Jan 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
234
3
0
26 Jan 2023
Out of Distribution Performance of State of Art Vision Model
Salman Rahman
W. Lee
402
4
0
25 Jan 2023
Human-Timescale Adaptation in an Open-Ended Task Space
International Conference on Machine Learning (ICML), 2023
Adaptive Agent Team
Jakob Bauer
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
...
Jakub Sygnowski
K. Tuyls
Sarah York
Alexander Zacherl
Lei Zhang
LM&Ro, OffRL, AI4CE, LRM
329
148
0
18 Jan 2023
Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling
bioRxiv (bioRxiv), 2023
Ahmed Elnaggar
Hazem Essam
Wafaa Salah-Eldin
Walid Moustafa
Mohamed Elkerdawy
Charlotte Rochereau
B. Rost
424
144
0
16 Jan 2023
Language Cognition and Language Computation -- Human and Machine Language Understanding
Shaonan Wang
Nai Ding
Nan Lin
Jiajun Zhang
Chengqing Zong
250
2
0
12 Jan 2023
WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning
Kejun Zhang
Xinda Wu
Tieyao Zhang
Zhijie Huang
Xu Tan
Qihao Liang
Songruoyao Wu
Lingyun Sun
254
13
0
11 Jan 2023
A Survey on Transformers in Reinforcement Learning
Wenzhe Li
Hao Luo
Zichuan Lin
Chongjie Zhang
Zongqing Lu
Deheng Ye
OffRL, MU, AI4CE
547
72
0
08 Jan 2023
Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition
David M. Chan
Shalini Ghosh
Ariya Rastrow
Björn Hoffmeister
OffRL
209
7
0
06 Jan 2023
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang
Boyi Liu
Qi Cai
Lingxiao Wang
Zhaoran Wang
320
14
0
30 Dec 2022
Transformer in Transformer as Backbone for Deep Reinforcement Learning
Hangyu Mao
Rui Zhao
Hao Chen
Jianye Hao
Yiqun Chen
Dong Li
Junge Zhang
Zhen Xiao
OffRL
189
9
0
30 Dec 2022
Efficient Movie Scene Detection using State-Space Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Md. Mohaiminul Islam
Mahmudul Hasan
Kishan Athrey
Tony Braskich
Gedas Bertasius
ViT
246
69
0
29 Dec 2022
Transformers in Action Recognition: A Review on Temporal Modeling
Elham Shabaninia
Hossein Nezamabadi-pour
Fatemeh Shafizadegan
ViT
211
14
0
29 Dec 2022
On Transforming Reinforcement Learning by Transformer: The Development Trajectory
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Shengchao Hu
Li Shen
Ya Zhang
Yixin Chen
Dacheng Tao
OffRL
340
64
0
29 Dec 2022
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
International Conference on Learning Representations (ICLR), 2022
Daniel Y. Fu
Tri Dao
Khaled Kamal Saab
A. Thomas
Atri Rudra
Christopher Ré
442
556
0
28 Dec 2022
Part-guided Relational Transformers for Fine-grained Visual Recognition
IEEE Transactions on Image Processing (TIP), 2021
Yifan Zhao
Jia Li
Xiaowu Chen
Yonghong Tian
ViT
210
55
0
28 Dec 2022
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Ying Wen
Bo Liu
M. Zhou
Shufang Hou
Zhe Cao
Chenyang Le
Jingxiao Chen
Zheng Tian
Weinan Zhang
Jun Wang
AI4CE
221
12
0
24 Dec 2022
Scalable Adaptive Computation for Iterative Generation
International Conference on Machine Learning (ICML), 2022
Allan Jabri
David Fleet
Ting-Li Chen
DiffM
254
153
0
22 Dec 2022
Generating music with sentiment using Transformer-GANs
International Society for Music Information Retrieval Conference (ISMIR), 2022
Pedro Neves
José Fornari
J. Florindo
MGen
145
26
0
21 Dec 2022
ORCA: A Challenging Benchmark for Arabic Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
AbdelRahim Elmadany
El Moatez Billah Nagoudi
Muhammad Abdul-Mageed
ELM
301
60
0
21 Dec 2022
A Length-Extrapolatable Transformer
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yutao Sun
Li Dong
Barun Patra
Shuming Ma
Shaohan Huang
Alon Benhaim
Vishrav Chaudhary
Xia Song
Furu Wei
330
156
0
20 Dec 2022