Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM
arXiv: 1901.02860

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown
SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers
Hongyi Yuan
Zheng Yuan
Chuanqi Tan
Fei Huang
Songfang Huang
DiffM
252
83
0
20 Dec 2022
Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yeskendir Koishekenov
Alexandre Berard
Vassilina Nikoulina
MoE
264
43
0
19 Dec 2022
Inductive Attention for Video Action Anticipation
Tsung-Ming Tai
G. Fiameni
Cheng-Kuang Lee
Simon See
Oswald Lanz
209
1
0
17 Dec 2022
Speech Aware Dialog System Technology Challenge (DSTC11)
H. Soltau
Izhak Shafran
Mingqiu Wang
Abhinav Rastogi
Jeffrey Zhao
Ye Jia
Wei Han
Yuan Cao
Aramys Miranda
194
11
0
16 Dec 2022
Rarely a problem? Language models exhibit inverse scaling in their predictions following few-type quantifiers
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
J. Michaelov
Benjamin Bergen
198
18
0
16 Dec 2022
GeneFormer: Learned Gene Compression using Transformer-based Context Modeling
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhanbei Cui
Yuze Liao
Tongda Xu
Yan Wang
125
17
0
16 Dec 2022
Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo
Xiaodong Liu
Jian Jiao
Denis Xavier Charles
Eren Manavoglu
Tuo Zhao
Jianfeng Gao
332
37
0
15 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
International Conference on Learning Representations (ICLR), 2022
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
309
70
0
12 Dec 2022
Contextual Explainable Video Representation: Human Perception-based Understanding
Asilomar Conference on Signals, Systems and Computers (ACSSC), 2022
Khoa T. Vo
Kashu Yamazaki
Phong H. Nguyen
Pha Nguyen
Khoa Luu
Ngan Le
229
11
0
12 Dec 2022
P-Transformer: Towards Better Document-to-Document Neural Machine Translation
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022
Yachao Li
Junhui Li
Jing Jiang
Shimin Tao
Hao Yang
Hao Fei
ViT
157
17
0
12 Dec 2022
CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection
IEEE International Conference on Image Processing (ICIP), 2022
Kevin Hyekang Joo
Khoa T. Vo
Kashu Yamazaki
Ngan Le
234
94
0
09 Dec 2022
Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data
IEEE Robotics and Automation Letters (RA-L), 2022
Matthias Zeller
Jens Behley
Michael Heidingsfeld
C. Stachniss
249
32
0
07 Dec 2022
Hierarchical multimodal transformers for Multi-Page DocVQA
Pattern Recognition (Pattern Recogn.), 2022
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
266
97
0
07 Dec 2022
Transformers for End-to-End InfoSec Tasks: A Feasibility Study
Ethan M. Rudd
Mohammad Saidur Rahman
Philip Tully
212
6
0
05 Dec 2022
Meta-Learning Fast Weight Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Kevin Clark
Kelvin Guu
Ming-Wei Chang
Panupong Pasupat
Geoffrey E. Hinton
Mohammad Norouzi
KELM
208
15
0
05 Dec 2022
LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition
Yuguang Yang
Yu Pan
Jingjing Yin
Heng Lu
252
4
0
05 Dec 2022
NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer
Changsheng Quan
Xiaofei Li
161
5
0
05 Dec 2022
Language Models as Agent Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jacob Andreas
LLMAG
272
169
0
03 Dec 2022
A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling
AAAI Conference on Artificial Intelligence (AAAI), 2022
Z. Guo
J. Kang
Dorien Herremans
133
23
0
02 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
Computer Vision and Pattern Recognition (CVPR), 2022
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
257
51
0
01 Dec 2022
Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images
Meng Wang
Kai-An Yu
Chun-Mei Feng
K. Zou
Yanyu Xu
Qingquan Meng
Rick Siow Mong Goh
Yong Liu
Huazhu Fu
MedIm
232
3
0
01 Dec 2022
Protein Language Models and Structure Prediction: Connection and Progression
Bozhen Hu
Jun Xia
Jiangbin Zheng
Cheng Tan
Yufei Huang
Yongjie Xu
Stan Z. Li
220
46
0
30 Nov 2022
Survey on Self-Supervised Multimodal Representation Learning and Foundation Models
Sushil Thapa
AI4TS, SSL
103
3
0
29 Nov 2022
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Kashu Yamazaki
Khoa T. Vo
Sang Truong
Bhiksha Raj
Ngan Le
277
44
0
28 Nov 2022
MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zilong Wang
Jiuxiang Gu
Chris Tensmeyer
Nikolaos Barmpalios
A. Nenkova
Tong Sun
Jingbo Shang
Vlad I. Morariu
VLM
165
13
0
27 Nov 2022
Deep representation learning: Fundamentals, Perspectives, Applications, and Open Challenges
K. T. Baghaei
Amirreza Payandeh
Pooya Fayyazsanavi
Shahram Rahimi
Zhiqian Chen
Somayeh Bakhtiari Ramezani
FaML, AI4TS
226
10
0
27 Nov 2022
A Survey of Text Representation Methods and Their Genealogy
IEEE Access (IEEE Access), 2022
Philipp Siebers
Christian Janiesch
Patrick Zschech
AI4TS
119
11
0
26 Nov 2022
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Proceedings of the VLDB Endowment (PVLDB), 2022
Xupeng Miao
Yujie Wang
Youhe Jiang
Chunan Shi
Xiaonan Nie
Hailin Zhang
Tengjiao Wang
GNN, MoE
238
91
0
25 Nov 2022
DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Bosheng Qin
Juncheng Li
Siliang Tang
Yueting Zhuang
178
4
0
24 Nov 2022
Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhijun Wang
Xuebo Liu
Min Zhang
343
13
0
23 Nov 2022
LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Rodrigo Mira
Buye Xu
Jacob Donley
Anurag Kumar
Stavros Petridis
V. Ithapu
Maja Pantic
217
19
0
20 Nov 2022
Efficient Transformers with Dynamic Token Pooling
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Piotr Nawrot
J. Chorowski
Adrian Łańcucki
Edoardo Ponti
251
69
0
17 Nov 2022
Hypergraph Transformer for Skeleton-based Action Recognition
Yuxuan Zhou
Zhi-Qi Cheng
Chong Li
Yanwen Fang
Yifeng Geng
Xuansong Xie
Margret Keuper
ViT
328
80
0
17 Nov 2022
ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022
Jiayi Shao
Xiaohan Wang
Yi Yang
149
1
0
17 Nov 2022
Parameter-Efficient Transformer with Hybrid Axial-Attention for Medical Image Segmentation
Yiyue Hu
Lei Zhang
Nan Mu
Leijun Liu
ViT, MedIm
111
1
0
17 Nov 2022
ComMU: Dataset for Combinatorial Music Generation
Neural Information Processing Systems (NeurIPS), 2022
Lee Hyun
Taehyun Kim
Hyolim Kang
Minjoo Ki
H. Hwang
Kwanho Park
Sharang Han
Seon Joo Kim
182
17
0
17 Nov 2022
Deep Emotion Recognition in Textual Conversations: A Survey
Artificial Intelligence Review (Artif Intell Rev), 2022
Patrícia Pereira
Helena Moniz
Joao Paulo Carvalho
460
36
0
16 Nov 2022
Token Turing Machines
Computer Vision and Pattern Recognition (CVPR), 2022
Michael S. Ryoo
K. Gopalakrishnan
Kumara Kahatapitiya
Ted Xiao
Kanishka Rao
Austin Stone
Yao Lu
Julian Ibarz
Anurag Arnab
258
30
0
16 Nov 2022
An Overview on Controllable Text Generation via Variational Auto-Encoders
Haoqin Tu
Yitong Li
BDL
182
3
0
15 Nov 2022
YM2413-MDB: A Multi-Instrumental FM Video Game Music Dataset with Emotion Annotations
International Society for Music Information Retrieval Conference (ISMIR), 2022
Eunjin Choi
Y. Chung
Seolhee Lee
JongIk Jeon
Taegyun Kwon
Juhan Nam
160
11
0
14 Nov 2022
Creative Writing with an AI-Powered Writing Assistant: Perspectives from Professional Writers
Daphne Ippolito
Ann Yuan
Andy Coenen
Sehmon Burnam
231
124
0
09 Nov 2022
Cross-Attention is all you need: Real-Time Streaming Transformers for Personalised Speech Enhancement
Shucong Zhang
Malcolm Chadwick
Alberto Gil C. P. Ramos
S. Bhattacharya
134
5
0
08 Nov 2022
Self-conditioned Embedding Diffusion for Text Generation
Robin Strudel
Corentin Tallec
Florent Altché
Yilun Du
Yaroslav Ganin
...
Will Grathwohl
Nikolay Savinov
Sander Dieleman
Laurent Sifre
Rémi Leblond
DiffM
239
107
0
08 Nov 2022
Linear Self-Attention Approximation via Trainable Feedforward Kernel
International Conference on Artificial Neural Networks (ICANN), 2022
Uladzislau Yorsh
Alexander Kovalenko
271
1
0
08 Nov 2022
BERT-Deep CNN: State-of-the-Art for Sentiment Analysis of COVID-19 Tweets
Social Network Analysis and Mining (SNAM), 2022
Javad Hassannataj Joloudari
Sadiq Hussain
M. Nematollahi
Rouhollah Bagheri
Fatemeh Fazl
R. Alizadehsani
Reza Lashgari
Ashis Talukder
209
59
0
04 Nov 2022
Circling Back to Recurrent Models of Language
Gábor Melis
238
0
0
03 Nov 2022
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
P. Swietojanski
Stefan Braun
Dogan Can
Thiago Fraga da Silva
Arnab Ghoshal
...
Henry Mason
Erik McDermott
Honza Silovsky
R. Travadi
Xiaodan Zhuang
246
21
0
02 Nov 2022
Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer
Dimitris Mamakas
Petros Tsotsi
Ion Androutsopoulos
Ilias Chalkidis
VLM, AILaw
264
39
0
02 Nov 2022
SDMuse: Stochastic Differential Music Editing and Generation via Hybrid Representation
IEEE Transactions on Multimedia (IEEE TMM), 2022
Chen Zhang
Yi Ren
Kecheng Zhang
Shuicheng Yan
DiffM
240
19
0
01 Nov 2022
Accelerating Distributed MoE Training and Inference with Lina
USENIX Annual Technical Conference (USENIX ATC), 2022
Jiamin Li
Yimin Jiang
Yibo Zhu
Cong Wang
Hong-Yu Xu
MoE
224
110
0
31 Oct 2022
Page 18 of 41