ResearchTrend.AI

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (arXiv:1901.02860)
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown
Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models
Xinyi Liu
A. Wangperawong
184
4
0
08 Sep 2019
PaLM: A Hybrid Parser and Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Hao Peng
Roy Schwartz
Noah A. Smith
AIMat
172
17
0
04 Sep 2019
Deep Equilibrium Models
Neural Information Processing Systems (NeurIPS), 2019
Shaojie Bai
J. Zico Kolter
V. Koltun
225
780
0
03 Sep 2019
Language Models as Knowledge Bases?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM AI4MH
1.1K
3,002
0
03 Sep 2019
Subword Language Model for Query Auto-Completion
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Gyuwan Kim
141
17
0
02 Sep 2019
NEZHA: Neural Contextualized Representation for Chinese Language Understanding
Junqiu Wei
Xiaozhe Ren
Xiaoguang Li
Wenyong Huang
Yi-Lun Liao
Yasheng Wang
Jianghao Lin
Xin Jiang
Xiao Chen
Qun Liu
270
126
0
31 Aug 2019
Quantity doesn't buy quality syntax with neural language models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Marten van Schijndel
Aaron Mueller
Tal Linzen
197
75
0
31 Aug 2019
Behavior Gated Language Models
Prashanth Gurunath Shivakumar
Shao-Yen Tseng
P. Georgiou
Shrikanth Narayanan
148
1
0
31 Aug 2019
Keep Calm and Switch On! Preserving Sentiment and Fluency in Semantic Text Exchange
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Steven Y. Feng
Aaron W. Li
Jesse Hoey
247
30
0
30 Aug 2019
Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Yifan Hao
Shaojie Bai
M. Yamada
Louis-Philippe Morency
Ruslan Salakhutdinov
524
298
0
30 Aug 2019
Discourse-Aware Semantic Self-Attention for Narrative Reading Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Todor Mihaylov
Anette Frank
180
29
0
28 Aug 2019
Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure
Vikas Garg
Inderjit S. Dhillon
Hsiang-Fu Yu
135
7
0
27 Aug 2019
Bridging the Gap for Tokenizer-Free Language Models
Dokook Choe
Rami Al-Rfou
Mandy Guo
Heeyoung Lee
Noah Constant
183
23
0
27 Aug 2019
Training Optimus Prime, M.D.: Generating Medical Certification Items by Fine-Tuning OpenAI's gpt2 Transformer Model
M. Davier
MedIm LM&MA
100
16
0
23 Aug 2019
Latent Relation Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2019
Hiroaki Hayashi
Zecong Hu
Chenyan Xiong
Graham Neubig
KELM
173
43
0
21 Aug 2019
Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian
Recent Advances in Natural Language Processing (RANLP), 2019
Momchil Hardalov
Ivan Koychev
Preslav Nakov
RALM
200
21
0
05 Aug 2019
AutoML: A Survey of the State-of-the-Art
Knowledge-Based Systems (KBS), 2019
Xin He
Kaiyong Zhao
Xiaowen Chu
787
1,677
0
02 Aug 2019
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Transactions of the Association for Computational Linguistics (TACL), 2019
S. Rothe
Shashi Narayan
Aliaksei Severyn
SILM
330
461
0
29 Jul 2019
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Yu Sun
Shuohuan Wang
Yukun Li
Shikun Feng
Hao Tian
Hua Wu
Haifeng Wang
CLL
420
864
0
29 Jul 2019
DLGNet: A Transformer-based Model for Dialogue Response Generation
O. Olabiyi
Erik T. Mueller
191
13
0
26 Jul 2019
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Transactions of the Association for Computational Linguistics (TACL), 2019
Mandar Joshi
Danqi Chen
Yinhan Liu
Daniel S. Weld
Luke Zettlemoyer
Omer Levy
622
2,106
0
24 Jul 2019
EmotionX-HSU: Adopting Pre-trained BERT for Emotion Classification
Li Luo
Yue Wang
123
30
0
23 Jul 2019
Self-Attentional Credit Assignment for Transfer in Reinforcement Learning
Johan Ferret
Raphaël Marinier
Matthieu Geist
Olivier Pietquin
OffRL
191
6
0
18 Jul 2019
Agglomerative Attention
Matthew Spellings
77
0
0
15 Jul 2019
R-Transformer: Recurrent Neural Network Enhanced Transformer
Z. Wang
Yao Ma
Zitao Liu
Shucheng Zhou
ViT
179
113
0
12 Jul 2019
LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
International Society for Music Information Retrieval Conference (ISMIR), 2019
Chris Donahue
H. H. Mao
Yiting Li
G. Cottrell
Julian McAuley
245
132
0
10 Jul 2019
Large Memory Layers with Product Keys
Neural Information Processing Systems (NeurIPS), 2019
Guillaume Lample
Alexandre Sablayrolles
Marc'Aurelio Ranzato
Ludovic Denoyer
Edouard Grave
MoE
246
154
0
10 Jul 2019
Augmenting Self-attention with Persistent Memory
Sainbayar Sukhbaatar
Edouard Grave
Guillaume Lample
Armand Joulin
RALM KELM
228
149
0
02 Jul 2019
A Tensorized Transformer for Language Modeling
Neural Information Processing Systems (NeurIPS), 2019
Xindian Ma
Peng Zhang
Shuai Zhang
Nan Duan
Yuexian Hou
D. Song
M. Zhou
354
188
0
24 Jun 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Neural Information Processing Systems (NeurIPS), 2019
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
935
9,121
0
19 Jun 2019
Pre-Training with Whole Word Masking for Chinese BERT
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2019
Yiming Cui
Wanxiang Che
Ting Liu
Bing Qin
Ziqing Yang
265
233
0
19 Jun 2019
Theoretical Limitations of Self-Attention in Neural Sequence Models
Transactions of the Association for Computational Linguistics (TACL), 2019
Michael Hahn
352
338
0
16 Jun 2019
One Epoch Is All You Need
Aran Komatsuzaki
139
59
0
16 Jun 2019
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets
Yifan Peng
Shankai Yan
Zhiyong Lu
LM&MA AI4MH
397
941
0
13 Jun 2019
Hierarchical Representation in Neural Language Models: Suppression and Recovery of Expectations
Ethan Gotlieb Wilcox
R. Levy
Richard Futrell
MILM
128
34
0
10 Jun 2019
Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig
Yonatan Belinkov
281
429
0
07 Jun 2019
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers
Manjot Bilkhu
Siyang Wang
Tushar Dobhal
ViT
107
19
0
06 Jun 2019
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
Yiping Lu
Zhuohan Li
Di He
Zhiqing Sun
Bin Dong
Tao Qin
Liwei Wang
Tie-Yan Liu
AI4CE
241
205
0
06 Jun 2019
Large-Scale Multi-Label Text Classification on EU Legislation
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Ilias Chalkidis
Manos Fergadiotis
Prodromos Malakasiotis
Ion Androutsopoulos
AILaw
214
246
0
05 Jun 2019
Towards Lossless Encoding of Sentences
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Gabriele Prato
Mathieu Duchesneau
A. Chandar
Alain Tapp
138
2
0
04 Jun 2019
Adversarial Generation and Encoding of Nested Texts
A. Rozental
GAN
52
0
0
01 Jun 2019
Why gradient clipping accelerates training: A theoretical justification for adaptivity
International Conference on Learning Representations (ICLR), 2019
Jingzhao Zhang
Tianxing He
S. Sra
Ali Jadbabaie
364
551
0
28 May 2019
Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)
Neural Information Processing Systems (NeurIPS), 2019
Mariya Toneva
Leila Wehbe
MILM AI4CE
409
275
0
28 May 2019
Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks
Boris Ginsburg
P. Castonguay
Oleksii Hrinchuk
Oleksii Kuchaiev
Vitaly Lavrukhin
Ryan Leary
Jason Chun Lok Li
Huyen Nguyen
Yang Zhang
Jonathan M. Cohen
ODL
243
13
0
27 May 2019
Using Neural Networks for Relation Extraction from Biomedical Literature
Diana Sousa
Andre Lamurias
Francisco M. Couto
130
12
0
27 May 2019
Extreme Multi-Label Legal Text Classification: A case study in EU Legislation
Ilias Chalkidis
Manos Fergadiotis
Prodromos Malakasiotis
Nikolaos Aletras
Ion Androutsopoulos
AILaw
178
81
0
26 May 2019
Are Sixteen Heads Really Better than One?
Neural Information Processing Systems (NeurIPS), 2019
Paul Michel
Omer Levy
Graham Neubig
MoE
420
1,242
0
25 May 2019
Exposure Bias versus Self-Recovery: Are Distortions Really Incremental for Autoregressive Text Generation?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Tianxing He
Jingzhao Zhang
Zhiming Zhou
James R. Glass
525
42
0
25 May 2019
SCRAM: Spatially Coherent Randomized Attention Maps
D. A. Calian
P. Roelants
Jacques Calì
B. Carr
K. Dubba
John E. Reid
Dell Zhang
127
2
0
24 May 2019
Adaptive Attention Span in Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Sainbayar Sukhbaatar
Edouard Grave
Piotr Bojanowski
Armand Joulin
176
309
0
19 May 2019
Page 40 of 41