ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1901.02860
  4. Cited By
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM
ArXivPDFHTML

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 604 papers shown
Title
Plug and Play Language Models: A Simple Approach to Controlled Text
  Generation
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri
Andrea Madotto
Janice Lan
Jane Hung
Eric Frank
Piero Molino
J. Yosinski
Rosanne Liu
KELM
26
937
0
04 Dec 2019
Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for
  Generative Models
Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
Giannis Daras
Augustus Odena
Han Zhang
A. Dimakis
32
54
0
27 Nov 2019
Improving Polyphonic Music Models with Feature-Rich Encoding
Improving Polyphonic Music Models with Feature-Rich Encoding
Omar Peracha
30
12
0
26 Nov 2019
Single Headed Attention RNN: Stop Thinking With Your Head
Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
19
68
0
26 Nov 2019
Graph Transformer for Graph-to-Sequence Learning
Graph Transformer for Graph-to-Sequence Learning
Deng Cai
W. Lam
29
220
0
18 Nov 2019
Understanding and Improving Layer Normalization
Understanding and Improving Layer Normalization
Jingjing Xu
Xu Sun
Zhiyuan Zhang
Guangxiang Zhao
Junyang Lin
FAtt
18
338
0
16 Nov 2019
What do you mean, BERT? Assessing BERT as a Distributional Semantics
  Model
What do you mean, BERT? Assessing BERT as a Distributional Semantics Model
Timothee Mickus
Denis Paperno
Mathieu Constant
Kees van Deemter
21
45
0
13 Nov 2019
Compressive Transformers for Long-Range Sequence Modelling
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
RALM
VLM
KELM
11
620
0
13 Nov 2019
Effectiveness of self-supervised pre-training for speech recognition
Effectiveness of self-supervised pre-training for speech recognition
Alexei Baevski
Michael Auli
Abdel-rahman Mohamed
SSL
24
147
0
10 Nov 2019
Location Attention for Extrapolation to Longer Sequences
Location Attention for Extrapolation to Longer Sequences
Yann Dubois
Gautier Dagan
Dieuwke Hupkes
Elia Bruni
15
40
0
10 Nov 2019
Improving Transformer Models by Reordering their Sublayers
Improving Transformer Models by Reordering their Sublayers
Ofir Press
Noah A. Smith
Omer Levy
11
87
0
10 Nov 2019
How Decoding Strategies Affect the Verifiability of Generated Text
How Decoding Strategies Affect the Verifiability of Generated Text
Luca Massarelli
Fabio Petroni
Aleksandra Piktus
Myle Ott
Tim Rocktaschel
Vassilis Plachouras
Fabrizio Silvestri
Sebastian Riedel
23
50
0
09 Nov 2019
Certified Data Removal from Machine Learning Models
Certified Data Removal from Machine Learning Models
Chuan Guo
Tom Goldstein
Awni Y. Hannun
L. V. D. van der Maaten
MU
34
416
0
08 Nov 2019
Generalization through Memorization: Nearest Neighbor Language Models
Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal
Omer Levy
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
51
808
0
01 Nov 2019
Transfer Learning from Transformers to Fake News Challenge Stance
  Detection (FNC-1) Task
Transfer Learning from Transformers to Fake News Challenge Stance Detection (FNC-1) Task
Valeriya Slovikovskaya
16
41
0
31 Oct 2019
Phenotyping of Clinical Notes with Improved Document Classification
  Models Using Contextualized Neural Language Models
Phenotyping of Clinical Notes with Improved Document Classification Models Using Contextualized Neural Language Models
Andriy Mulyar
Elliot Schumacher
Masoud Rouhizadeh
Mark Dredze
11
39
0
30 Oct 2019
An Adaptive and Momental Bound Method for Stochastic Learning
An Adaptive and Momental Bound Method for Stochastic Learning
Jianbang Ding
Xuancheng Ren
Ruixuan Luo
Xu Sun
ODL
11
46
0
27 Oct 2019
A holistic approach to polyphonic music transcription with neural
  networks
A holistic approach to polyphonic music transcription with neural networks
Miguel A. Román
A. Pertusa
Jorge Calvo-Zaragoza
11
29
0
26 Oct 2019
Recognizing long-form speech using streaming end-to-end models
Recognizing long-form speech using streaming end-to-end models
A. Narayanan
Rohit Prabhavalkar
Chung-Cheng Chiu
David Rybach
Tara N. Sainath
Trevor Strohman
20
129
0
24 Oct 2019
Hierarchical Transformers for Long Document Classification
Hierarchical Transformers for Long Document Classification
R. Pappagari
Piotr Żelasko
Jesús Villalba
Yishay Carmiel
Najim Dehak
11
239
0
23 Oct 2019
Correction of Automatic Speech Recognition with Transformer
  Sequence-to-sequence Model
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Oleksii Hrinchuk
Mariya Popova
Boris Ginsburg
VLM
17
87
0
23 Oct 2019
Deja-vu: Double Feature Presentation and Iterated Loss in Deep
  Transformer Networks
Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks
Andros Tjandra
Chunxi Liu
Frank Zhang
Xiaohui Zhang
Yongqiang Wang
Gabriel Synnaeve
Satoshi Nakamura
Geoffrey Zweig
ViT
9
44
0
23 Oct 2019
Improving Transformer-based Speech Recognition Using Unsupervised
  Pre-training
Improving Transformer-based Speech Recognition Using Unsupervised Pre-training
Dongwei Jiang
Xiaoning Lei
Wubo Li
Ne Luo
Yuxuan Hu
Wei Zou
Xiangang Li
24
99
0
22 Oct 2019
Keyphrase Extraction from Scholarly Articles as Sequence Labeling using
  Contextualized Embeddings
Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings
Dhruva Sahrawat
Debanjan Mahata
Mayank Kulkarni
Haimin Zhang
Rakesh Gosangi
Amanda Stent
Agniv Sharma
Yaman Kumar Singla
R. Shah
Roger Zimmermann
9
30
0
19 Oct 2019
Transformer ASR with Contextual Block Processing
Transformer ASR with Contextual Block Processing
E. Tsunoo
Yosuke Kashiwagi
Toshiyuki Kumakura
Shinji Watanabe
51
64
0
16 Oct 2019
Dialogue Transformers
Dialogue Transformers
Vladimir Vlasov
Johannes E. M. Mosig
Alan Nichol
18
56
0
01 Oct 2019
A Constructive Prediction of the Generalization Error Across Scales
A Constructive Prediction of the Generalization Error Across Scales
Jonathan S. Rosenfeld
Amir Rosenfeld
Yonatan Belinkov
Nir Shavit
22
205
0
27 Sep 2019
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete
  and Continuous Control
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
H. F. Song
A. Abdolmaleki
Jost Tobias Springenberg
Aidan Clark
Hubert Soyer
...
Dhruva Tirumala
N. Heess
Dan Belov
Martin Riedmiller
M. Botvinick
17
121
0
26 Sep 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language
  Representations
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
70
6,370
0
26 Sep 2019
Reducing Transformer Depth on Demand with Structured Dropout
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
22
584
0
25 Sep 2019
Synthetic Data for Deep Learning
Synthetic Data for Deep Learning
Sergey I. Nikolenko
46
348
0
25 Sep 2019
Exascale Deep Learning for Scientific Inverse Problems
Exascale Deep Learning for Scientific Inverse Problems
N. Laanait
Josh Romero
Junqi Yin
M. T. Young
Sean Treichler
V. Starchenko
A. Borisevich
Alexander Sergeev
Michael A. Matheson
FedML
BDL
27
29
0
24 Sep 2019
Knowledge-Enriched Transformer for Emotion Detection in Textual
  Conversations
Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations
Peixiang Zhong
Di Wang
C. Miao
24
269
0
24 Sep 2019
Language models and Automated Essay Scoring
Language models and Automated Essay Scoring
Pedro Uría Rodríguez
Amir Jafari
C. Ormerod
22
82
0
18 Sep 2019
Span-based Joint Entity and Relation Extraction with Transformer
  Pre-training
Span-based Joint Entity and Relation Extraction with Transformer Pre-training
Markus Eberts
A. Ulges
LRM
ViT
164
380
0
17 Sep 2019
CTRL: A Conditional Transformer Language Model for Controllable
  Generation
CTRL: A Conditional Transformer Language Model for Controllable Generation
N. Keskar
Bryan McCann
L. Varshney
Caiming Xiong
R. Socher
AI4CE
55
1,233
0
11 Sep 2019
PaLM: A Hybrid Parser and Language Model
PaLM: A Hybrid Parser and Language Model
Hao Peng
Roy Schwartz
Noah A. Smith
AIMat
23
15
0
04 Sep 2019
AutoML: A Survey of the State-of-the-Art
AutoML: A Survey of the State-of-the-Art
Xin He
Kaiyong Zhao
X. Chu
20
1,419
0
02 Aug 2019
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
S. Rothe
Shashi Narayan
Aliaksei Severyn
SILM
69
433
0
29 Jul 2019
EmotionX-HSU: Adopting Pre-trained BERT for Emotion Classification
EmotionX-HSU: Adopting Pre-trained BERT for Emotion Classification
Li Luo
Yue Wang
23
26
0
23 Jul 2019
R-Transformer: Recurrent Neural Network Enhanced Transformer
R-Transformer: Recurrent Neural Network Enhanced Transformer
Z. Wang
Yao Ma
Zitao Liu
Jiliang Tang
ViT
11
105
0
12 Jul 2019
LakhNES: Improving multi-instrumental music generation with cross-domain
  pre-training
LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
Chris Donahue
H. H. Mao
Yiting Li
G. Cottrell
Julian McAuley
30
116
0
10 Jul 2019
Transfer Learning in Biomedical Natural Language Processing: An
  Evaluation of BERT and ELMo on Ten Benchmarking Datasets
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets
Yifan Peng
Shankai Yan
Zhiyong Lu
LM&MA
AI4MH
13
830
0
13 Jun 2019
Analyzing the Structure of Attention in a Transformer Language Model
Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig
Yonatan Belinkov
19
357
0
07 Jun 2019
Large-Scale Multi-Label Text Classification on EU Legislation
Large-Scale Multi-Label Text Classification on EU Legislation
Ilias Chalkidis
Manos Fergadiotis
Prodromos Malakasiotis
Ion Androutsopoulos
AILaw
11
212
0
05 Jun 2019
Adversarial Generation and Encoding of Nested Texts
Adversarial Generation and Encoding of Nested Texts
A. Rozental
GAN
11
0
0
01 Jun 2019
Why gradient clipping accelerates training: A theoretical justification
  for adaptivity
Why gradient clipping accelerates training: A theoretical justification for adaptivity
Junzhe Zhang
Tianxing He
S. Sra
Ali Jadbabaie
16
441
0
28 May 2019
Interpreting and improving natural-language processing (in machines)
  with natural language-processing (in the brain)
Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)
Mariya Toneva
Leila Wehbe
MILM
AI4CE
33
219
0
28 May 2019
Style Transformer: Unpaired Text Style Transfer without Disentangled
  Latent Representation
Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation
Ning Dai
Jianze Liang
Xipeng Qiu
Xuanjing Huang
DRL
8
202
0
14 May 2019
Language Modeling with Deep Transformers
Language Modeling with Deep Transformers
Kazuki Irie
Albert Zeyer
Ralf Schluter
Hermann Ney
KELM
27
172
0
10 May 2019
Previous
123...111213
Next