ResearchTrend.AI

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (arXiv:1901.02860)
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
    VLM

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown
Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models
Xinyi Liu
A. Wangperawong
184
4
0
08 Sep 2019
PaLM: A Hybrid Parser and Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Hao Peng
Roy Schwartz
Noah A. Smith
AIMat
172
17
0
04 Sep 2019
Deep Equilibrium Models
Neural Information Processing Systems (NeurIPS), 2019
Shaojie Bai
J. Zico Kolter
V. Koltun
225
780
0
03 Sep 2019
Language Models as Knowledge Bases?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM AI4MH
1.1K
3,002
0
03 Sep 2019
Subword Language Model for Query Auto-Completion
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Gyuwan Kim
141
17
0
02 Sep 2019
NEZHA: Neural Contextualized Representation for Chinese Language Understanding
Junqiu Wei
Xiaozhe Ren
Xiaoguang Li
Wenyong Huang
Yi-Lun Liao
Yasheng Wang
Jianghao Lin
Xin Jiang
Xiao Chen
Qun Liu
270
126
0
31 Aug 2019
Quantity doesn't buy quality syntax with neural language models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Marten van Schijndel
Aaron Mueller
Tal Linzen
197
75
0
31 Aug 2019
Behavior Gated Language Models
Prashanth Gurunath Shivakumar
Shao-Yen Tseng
P. Georgiou
Shrikanth Narayanan
148
1
0
31 Aug 2019
Keep Calm and Switch On! Preserving Sentiment and Fluency in Semantic Text Exchange
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Steven Y. Feng
Aaron W. Li
Jesse Hoey
247
30
0
30 Aug 2019
Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Yifan Hao
Shaojie Bai
M. Yamada
Louis-Philippe Morency
Ruslan Salakhutdinov
524
298
0
30 Aug 2019
Discourse-Aware Semantic Self-Attention for Narrative Reading Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Todor Mihaylov
Anette Frank
180
29
0
28 Aug 2019
Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure
Vikas Garg
Inderjit S. Dhillon
Hsiang-Fu Yu
135
7
0
27 Aug 2019
Bridging the Gap for Tokenizer-Free Language Models
Dokook Choe
Rami Al-Rfou
Mandy Guo
Heeyoung Lee
Noah Constant
183
23
0
27 Aug 2019
Training Optimus Prime, M.D.: Generating Medical Certification Items by Fine-Tuning OpenAI's gpt2 Transformer Model
M. Davier
MedIm LM&MA
100
16
0
23 Aug 2019
Latent Relation Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2019
Hiroaki Hayashi
Zecong Hu
Chenyan Xiong
Graham Neubig
KELM
173
43
0
21 Aug 2019
Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian
Recent Advances in Natural Language Processing (RANLP), 2019
Momchil Hardalov
Ivan Koychev
Preslav Nakov
RALM
200
21
0
05 Aug 2019
AutoML: A Survey of the State-of-the-Art
Knowledge-Based Systems (KBS), 2019
Xin He
Kaiyong Zhao
Xiaowen Chu
787
1,677
0
02 Aug 2019
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Transactions of the Association for Computational Linguistics (TACL), 2019
S. Rothe
Shashi Narayan
Aliaksei Severyn
SILM
330
461
0
29 Jul 2019
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Yu Sun
Shuohuan Wang
Yukun Li
Shikun Feng
Hao Tian
Hua Wu
Haifeng Wang
CLL
420
864
0
29 Jul 2019
DLGNet: A Transformer-based Model for Dialogue Response Generation
O. Olabiyi
Erik T. Mueller
191
13
0
26 Jul 2019
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Transactions of the Association for Computational Linguistics (TACL), 2019
Mandar Joshi
Danqi Chen
Yinhan Liu
Daniel S. Weld
Luke Zettlemoyer
Omer Levy
622
2,106
0
24 Jul 2019
EmotionX-HSU: Adopting Pre-trained BERT for Emotion Classification
Li Luo
Yue Wang
123
30
0
23 Jul 2019
Self-Attentional Credit Assignment for Transfer in Reinforcement Learning
Johan Ferret
Raphaël Marinier
Matthieu Geist
Olivier Pietquin
OffRL
191
6
0
18 Jul 2019
Agglomerative Attention
Matthew Spellings
77
0
0
15 Jul 2019
R-Transformer: Recurrent Neural Network Enhanced Transformer
Z. Wang
Yao Ma
Zitao Liu
Shucheng Zhou
ViT
179
113
0
12 Jul 2019
LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
International Society for Music Information Retrieval Conference (ISMIR), 2019
Chris Donahue
H. H. Mao
Yiting Li
G. Cottrell
Julian McAuley
245
132
0
10 Jul 2019
Large Memory Layers with Product Keys
Neural Information Processing Systems (NeurIPS), 2019
Guillaume Lample
Alexandre Sablayrolles
Marc'Aurelio Ranzato
Ludovic Denoyer
Edouard Grave
MoE
246
154
0
10 Jul 2019
Augmenting Self-attention with Persistent Memory
Sainbayar Sukhbaatar
Edouard Grave
Guillaume Lample
Armand Joulin
RALM KELM
228
149
0
02 Jul 2019
A Tensorized Transformer for Language Modeling
Neural Information Processing Systems (NeurIPS), 2019
Xindian Ma
Peng Zhang
Shuai Zhang
Nan Duan
Yuexian Hou
D. Song
M. Zhou
354
188
0
24 Jun 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Neural Information Processing Systems (NeurIPS), 2019
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
935
9,121
0
19 Jun 2019
Pre-Training with Whole Word Masking for Chinese BERT
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2019
Yiming Cui
Wanxiang Che
Ting Liu
Bing Qin
Ziqing Yang
265
233
0
19 Jun 2019
Theoretical Limitations of Self-Attention in Neural Sequence Models
Transactions of the Association for Computational Linguistics (TACL), 2019
Michael Hahn
352
338
0
16 Jun 2019
One Epoch Is All You Need
Aran Komatsuzaki
139
59
0
16 Jun 2019
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets
Yifan Peng
Shankai Yan
Zhiyong Lu
LM&MA AI4MH
397
941
0
13 Jun 2019
Hierarchical Representation in Neural Language Models: Suppression and Recovery of Expectations
Ethan Gotlieb Wilcox
R. Levy
Richard Futrell
MILM
128
34
0
10 Jun 2019
Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig
Yonatan Belinkov
281
429
0
07 Jun 2019
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers
Manjot Bilkhu
Siyang Wang
Tushar Dobhal
ViT
107
19
0
06 Jun 2019
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
Yiping Lu
Zhuohan Li
Di He
Zhiqing Sun
Bin Dong
Tao Qin
Liwei Wang
Tie-Yan Liu
AI4CE
241
205
0
06 Jun 2019
Large-Scale Multi-Label Text Classification on EU Legislation
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Ilias Chalkidis
Manos Fergadiotis
Prodromos Malakasiotis
Ion Androutsopoulos
AILaw
214
246
0
05 Jun 2019
Towards Lossless Encoding of Sentences
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Gabriele Prato
Mathieu Duchesneau
A. Chandar
Alain Tapp
138
2
0
04 Jun 2019
Adversarial Generation and Encoding of Nested Texts
A. Rozental
GAN
52
0
0
01 Jun 2019
Why gradient clipping accelerates training: A theoretical justification for adaptivity
International Conference on Learning Representations (ICLR), 2019
Jingzhao Zhang
Tianxing He
S. Sra
Ali Jadbabaie
364
551
0
28 May 2019
Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)
Neural Information Processing Systems (NeurIPS), 2019
Mariya Toneva
Leila Wehbe
MILM AI4CE
409
275
0
28 May 2019
Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks
Boris Ginsburg
P. Castonguay
Oleksii Hrinchuk
Oleksii Kuchaiev
Vitaly Lavrukhin
Ryan Leary
Jason Chun Lok Li
Huyen Nguyen
Yang Zhang
Jonathan M. Cohen
ODL
243
13
0
27 May 2019
Using Neural Networks for Relation Extraction from Biomedical Literature
Diana Sousa
Andre Lamurias
Francisco M. Couto
130
12
0
27 May 2019
Extreme Multi-Label Legal Text Classification: A case study in EU Legislation
Ilias Chalkidis
Manos Fergadiotis
Prodromos Malakasiotis
Nikolaos Aletras
Ion Androutsopoulos
AILaw
178
81
0
26 May 2019
Are Sixteen Heads Really Better than One?
Neural Information Processing Systems (NeurIPS), 2019
Paul Michel
Omer Levy
Graham Neubig
MoE
420
1,242
0
25 May 2019
Exposure Bias versus Self-Recovery: Are Distortions Really Incremental for Autoregressive Text Generation?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Tianxing He
Jingzhao Zhang
Zhiming Zhou
James R. Glass
525
42
0
25 May 2019
SCRAM: Spatially Coherent Randomized Attention Maps
D. A. Calian
P. Roelants
Jacques Calì
B. Carr
K. Dubba
John E. Reid
Dell Zhang
127
2
0
24 May 2019
Adaptive Attention Span in Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Sainbayar Sukhbaatar
Edouard Grave
Piotr Bojanowski
Armand Joulin
176
309
0
19 May 2019
Page 40 of 41