Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown

SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding

Interspeech (Interspeech), 2023

Titouan Parcollet

Rogier van Dalen

Shucong Zhang

S. Bhattacharya

236

12 Jul 2023

Pluggable Neural Machine Translation Models via Memory-augmented Adapters

International Conference on Language Resources and Evaluation (LREC), 2023

Yuzhuang Xu

Shuo Wang

Peng Li

Xuebo Liu

Xiaolong Wang

Weidong Liu

Yang Liu

342

12 Jul 2023

Transformers in Reinforcement Learning: A Survey

Samira Ebrahimi Kahou

OffRL

252

12 Jul 2023

ProgGP: From GuitarPro Tablature Neural Generation To Progressive Metal Production

158

11 Jul 2023

ShredGP: Guitarist Style-Conditioned Tablature Generation

183

11 Jul 2023

Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer

International Conference on Learning Representations (ICLR), 2023

Zhun Yang

Adam Ishay

Joohyung Lee

341

10 Jul 2023

Lost in the Middle: How Language Models Use Long Contexts

Transactions of the Association for Computational Linguistics (TACL), 2023

557

2,692

06 Jul 2023

Focused Transformer: Contrastive Training for Context Scaling

Neural Information Processing Systems (NeurIPS), 2023

Henryk Michalewski

235

165

06 Jul 2023

LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias

Knowledge Discovery and Data Mining (KDD), 2023

255

06 Jul 2023

LongNet: Scaling Transformers to 1,000,000,000 Tokens

473

221

05 Jul 2023

Facing Off World Model Backbones: RNNs, Transformers, and S4

Neural Information Processing Systems (NeurIPS), 2023

Fei Deng

Junyeong Park

Sungjin Ahn

340

05 Jul 2023

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023

Yujie Wang

Youhe Jiang

Xupeng Miao

Fangcheng Fu

Shenhan Zhu

Xiaonan Nie

Yaofeng Tu

Tengjiao Wang

313

05 Jul 2023

Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure

Yikang Wang

Hiromitsu Nishizaki

Ming Li

215

04 Jul 2023

Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Matthew Raffel

Lizhong Chen

158

03 Jul 2023

Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation

International Conference on Machine Learning (ICML), 2023

Matthew Raffel

Drew Penney

Lizhong Chen

162

03 Jul 2023

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

Interspeech (Interspeech), 2023

183

03 Jul 2023

MeLM, a generative pretrained language modeling framework that solves forward and inverse mechanics problems

Journal of the Mechanics and Physics of Solids (JMPS), 2023

Markus J. Buehler

AI4CE

223

30 Jun 2023

Knowledge Base Completion for Long-Tail Entities

223

30 Jun 2023

Leveraging Cross-Utterance Context For ASR Decoding

Interspeech (Interspeech), 2023

Robert Flynn

Anton Ragni

194

29 Jun 2023

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

Automatic Speech Recognition & Understanding (ASRU), 2023

256

28 Jun 2023

Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio

Conference on Algebraic Informatics (CAI), 2023

123

28 Jun 2023

Extending Context Window of Large Language Models via Positional Interpolation

436

684

27 Jun 2023

Length Generalization in Arithmetic Transformers

Samy Jelassi

Stéphane d'Ascoli

Carles Domingo-Enrich

Yuhuai Wu

Yuanzhi Li

François Charton

258

27 Jun 2023

MotionGPT: Human Motion as a Foreign Language

Neural Information Processing Systems (NeurIPS), 2023

Jingyi Yu

Tao Chen

292

450

26 Jun 2023

Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning

Kuan-Fu Ding

Jingyang Li

Kim-Chuan Toh

442

26 Jun 2023

Swin-Free: Achieving Better Cross-Window Attention and Efficiency with Size-varying Window

Jinkyu Koo

John Yang

Le An

Gwenaelle Cunha Sergio

Su Inn Park

ViT

106

23 Jun 2023

Efficient Online Processing with Deep Neural Networks

Lukas Hedegaard

209

23 Jun 2023

Long-range Language Modeling with Self-retrieval

Transactions of the Association for Computational Linguistics (TACL), 2023

Ohad Rubin

Jonathan Berant

RALM KELM

229

23 Jun 2023

Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems

Interspeech (Interspeech), 2023

Jiajun Deng

Xie Chen

174

23 Jun 2023

Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models

Neural Information Processing Systems (NeurIPS), 2023

Leonardo Galli

Holger Rauhut

Mark Schmidt

211

22 Jun 2023

Exploring the Role of Audio in Video Captioning

Linjie Yang

Heng Wang

169

21 Jun 2023

Sparse Modular Activation for Efficient Sequence Modeling

Neural Information Processing Systems (NeurIPS), 2023

Liliang Ren

Yang Liu

Shuohang Wang

Yichong Xu

Chenguang Zhu

Chengxiang Zhai

278

19 Jun 2023

NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning

Neural Information Processing Systems (NeurIPS), 2023

322

19 Jun 2023

Trained Transformers Learn Linear Models In-Context

Journal of Machine Learning Research (JMLR), 2023

Ruiqi Zhang

Spencer Frei

Peter L. Bartlett

413

281

16 Jun 2023

Pushing the Limits of ChatGPT on NLP Tasks

...

Jiwei Li

294

16 Jun 2023

TSNet-SAC: Leveraging Transformers for Efficient Task Scheduling

16 Jun 2023

Block-State Transformers

Neural Information Processing Systems (NeurIPS), 2023

Pierre-Luc Bacon

253

15 Jun 2023

Recurrent Action Transformer with Memory

397

15 Jun 2023

Semantic HELM: A Human-Readable Memory for Reinforcement Learning

Neural Information Processing Systems (NeurIPS), 2023

300

15 Jun 2023

Iterative self-transfer learning: A general methodology for response time-history prediction based on small dataset

Journal of Computational Design and Engineering (JCDE), 2023

166

14 Jun 2023

Warpformer: A Multi-scale Modeling Approach for Irregular Clinical Time Series

Knowledge Discovery and Data Mining (KDD), 2023

Jiang Bian

158

14 Jun 2023

Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure

189

14 Jun 2023

Augmenting Language Models with Long-Term Memory

Neural Information Processing Systems (NeurIPS), 2023

Xiaodong Liu

241

142

12 Jun 2023

Recurrent Attention Networks for Long-text Modeling

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Haoran Xie

217

12 Jun 2023

A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text

Natural Language Processing Journal (JNLP), 2023

Jessica Nayeli López Espejel

Mahaman Sanoussi Yahaya Alassan

El Mehdi Chouham

Walid Dahhane

E. Ettifouri

269

10 Jun 2023

FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow

338

08 Jun 2023

ModuleFormer: Modularity Emerges from Mixture-of-Experts

Chuang Gan

202

07 Jun 2023

Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency

Yuxuan Wang

Hong Lyu

147

05 Jun 2023

The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles

Knowledge Discovery and Data Mining (KDD), 2023

Md Shamim Hussain

Mohammed J Zaki

D. Subramanian

325

02 Jun 2023

Data-Efficient French Language Modeling with CamemBERTa

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Wissam Antoun

Benoît Sagot

Djamé Seddah

152

02 Jun 2023