Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

9 January 2019

Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"

50 / 2,022 papers shown

Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Ta-Chung Chi

Ting-Han Fan

Alexander I. Rudnicky

Peter J. Ramadge

LRM

153

05 May 2023

Hierarchical Transformer for Scalable Graph Learning

International Joint Conference on Artificial Intelligence (IJCAI), 2023

Liang Wang

253

04 May 2023

Leveraging BERT Language Model for Arabic Long Document Classification

Muhammad Al-Qurishi

182

04 May 2023

BranchNorm: Robustly Scaling Extremely Deep Transformers

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Yanjun Liu

Xianfeng Zeng

Fandong Meng

Jie Zhou

180

04 May 2023

A Lightweight CNN-Transformer Model for Learning Traveling Salesman Problems

235

03 May 2023

FreeLM: Fine-Tuning-Free Language Model

Xiang Li

Xin Jiang

Xuying Meng

Aixin Sun

Yequan Wang

188

02 May 2023

EvoluNet: Advancing Dynamic Non-IID Transfer Learning on Graphs

International Conference on Machine Learning (ICML), 2023

Yujun Yan

...

462

01 May 2023

DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation

Nassir Navab

166

28 Apr 2023

Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be

International Conference on Learning Representations (ICLR), 2023

292

103

27 Apr 2023

Technical Report: Impact of Position Bias on Language Models in Token Classification

Mehdi Ben Amor

Michael Granitzer

Jelena Mitrović

385

26 Apr 2023

Tensor Decomposition for Model Reduction in Neural Networks: A Review

IEEE Circuits and Systems Magazine (IEEE CAS Magazine), 2023

Xingyi Liu

Keshab K. Parhi

198

26 Apr 2023

UNADON: Transformer-based model to predict genome-wide chromosome spatial position

Muyu Yang

Jian Ma

MedIm ViT

26 Apr 2023

TransFlow: Transformer as Flow Learner

Computer Vision and Pattern Recognition (CVPR), 2023

289

23 Apr 2023

Domain-specific Continued Pretraining of Language Models for Capturing Long Context in Mental Health

195

20 Apr 2023

Scaling Transformer to 1M tokens and beyond with RMT

340

111

19 Apr 2023

From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation

Adarsh Kumar

Pedro Sarmento

191

18 Apr 2023

Learning to Compress Prompts with Gist Tokens

Neural Information Processing Systems (NeurIPS), 2023

Jesse Mu

Xiang Lisa Li

Noah D. Goodman

VLM

444

294

17 Apr 2023

Improving Autoregressive NLP Tasks via Modular Linearized Attention

Victor Agostinelli

Lizhong Chen

290

17 Apr 2023

MisRoBÆRTa: Transformers versus Misinformation

Ciprian-Octavian Truică

Elena Simona Apostol

185

16 Apr 2023

A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

178

15 Apr 2023

Fairness in Visual Clustering: A Novel Transformer Clustering Approach

273

14 Apr 2023

Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition

208

11 Apr 2023

Context-Aware Classification of Legal Document Pages

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023

154

05 Apr 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Yu Qiao

590

943

28 Mar 2023

Planning with Sequence Models through Iterative Energy Minimization

International Conference on Learning Representations (ICLR), 2023

Patricio A. Vela

168

28 Mar 2023

When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

499

28 Mar 2023

Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation

Computer Vision and Pattern Recognition (CVPR), 2023

Clinton Mo

Kun Hu

Chengjiang Long

Zhiyong Wang

165

27 Mar 2023

Selective Structured State-Spaces for Long-Form Video Understanding

Computer Vision and Pattern Recognition (CVPR), 2023

215

160

25 Mar 2023

Text with Knowledge Graph Augmented Transformer for Video Captioning

Computer Vision and Pattern Recognition (CVPR), 2023

Yufei Wang

223

22 Mar 2023

Transformers in Speech Processing: A Survey

463

21 Mar 2023

Language Model Behavior: A Comprehensive Survey

International Conference on Computational Logic (ICCL), 2023

Tyler A. Chang

Benjamin Bergen

VLM LRM LM&MA

381

143

20 Mar 2023

Unit Scaling: Out-of-the-Box Low-Precision Training

International Conference on Machine Learning (ICML), 2023

Charlie Blake

Douglas Orr

Carlo Luschi

223

20 Mar 2023

HDformer: A Higher Dimensional Transformer for Diabetes Detection Utilizing Long Range Vascular Signals

AAAI Conference on Artificial Intelligence (AAAI), 2023

Ella Lan

MedIm

117

17 Mar 2023

BiFormer: Vision Transformer with Bi-Level Routing Attention

Computer Vision and Pattern Recognition (CVPR), 2023

Lei Zhu

352

846

15 Mar 2023

PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining

Conference on Robot Learning (CoRL), 2023

G. Thomas

Ching-An Cheng

Ricky Loynd

Felipe Vieira Frujeri

288

15 Mar 2023

PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

278

15 Mar 2023

AdPE: Adversarial Positional Embeddings for Pretraining Vision Transformers via MAE+

Guo-Jun Qi

178

14 Mar 2023

Transformer Models for Acute Brain Dysfunction Prediction

132

13 Mar 2023

Transformer-based World Models Are Happy With 100k Interactions

International Conference on Learning Representations (ICLR), 2023

283

124

13 Mar 2023

An Overview on Language Models: Recent Developments and Outlook

APSIPA Transactions on Signal and Information Processing (TASIP), 2023

Chengwei Wei

Yun Cheng Wang

Bin Wang

C.-C. Jay Kuo

283

10 Mar 2023

Diffusing Gaussian Mixtures for Generating Categorical Data

AAAI Conference on Artificial Intelligence (AAAI), 2023

Florence Regol

Mark Coates

DiffM

173

08 Mar 2023

A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

Siyu Li

Philip S. Yu

Lichao Sun

277

727

07 Mar 2023

A Meta-Evaluation of Faithfulness Metrics for Long-Form Hospital-Course Summarization

Machine Learning in Health Care (MLHC), 2023

Griffin Adams

Jason Zucker

Noémie Elhadad

191

07 Mar 2023

CLIP-guided Prototype Modulating for Few-shot Action Recognition

International Journal of Computer Vision (IJCV), 2023

Jun Cen

227

06 Mar 2023

GlobalNER: Incorporating Non-local Information into Named Entity Recognition

Chiao-Wei Hsu

Keh-Yih Su

NAI

149

06 Mar 2023

LooperGP: A Loopable Sequence Model for Live Coding Performance using GuitarPro Tablature

Sara Adkins

Pedro Sarmento

M. Barthet

161

03 Mar 2023

End-to-End Speech Recognition: A Survey

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

302

248

03 Mar 2023

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

International Conference on Learning Representations (ICLR), 2023

277

02 Mar 2023

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation

Computer Vision and Pattern Recognition (CVPR), 2023

238

130

02 Mar 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

...

410

348

02 Mar 2023