1901.02860
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"
50 / 2,022 papers shown
Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ta-Chung Chi
Ting-Han Fan
Alexander I. Rudnicky
Peter J. Ramadge
LRM
153
15
0
05 May 2023
Hierarchical Transformer for Scalable Graph Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Wenhao Zhu
Tianyu Wen
Guojie Song
Xiaojun Ma
Liang Wang
253
23
0
04 May 2023
Leveraging BERT Language Model for Arabic Long Document Classification
Muhammad Al-Qurishi
182
1
0
04 May 2023
BranchNorm: Robustly Scaling Extremely Deep Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
180
4
0
04 May 2023
A Lightweight CNN-Transformer Model for Learning Traveling Salesman Problems
Minseop Jung
Jaeseung Lee
Jibum Kim
ViT
235
19
0
03 May 2023
FreeLM: Fine-Tuning-Free Language Model
Xiang Li
Xin Jiang
Xuying Meng
Aixin Sun
Yequan Wang
188
3
0
02 May 2023
EvoluNet: Advancing Dynamic Non-IID Transfer Learning on Graphs
International Conference on Machine Learning (ICML), 2023
Haohui Wang
Yuzhen Mao
Yujun Yan
Yaoqing Yang
Jianhui Sun
...
Si Zhang
Alison Hu
Edward Bowen
Tyler Cody
Dawei Zhou
462
7
0
01 May 2023
DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation
Yousef Yeganeh
Azade Farshad
Peter Weinberger
Seyed-Ahmad Ahmadi
Ehsan Adeli
Nassir Navab
ViT
MedIm
166
0
0
28 Apr 2023
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
International Conference on Learning Representations (ICLR), 2023
Frederik Kunstner
Jacques Chen
J. Lavington
Mark Schmidt
292
103
0
27 Apr 2023
Technical Report: Impact of Position Bias on Language Models in Token Classification
Mehdi Ben Amor
Michael Granitzer
Jelena Mitrović
385
3
0
26 Apr 2023
Tensor Decomposition for Model Reduction in Neural Networks: A Review
IEEE Circuits and Systems Magazine (IEEE CAS Magazine), 2023
Xingyi Liu
Keshab K. Parhi
198
27
0
26 Apr 2023
UNADON: Transformer-based model to predict genome-wide chromosome spatial position
Muyu Yang
Jian Ma
MedIm
ViT
61
3
0
26 Apr 2023
TransFlow: Transformer as Flow Learner
Computer Vision and Pattern Recognition (CVPR), 2023
Yawen Lu
Qifan Wang
Siqi Ma
Tong Geng
Victor Y. Chen
Huaijin Chen
Dongfang Liu
ViT
289
64
0
23 Apr 2023
Domain-specific Continued Pretraining of Language Models for Capturing Long Context in Mental Health
Shaoxiong Ji
Tianlin Zhang
Kailai Yang
Sophia Ananiadou
Xiaoshi Zhong
Jörg Tiedemann
AI4MH
ALM
195
38
0
20 Apr 2023
Scaling Transformer to 1M tokens and beyond with RMT
Aydar Bulatov
Yuri Kuratov
Yermek Kapushev
Andrey Kravchenko
LRM
340
111
0
19 Apr 2023
From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation
Adarsh Kumar
Pedro Sarmento
191
4
0
18 Apr 2023
Learning to Compress Prompts with Gist Tokens
Neural Information Processing Systems (NeurIPS), 2023
Jesse Mu
Xiang Lisa Li
Noah D. Goodman
VLM
444
294
0
17 Apr 2023
Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli
Lizhong Chen
290
2
0
17 Apr 2023
MisRoBÆRTa: Transformers versus Misinformation
Ciprian-Octavian Truică
Elena Simona Apostol
185
58
0
16 Apr 2023
A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Ruchao Fan
Wei Chu
Peng Chang
Abeer Alwan
178
18
0
15 Apr 2023
Fairness in Visual Clustering: A Novel Transformer Clustering Approach
Xuan-Bac Nguyen
C. Duong
Marios Savvides
Kaushik Roy
Hugh Churchill
Khoa Luu
273
11
0
14 Apr 2023
Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition
Guangyong Wei
Zhikui Duan
Shiren Li
Guangguang Yang
Xinmei Yu
Junhua Li
208
5
0
11 Apr 2023
Context-Aware Classification of Legal Document Pages
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Pavlos Fragkogiannis
Martina Forster
Grace E. Lee
Dell Zhang
154
6
0
05 Apr 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Shiyang Feng
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Jiaming Song
Yu Qiao
MLLM
590
943
0
28 Mar 2023
Planning with Sequence Models through Iterative Energy Minimization
International Conference on Learning Representations (ICLR), 2023
Hongyi Chen
Yilun Du
Yiye Chen
J. Tenenbaum
Patricio A. Vela
168
8
0
28 Mar 2023
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sara Papi
Marco Gaido
Andrea Pilzer
Matteo Negri
499
16
0
28 Mar 2023
Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
Computer Vision and Pattern Recognition (CVPR), 2023
Clinton Mo
Kun Hu
Chengjiang Long
Zhiyong Wang
165
20
0
27 Mar 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
215
160
0
25 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
223
73
0
22 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
463
70
0
21 Mar 2023
Language Model Behavior: A Comprehensive Survey
International Conference on Computational Logic (ICCL), 2023
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
381
143
0
20 Mar 2023
Unit Scaling: Out-of-the-Box Low-Precision Training
International Conference on Machine Learning (ICML), 2023
Charlie Blake
Douglas Orr
Carlo Luschi
MQ
223
12
0
20 Mar 2023
HDformer: A Higher Dimensional Transformer for Diabetes Detection Utilizing Long Range Vascular Signals
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ella Lan
MedIm
117
3
0
17 Mar 2023
BiFormer: Vision Transformer with Bi-Level Routing Attention
Computer Vision and Pattern Recognition (CVPR), 2023
Lei Zhu
Xinjiang Wang
Zhanghan Ke
Wayne Zhang
Rynson W. H. Lau
352
846
0
15 Mar 2023
PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining
Conference on Robot Learning (CoRL), 2023
G. Thomas
Ching-An Cheng
Ricky Loynd
Felipe Vieira Frujeri
Vibhav Vineet
Mihai Jalobeanu
Andrey Kolobov
SSL
288
13
0
15 Mar 2023
PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yongil Kim
Yerin Hwang
Hyeongu Yun
Seunghyun Yoon
Trung Bui
Kyomin Jung
278
7
0
15 Mar 2023
AdPE: Adversarial Positional Embeddings for Pretraining Vision Transformers via MAE+
Tianlin Li
Ying Wang
Ziwei Xuan
Guo-Jun Qi
ViT
178
4
0
14 Mar 2023
Transformer Models for Acute Brain Dysfunction Prediction
B. Silva
Miguel Contreras
T. Ozrazgat-Baslanti
Yuanfang Ren
Ziyuan Guan
Kia Khezeli
A. Bihorac
Parisa Rashidi
132
0
0
13 Mar 2023
Transformer-based World Models Are Happy With 100k Interactions
International Conference on Learning Representations (ICLR), 2023
Jan Robine
Marc Höftmann
Tobias Uelwer
Stefan Harmeling
OffRL
283
124
0
13 Mar 2023
An Overview on Language Models: Recent Developments and Outlook
APSIPA Transactions on Signal and Information Processing (TASIP), 2023
Chengwei Wei
Yun Cheng Wang
Bin Wang
C.-C. Jay Kuo
283
53
0
10 Mar 2023
Diffusing Gaussian Mixtures for Generating Categorical Data
AAAI Conference on Artificial Intelligence (AAAI), 2023
Florence Regol
Mark Coates
DiffM
173
6
0
08 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
277
727
0
07 Mar 2023
A Meta-Evaluation of Faithfulness Metrics for Long-Form Hospital-Course Summarization
Machine Learning in Health Care (MLHC), 2023
Griffin Adams
Jason Zucker
Noémie Elhadad
191
26
0
07 Mar 2023
CLIP-guided Prototype Modulating for Few-shot Action Recognition
International Journal of Computer Vision (IJCV), 2023
Xiang Wang
Shiwei Zhang
Jun Cen
Changxin Gao
Yingya Zhang
Deli Zhao
Nong Sang
VLM
227
81
0
06 Mar 2023
GlobalNER: Incorporating Non-local Information into Named Entity Recognition
Chiao-Wei Hsu
Keh-Yih Su
NAI
149
0
0
06 Mar 2023
LooperGP: A Loopable Sequence Model for Live Coding Performance using GuitarPro Tablature
Sara Adkins
Pedro Sarmento
M. Barthet
161
10
0
03 Mar 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schlüter
Shinji Watanabe
VLM
302
248
0
03 Mar 2023
Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers
International Conference on Learning Representations (ICLR), 2023
Tianlong Chen
Zhenyu Zhang
Ajay Jaiswal
Shiwei Liu
Zinan Lin
MoE
277
70
0
02 Mar 2023
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaoyu Shi
Zhaoyang Huang
Dasong Li
Manyuan Zhang
Ka Chun Cheung
Simon See
Hongwei Qin
Jifeng Dai
Jiaming Song
238
130
0
02 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
410
348
0
02 Mar 2023