1901.02860
Cited By
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
Papers citing
"Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"
50 / 2,022 papers shown
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
Interspeech (Interspeech), 2023
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
S. Bhattacharya
236
12
0
12 Jul 2023
Pluggable Neural Machine Translation Models via Memory-augmented Adapters
International Conference on Language Resources and Evaluation (LREC), 2023
Yuzhuang Xu
Shuo Wang
Peng Li
Xuebo Liu
Xiaolong Wang
Weidong Liu
Yang Liu
342
1
0
12 Jul 2023
Transformers in Reinforcement Learning: A Survey
Pranav Agarwal
A. Rahman
P. St-Charles
Simon J. D. Prince
Samira Ebrahimi Kahou
OffRL
252
26
0
12 Jul 2023
ProgGP: From GuitarPro Tablature Neural Generation To Progressive Metal Production
John H. Loth
Pedro Sarmento
CJ Carr
Zack Zukowski
M. Barthet
158
10
0
11 Jul 2023
ShredGP: Guitarist Style-Conditioned Tablature Generation
Pedro Sarmento
Adarsh Kumar
Dekun Xie
CJ Carr
Zack Zukowski
M. Barthet
183
9
0
11 Jul 2023
Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer
International Conference on Learning Representations (ICLR), 2023
Zhun Yang
Adam Ishay
Joohyung Lee
341
17
0
10 Jul 2023
Lost in the Middle: How Language Models Use Long Contexts
Transactions of the Association for Computational Linguistics (TACL), 2023
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Abigail Z. Jacobs
RALM
557
2,692
0
06 Jul 2023
Focused Transformer: Contrastive Training for Context Scaling
Neural Information Processing Systems (NeurIPS), 2023
Szymon Tworkowski
Konrad Staniszewski
Mikolaj Pacek
Yuhuai Wu
Henryk Michalewski
Piotr Miłoś
235
165
0
06 Jul 2023
LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias
Knowledge Discovery and Data Mining (KDD), 2023
Mario Almagro
Emilio Almazán
Diego Ortego
David Jiménez
255
5
0
06 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
473
221
0
05 Jul 2023
Facing Off World Model Backbones: RNNs, Transformers, and S4
Neural Information Processing Systems (NeurIPS), 2023
Fei Deng
Junyeong Park
Sungjin Ahn
340
43
0
05 Jul 2023
Improving Automatic Parallel Training via Balanced Memory Workload Optimization
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023
Yujie Wang
Youhe Jiang
Xupeng Miao
Fangcheng Fu
Shenhan Zhu
Xiaonan Nie
Yaofeng Tu
Tengjiao Wang
313
21
0
05 Jul 2023
Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure
Yikang Wang
Hiromitsu Nishizaki
Ming Li
215
1
0
04 Jul 2023
Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Matthew Raffel
Lizhong Chen
158
5
0
03 Jul 2023
Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation
International Conference on Machine Learning (ICML), 2023
Matthew Raffel
Drew Penney
Lizhong Chen
162
4
0
03 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Interspeech (Interspeech), 2023
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
183
9
0
03 Jul 2023
MeLM, a generative pretrained language modeling framework that solves forward and inverse mechanics problems
Journal of the mechanics and physics of solids (JMPS), 2023
Markus J. Buehler
AI4CE
223
59
0
30 Jun 2023
Knowledge Base Completion for Long-Tail Entities
Lihu Chen
Simon Razniewski
Gerhard Weikum
KELM
223
9
0
30 Jun 2023
Leveraging Cross-Utterance Context For ASR Decoding
Interspeech (Interspeech), 2023
Robert Flynn
Anton Ragni
194
1
0
29 Jun 2023
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuang Li
Yu-Huan Wu
Jinyu Li
Shujie Liu
256
62
0
28 Jun 2023
Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio
Conference on Algebraic Informatics (CAI), 2023
Allen Roush
Sanjay Basu
Akshay Moorthy
Dmitry Dubovoy
123
12
0
28 Jun 2023
Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen
Sherman Wong
Liangjian Chen
Yuandong Tian
436
684
0
27 Jun 2023
Length Generalization in Arithmetic Transformers
Samy Jelassi
Stéphane d'Ascoli
Carles Domingo-Enrich
Yuhuai Wu
Yuan-Fang Li
François Charton
258
52
0
27 Jun 2023
MotionGPT: Human Motion as a Foreign Language
Neural Information Processing Systems (NeurIPS), 2023
Biao Jiang
Xin Chen
Wen Liu
Jingyi Yu
Gang Yu
Tao Chen
MLLM
292
450
0
26 Jun 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
442
11
0
26 Jun 2023
Swin-Free: Achieving Better Cross-Window Attention and Efficiency with Size-varying Window
Jinkyu Koo
John Yang
Le An
Gwenaelle Cunha Sergio
Su Inn Park
ViT
106
0
0
23 Jun 2023
Efficient Online Processing with Deep Neural Networks
Lukas Hedegaard
209
0
0
23 Jun 2023
Long-range Language Modeling with Self-retrieval
Transactions of the Association for Computational Linguistics (TACL), 2023
Ohad Rubin
Jonathan Berant
RALM
KELM
229
32
0
23 Jun 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Interspeech (Interspeech), 2023
Mingyu Cui
Jiawen Kang
Jiajun Deng
Xiaoyue Yin
Yutao Xie
Xie Chen
Xunying Liu
174
8
0
23 Jun 2023
Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models
Neural Information Processing Systems (NeurIPS), 2023
Leonardo Galli
Holger Rauhut
Mark Schmidt
211
17
0
22 Jun 2023
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
169
6
0
21 Jun 2023
Sparse Modular Activation for Efficient Sequence Modeling
Neural Information Processing Systems (NeurIPS), 2023
Liliang Ren
Yang Liu
Shuohang Wang
Yichong Xu
Chenguang Zhu
Chengxiang Zhai
278
17
0
19 Jun 2023
NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning
Neural Information Processing Systems (NeurIPS), 2023
Yun Yi
Haokui Zhang
Rong Xiao
Nan Wang
Xiaoyu Wang
GNN
322
6
0
19 Jun 2023
Trained Transformers Learn Linear Models In-Context
Journal of machine learning research (JMLR), 2023
Ruiqi Zhang
Spencer Frei
Peter L. Bartlett
413
281
0
16 Jun 2023
Pushing the Limits of ChatGPT on NLP Tasks
Xiaofei Sun
Linfeng Dong
Xiaoya Li
Zhen Wan
Shuhe Wang
...
Jiwei Li
Fei Cheng
Lingjuan Lyu
Leilei Gan
Guoyin Wang
AI4MH
LRM
294
37
0
16 Jun 2023
TSNet-SAC: Leveraging Transformers for Efficient Task Scheduling
Ke Deng
Zhiyuan He
Haotong Zhang
Hao-Wei Lin
Desheng Wang
75
0
0
16 Jun 2023
Block-State Transformers
Neural Information Processing Systems (NeurIPS), 2023
Mahan Fathi
Jonathan Pilault
Orhan Firat
C. Pal
Pierre-Luc Bacon
Ross Goroshin
253
25
0
15 Jun 2023
Recurrent Action Transformer with Memory
A. Staroverov
A. Bessonov
Dmitry A. Yudin
A. Kovalev
Aleksandr I. Panov
OffRL
397
13
0
15 Jun 2023
Semantic HELM: A Human-Readable Memory for Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Fabian Paischer
Thomas Adler
M. Hofmarcher
Sepp Hochreiter
300
18
0
15 Jun 2023
Iterative self-transfer learning: A general methodology for response time-history prediction based on small dataset
Journal of Computational Design and Engineering (JCDE), 2023
Yongjia Xu
Xinzheng Lu
Yifan Fei
Yuli Huang
AI4TS
AI4CE
166
19
0
14 Jun 2023
Warpformer: A Multi-scale Modeling Approach for Irregular Clinical Time Series
Knowledge Discovery and Data Mining (KDD), 2023
Jiawen Zhang
Shun Zheng
Wei Cao
Jiang Bian
Jia Li
AI4TS
158
47
0
14 Jun 2023
Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure
Weidong Ji
Shijie Zan
Guohui Zhou
Xu Wang
SyDa
189
1
0
14 Jun 2023
Augmenting Language Models with Long-Term Memory
Neural Information Processing Systems (NeurIPS), 2023
Weizhi Wang
Li Dong
Hao Cheng
Xiaodong Liu
Xifeng Yan
Jianfeng Gao
Furu Wei
KELM
RALM
241
142
0
12 Jun 2023
Recurrent Attention Networks for Long-text Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xianming Li
Zongxi Li
Xiaotian Luo
Haoran Xie
Xing Lee
Yingbin Zhao
Fu Lee Wang
Qing Li
RALM
217
21
0
12 Jun 2023
A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text
Natural Language Processing Journal (JNLP), 2023
Jessica Nayeli López Espejel
Mahaman Sanoussi Yahaya Alassan
El Mehdi Chouham
Walid Dahhane
E. Ettifouri
269
15
0
10 Jun 2023
FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow
Zhaoyang Huang
Xiaoyu Shi
Chao Zhang
Qiang Wang
Yijin Li
Hongwei Qin
Jifeng Dai
Xiaogang Wang
Jiaming Song
338
4
0
08 Jun 2023
ModuleFormer: Modularity Emerges from Mixture-of-Experts
Songlin Yang
Zheyu Zhang
Tianyou Cao
Shawn Tan
Zhenfang Chen
Chuang Gan
KELM
MoE
202
13
0
07 Jun 2023
Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency
Yuxuan Wang
Hong Lyu
147
4
0
05 Jun 2023
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Knowledge Discovery and Data Mining (KDD), 2023
Md Shamim Hussain
Mohammed J Zaki
D. Subramanian
325
4
0
02 Jun 2023
Data-Efficient French Language Modeling with CamemBERTa
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Wissam Antoun
Benoît Sagot
Djamé Seddah
152
9
0
02 Jun 2023