1901.02860
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"
50 / 2,022 papers shown
TMI! Finetuned Models Leak Private Information from their Pretraining Data
Proceedings on Privacy Enhancing Technologies (PoPETs), 2023
John Abascal
Stanley Wu
Alina Oprea
Jonathan R. Ullman
305
23
0
01 Jun 2023
Exposing Attention Glitches with Flip-Flop Language Modeling
Neural Information Processing Systems (NeurIPS), 2023
Bingbin Liu
Jordan T. Ash
Surbhi Goel
A. Krishnamurthy
Cyril Zhang
LRM
223
71
0
01 Jun 2023
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft
Neural Information Processing Systems (NeurIPS), 2023
Shalev Lifshitz
Keiran Paster
Harris Chan
Jimmy Ba
Sheila A. McIlraith
LM&Ro
352
99
0
01 Jun 2023
Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home
Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 2023
Eda Okur
Roddy Fuentes Alba
Saurav Sahay
L. Nachman
190
1
0
01 Jun 2023
Monotonic Location Attention for Length Generalization
International Conference on Machine Learning (ICML), 2023
Jishnu Ray Chowdhury
Cornelia Caragea
LLMAG
177
11
0
31 May 2023
The Impact of Positional Encoding on Length Generalization in Transformers
Neural Information Processing Systems (NeurIPS), 2023
Amirhossein Kazemnejad
Inkit Padhi
Karthikeyan N. Ramamurthy
Payel Das
Siva Reddy
390
312
0
31 May 2023
Blockwise Parallel Transformer for Large Context Models
Hao Liu
Pieter Abbeel
277
13
0
30 May 2023
NetHack is Hard to Hack
Neural Information Processing Systems (NeurIPS), 2023
Ulyana Piterbarg
Lerrel Pinto
Rob Fergus
269
9
0
30 May 2023
HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition
Interspeech (Interspeech), 2023
Florian Mai
Juan Pablo Zuluaga
Titouan Parcollet
P. Motlícek
160
12
0
29 May 2023
A Quantitative Review on Language Model Efficiency Research
Meng Jiang
Hy Dang
Lingbo Tong
206
0
0
28 May 2023
Graph Inductive Biases in Transformers without Message Passing
International Conference on Machine Learning (ICML), 2023
Liheng Ma
Chen Lin
Derek Lim
Adriana Romero Soriano
P. Dokania
Mark Coates
Juil Sock
Ser-Nam Lim
AI4CE
260
151
0
27 May 2023
Slide, Constrain, Parse, Repeat: Synchronous SlidingWindows for Document AMR Parsing
Yara Rizk
Tahira Naseem
Ramón Fernández Astudillo
Radu Florian
Salim Roukos
172
0
0
26 May 2023
Sentence-Incremental Neural Coreference Resolution
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Matt Grenander
Shay B. Cohen
Mark Steedman
CLL
268
5
0
26 May 2023
Randomized Positional Encodings Boost Length Generalization of Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Anian Ruoss
Grégoire Delétang
Tim Genewein
Jordi Grau-Moya
Róbert Csordás
Mehdi Abbana Bennani
Shane Legg
J. Veness
LLMAG
236
128
0
26 May 2023
Landmark Attention: Random-Access Infinite Context Length for Transformers
Neural Information Processing Systems (NeurIPS), 2023
Amirkeivan Mohtashami
Martin Jaggi
LLMAG
341
197
0
25 May 2023
Passive learning of active causal strategies in agents and language models
Neural Information Processing Systems (NeurIPS), 2023
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Ishita Dasgupta
A. Nam
Jane X. Wang
424
24
0
25 May 2023
Focus Your Attention (with Adaptive IIR Filters)
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Shahar Lutati
Itamar Zimerman
Lior Wolf
343
11
0
24 May 2023
InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition
Interspeech (Interspeech), 2023
Zhibing Lai
Tianren Zhang
Qi Liu
Xinyuan Qian
Li-Fang Wei
Songlu Chen
Feng Chen
Xu-Cheng Yin
130
5
0
24 May 2023
AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Shuyang Cao
Lu Wang
250
6
0
24 May 2023
Adapting Language Models to Compress Contexts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Alexis Chevalier
Alexander Wettig
Anirudh Ajith
Danqi Chen
LLMAG
289
258
0
24 May 2023
Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yinghan Long
Sayeed Shafayet Chowdhury
Kaushik Roy
330
1
0
24 May 2023
From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Li Sun
F. Luisier
Kayhan Batmanghelich
D. Florêncio
Changrong Zhang
VLM
180
7
0
23 May 2023
When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Christos Baziotis
Biao Zhang
Alexandra Birch
Barry Haddow
397
2
0
23 May 2023
DAPR: A Benchmark on Document-Aware Passage Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Kexin Wang
Nils Reimers
Iryna Gurevych
368
10
0
23 May 2023
NarrativeXL: A Large-scale Dataset For Long-Term Memory Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
A. Moskvichev
Ky-Vinh Mai
RALM
212
1
0
23 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
264
114
0
22 May 2023
GNCformer Enhanced Self-attention for Automatic Speech Recognition
Junlong Li
Z. Duan
S. Li
X. Yu
G. Yang
145
1
0
22 May 2023
FIT: Far-reaching Interleaved Transformers
Ting-Li Chen
Lala Li
326
16
0
22 May 2023
EE-TTS: Emphatic Expressive TTS with Linguistic Information
Interspeech (Interspeech), 2023
Yifan Zhong
Chen Zhang
Xule Liu
Chenxi Sun
Weishan Deng
Haifeng Hu
Zhongqian Sun
152
6
0
20 May 2023
Reducing Sequence Length by Predicting Edit Operations with Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Masahiro Kaneko
Naoaki Okazaki
242
5
0
19 May 2023
Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness
Yuxuan Zhou
Zhi-Qi Cheng
Ju He
Bin Luo
Yifeng Geng
Xuansong Xie
329
14
0
19 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Interspeech (Interspeech), 2023
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
236
23
0
18 May 2023
Deep Multiple Instance Learning with Distance-Aware Self-Attention
Georg Wolflein
Lucie Charlotte Magister
Pietro Lio
David J. Harrison
Ognjen Arandjelovic
174
4
0
17 May 2023
CageViT: Convolutional Activation Guided Efficient Vision Transformer
Hao Zheng
Jinbao Wang
Xiantong Zhen
Hao Chen
Jingkuan Song
Feng Zheng
ViT
154
1
0
17 May 2023
Mimetic Initialization of Self-Attention Layers
International Conference on Machine Learning (ICML), 2023
Asher Trockman
J. Zico Kolter
252
43
0
16 May 2023
Machine-Made Media: Monitoring the Mobilization of Machine-Generated Articles on Misinformation and Mainstream News Websites
International Conference on Web and Social Media (ICWSM), 2023
Hans W. A. Hanley
Zakir Durumeric
DeLMO
332
60
0
16 May 2023
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuxin Ren
Zi-Qi Zhong
Xingjian Shi
Yi Zhu
Chun Yuan
Mu Li
350
7
0
16 May 2023
Exploring In-Context Learning Capabilities of Foundation Models for Generating Knowledge Graphs from Text
H. Khorashadizadeh
Nandana Mihindukulasooriya
Sanju Tiwari
Jinghua Groppe
Sven Groppe
156
34
0
15 May 2023
Text Classification via Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xiaofei Sun
Xiaoya Li
Jiwei Li
Leilei Gan
Shangwei Guo
Tianwei Zhang
Guoyin Wang
RALM
LRM
245
227
0
15 May 2023
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Neural Information Processing Systems (NeurIPS), 2023
L. Yu
Daniel Simig
Colin Flaherty
Armen Aghajanyan
Luke Zettlemoyer
M. Lewis
301
140
0
12 May 2023
Salient Mask-Guided Vision Transformer for Fine-Grained Classification
VISIGRAPP (VISIGRAPP), 2023
Dmitry Demidov
M.H. Sharif
Aliakbar Abdurahimov
Hisham Cholakkal
Fahad Shahbaz Khan
235
13
0
11 May 2023
A General-Purpose Multilingual Document Encoder
Onur Galoglu
Robert Litschko
Goran Glavaš
214
2
0
11 May 2023
ORKG-Leaderboards: A Systematic Workflow for Mining Leaderboards as a Knowledge Graph
International Journal on Digital Libraries (IJDL), 2023
Salomon Kabongo KABENAMUALU
Jennifer D'Souza
Sören Auer
309
24
0
10 May 2023
VTPNet for 3D deep learning on point cloud
Wei Zhou
Weiwei Jin
Qian Wang
Yifan Wang
Dekui Wang
Xingxing Hao
Yong Yu
3DPC
ViT
165
1
0
10 May 2023
Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation
Conference on Machine Learning and Systems (MLSys), 2023
Le Chen
Quazi Ishtiaque Mahmud
Hung Phan
Nesreen Ahmed
Ali Jannesari
170
18
0
09 May 2023
Effects of sub-word segmentation on performance of transformer language models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jue Hou
Anisia Katinskaia
Anh Vu
R. Yangarber
346
11
0
09 May 2023
ComputeGPT: A computational chat model for numerical problems
Ryan H. Lewis
Junfeng Jiao
112
2
0
08 May 2023
Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins
Journal of Applied Physics (JAP), 2023
Markus J. Buehler
179
28
0
07 May 2023
Leveraging Synthetic Targets for Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sarthak Mittal
Oleksii Hrinchuk
Oleksii Kuchaiev
147
2
0
07 May 2023
Adapting Transformer Language Models for Predictive Typing in Brain-Computer Interfaces
Shijia Liu
David A. Smith
35
2
0
05 May 2023