1901.02860
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
Papers citing
"Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"
50 / 2,022 papers shown
EpiK-Eval: Evaluation for Language Models as Epistemic Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Gabriele Prato
Jerry Huang
Prasanna Parthasarathi
Shagun Sodhani
Sarath Chandar
ELM
248
6
0
23 Oct 2023
Meta learning with language models: Challenges and opportunities in the classification of imbalanced text
Apostol T. Vassilev
Honglan Jin
Munawar Hasan
264
1
0
23 Oct 2023
Retrieval-Augmented Chain-of-Thought in Semi-structured Domains
Vaibhav Mavi
Abulhair Saparov
Chen Zhao
LRM
279
10
0
22 Oct 2023
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Kun Wei
Bei Li
Hang Lv
Quan Lu
Ning Jiang
Lei Xie
389
11
0
22 Oct 2023
Enhanced Low-Dimensional Sensing Mapless Navigation of Terrestrial Mobile Robots Using Double Deep Reinforcement Learning Techniques
Linda Dotto de Moraes
V. A. Kich
A. H. Kolling
J. A. Bottega
Ricardo B. Grando
A. R. Cukla
D. T. Gamarra
160
1
0
20 Oct 2023
Multi-level Contrastive Learning for Script-based Character Understanding
Dawei Li
Hengyuan Zhang
Yanran Li
Shiping Yang
277
17
0
20 Oct 2023
Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding
Zhejun Zhang
Alexander Liniger
Daniel Gehrig
Fisher Yu
Luc Van Gool
301
55
0
19 Oct 2023
A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models
Yi Zhou
Jose Camacho-Collados
Danushka Bollegala
436
7
0
19 Oct 2023
The Locality and Symmetry of Positional Encodings
Lihu Chen
Gaël Varoquaux
Fabian M. Suchanek
185
1
0
19 Oct 2023
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Qingru Zhang
Dhananjay Ram
Cole Hawkins
Sheng Zha
Tuo Zhao
278
22
0
19 Oct 2023
From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers
Shaoxiong Duan
Yining Shi
Wei Xu
281
16
0
18 Oct 2023
Long-form Simultaneous Speech Translation: Thesis Proposal
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Peter Polák
3DV
208
3
0
17 Oct 2023
Heterogenous Memory Augmented Neural Networks
Zihan Qiu
Zhen Liu
Shuicheng Yan
Shanghang Zhang
Jie Fu
205
0
0
17 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
420
23
0
16 Oct 2023
A Survey on Video Diffusion Models
ACM Computing Surveys (ACM Comput. Surv.), 2023
Zhen Xing
Qijun Feng
Haoran Chen
Jingdong Sun
Hang-Rui Hu
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
EGVM
VGen
457
220
0
16 Oct 2023
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
Neural Information Processing Systems (NeurIPS), 2023
Huayang Li
Tian Lan
Z. Fu
Deng Cai
Lemao Liu
Nigel Collier
Taro Watanabe
Yixuan Su
206
28
0
16 Oct 2023
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation
Yingwei Ma
Yue Yu
Shanshan Li
Yu Jiang
Yong Guo
Yuanliang Zhang
Yutao Xie
Xiangke Liao
184
13
0
16 Oct 2023
Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels
Thomas Jiralerspong
Flemming Kondrup
Doina Precup
Khimya Khetarpal
144
0
0
16 Oct 2023
CoCoFormer: A controllable feature-rich polyphonic music generation method
Jiuyang Zhou
Tengfei Niu
Hong Zhu
Xingping Wang
235
0
0
15 Oct 2023
STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning
Weipu Zhang
Gang Wang
Jian Sun
Yetian Yuan
Gao Huang
239
95
0
14 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM
VLM
409
70
0
13 Oct 2023
MemGPT: Towards LLMs as Operating Systems
Charles Packer
Sarah Wooders
Kevin Lin
Vivian Fang
Shishir G. Patil
Ion Stoica
Joseph E. Gonzalez
RALM
1.7K
333
0
12 Oct 2023
Cross-Episodic Curriculum for Transformer Agents
Neural Information Processing Systems (NeurIPS), 2023
Lucy Xiaoyang Shi
Yunfan Jiang
Jake Grigsby
Linxi "Jim" Fan
Yuke Zhu
167
9
0
12 Oct 2023
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
International Conference on Learning Representations (ICLR), 2023
Shaofei Cai
Bowei Zhang
Zihao Wang
Xiaojian Ma
Hoang Trung-Dung
Yitao Liang
322
38
0
12 Oct 2023
DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
Neural Information Processing Systems (NeurIPS), 2023
Qingkai Fang
Yan Zhou
Yangzhou Feng
210
16
0
11 Oct 2023
Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning
Workshop on Argument Mining (ArgMining), 2023
Arushi Sharma
Abhibha Gupta
Maneesh Bilalpur
182
7
0
11 Oct 2023
Humans and language models diverge when predicting repeating text
Conference on Computational Natural Language Learning (CoNLL), 2023
Aditya R. Vaidya
Javier S. Turek
Alexander G. Huth
247
10
0
10 Oct 2023
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
Howard Chen
Ramakanth Pasunuru
Jason Weston
Asli Celikyilmaz
RALM
334
116
0
08 Oct 2023
Uncovering hidden geometry in Transformers via disentangling position and context
Jiajun Song
Yiqiao Zhong
248
14
0
07 Oct 2023
Higher-Order DeepTrails: Unified Approach to *Trails
Lernen, Wissen, Daten, Analysen (LWA), 2023
Tobias Koopmann
Jan Pfister
André Markus
Astrid Carolus
Carolin Wienrich
Andreas Hotho
AI4TS
78
0
0
06 Oct 2023
Investigating Alternative Feature Extraction Pipelines For Clinical Note Phenotyping
Daniel Neil
121
0
0
05 Oct 2023
Neural architecture impact on identifying temporally extended Reinforcement Learning tasks
Victor Vadakechirayath George
OffRL
157
0
0
04 Oct 2023
Retrieval meets Long Context Large Language Models
International Conference on Learning Representations (ICLR), 2023
Peng Xu
Ming-Yu Liu
Xianchao Wu
Lawrence C. McAfee
Chen Zhu
Zihan Liu
Sandeep Subramanian
Evelina Bakhturina
Mohammad Shoeybi
Bryan Catanzaro
RALM
LRM
458
112
0
04 Oct 2023
Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture
International Conference on Machine Learning (ICML), 2023
Sangjun Park
Jinyeong Bak
CLL
288
6
0
04 Oct 2023
ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yiming Wang
Jinyu Li
207
11
0
03 Oct 2023
Dodo: Dynamic Contextual Compression for Decoder-only LMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Guanghui Qin
Corby Rosset
Ethan C. Chau
Nikhil Rao
Benjamin Van Durme
198
17
0
03 Oct 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
International Conference on Learning Representations (ICLR), 2023
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
...
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM
MLLM
758
340
0
03 Oct 2023
The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers under Fully Homomorphic Encryption on the Torus
Rickard Brannvall
Andrei Stoian
170
0
0
03 Oct 2023
A Framework for Inference Inspired by Human Memory Mechanisms
International Conference on Learning Representations (ICLR), 2023
Xiangyu Zeng
Jie Lin
Piao Hu
Ruizheng Huang
Zhicheng Zhang
192
4
0
01 Oct 2023
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Chia-Yuan Chang
Helen Zhou
157
15
0
01 Oct 2023
Self-Supervised Open-Ended Classification with Small Visual Language Models
Mohammad Mahdi Derakhshani
Ivona Najdenkoska
Cees G. M. Snoek
M. Worring
Yuki M. Asano
VLM
416
0
0
30 Sep 2023
Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm
Interspeech (Interspeech), 2023
Weiran Wang
Zelin Wu
D. Caseiro
Tsendsuren Munkhdalai
K. Sim
...
Rohit Prabhavalkar
Zhong Meng
Ding Zhao
Tara N. Sainath
P. M. Mengibar
245
11
0
29 Sep 2023
Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
Marco Pleines
Matthias Pallasch
Frank Zimmer
Mike Preuss
OffRL
320
10
0
29 Sep 2023
LatticeGen: A Cooperative Framework which Hides Generated Text in a Lattice for Privacy-Aware Generation on Cloud
Mengke Zhang
Tianxing He
Tianle Wang
Lu Mi
Fatemehsadat Mireshghallah
Binyi Chen
Hao Wang
Yulia Tsvetkov
225
2
0
29 Sep 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
211
22
0
28 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
273
23
0
28 Sep 2023
Unsupervised Pretraining for Fact Verification by Language Model Distillation
International Conference on Learning Representations (ICLR), 2023
A. Bazaga
Pietro Lio
Bo Dai
HILM
351
5
0
28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
International Conference on Learning Representations (ICLR), 2023
Lucas D. Lingle
250
26
0
28 Sep 2023
At Which Training Stage Does Code Data Help LLMs Reasoning?
International Conference on Learning Representations (ICLR), 2023
Xiaogang Jia
Yue Liu
Yue Yu
Yuanliang Zhang
Yu Jiang
Changjian Wang
Shanshan Li
LRM
SyDa
363
90
0
28 Sep 2023
Attention Sorting Combats Recency Bias In Long Context Language Models
A. Peysakhovich
Adam Lerer
LRM
RALM
323
82
0
28 Sep 2023