1901.02860
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
9 January 2019
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
Papers citing "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context"
50 of 2,022 papers shown
Foundation Models in Robotics: Applications, Challenges, and the Future
Roya Firoozi
Johnathan Tucker
Stephen Tian
Anirudha Majumdar
Jiankai Sun
...
Brian Ichter
Danny Driess
Jiajun Wu
Cewu Lu
Mac Schwager
LM&Ro
AI4CE
LRM
VLM
260
281
0
13 Dec 2023
VILA: On Pre-training for Visual Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Ji Lin
Hongxu Yin
Ming-Yu Liu
Yao Lu
Pavlo Molchanov
Andrew Tao
Huizi Mao
Jan Kautz
Mohammad Shoeybi
Song Han
MLLM
VLM
625
676
0
12 Dec 2023
Why "classic" Transformers are shallow and how to make them go deep
Yueyao Yu
Yin Zhang
ViT
267
0
0
11 Dec 2023
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
Aleksandar Terzić
Michael Hersche
G. Karunaratne
Zixiao Huang
Abu Sebastian
Abbas Rahimi
AI4TS
204
1
0
09 Dec 2023
MIMIR: Masked Image Modeling for Mutual Information-based Adversarial Robustness
Xiaoyun Xu
Shujian Yu
Jingzheng Wu
S. Picek
AAML
605
8
0
08 Dec 2023
Hijacking Context in Large Multi-modal Models
Joonhyun Jeong
MLLM
259
11
0
07 Dec 2023
Compressed Context Memory For Online Language Model Interaction
Jang-Hyun Kim
Junyoung Yeom
Sangdoo Yun
Hyun Oh Song
KELM
298
27
1
06 Dec 2023
LLM-TAKE: Theme Aware Keyword Extraction Using Large Language Models
BigData Congress [Services Society] (BSS), 2023
Reza Yousefi Maragheh
Chenhao Fang
Charan Chand Irugu
Parth Parikh
Jason H. D. Cho
...
Saranyan Sukumar
Malay Patel
Evren Körpeoglu
Sushant Kumar
Kannan Achan
246
18
0
01 Dec 2023
Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
Neural Information Processing Systems (NeurIPS), 2023
Tam Nguyen
Tan-Minh Nguyen
Richard G. Baraniuk
189
26
0
01 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
393
33
0
01 Dec 2023
HOT: Higher-Order Dynamic Graph Representation Learning with Efficient Transformers
Learning on Graphs Conference (LoG), 2023
Maciej Besta
Afonso Claudino Catarino
Lukas Gianinazzi
Nils Blach
Piotr Nyczyk
H. Niewiadomski
Torsten Hoefler
452
10
0
30 Nov 2023
DSS: Synthesizing long Digital Ink using Data augmentation, Style encoding and Split generation
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
A. Timofeev
Anastasiia Fadeeva
A. Afonin
C. Musat
Andrii Maksai
335
2
0
29 Nov 2023
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
IEEE transactions on multimedia (IEEE TMM), 2023
Fukun Yin
Xin Chen
C. Zhang
Biao Jiang
Zibo Zhao
Jiayuan Fan
Gang Yu
Taihao Li
Tao Chen
457
40
0
29 Nov 2023
RACE-IT: A Reconfigurable Analog Computing Engine for In-Memory Transformer Acceleration
Lei Zhao
Aishwarya Natarajan
Luca Buonanno
Archit Gajjar
Ron M. Roth
Sergey Serebryakov
John Moon
Jim Ignowski
Giacomo Pedretti
308
5
0
29 Nov 2023
Advancing State of the Art in Language Modeling
David Herel
Tomas Mikolov
273
1
0
28 Nov 2023
On the Long Range Abilities of Transformers
Itamar Zimerman
Lior Wolf
250
11
0
28 Nov 2023
Active Foundational Models for Fault Diagnosis of Electrical Motors
Sriram Anbalagan
GP SaiShashank
D. Agarwal
Balasubramaniam Natarajan
Babji Srinivasan
AI4CE
141
1
0
27 Nov 2023
Who is leading in AI? An analysis of industry AI research
Ben Cottier
T. Besiroglu
David Owen
309
9
0
24 Nov 2023
CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning
Design, Automation and Test in Europe (DATE), 2023
Shivam Aggarwal
Kuluhan Binici
Tulika Mitra
VLM
195
5
0
24 Nov 2023
Looped Transformers are Better at Learning Learning Algorithms
International Conference on Learning Representations (ICLR), 2023
Liu Yang
Kangwook Lee
Robert D. Nowak
Dimitris Papailiopoulos
441
55
0
21 Nov 2023
Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs
International Conference on Field-Programmable Logic and Applications (FPL), 2023
Shivam Aggarwal
Hans Jakob Damsgaard
Alessandro Pappalardo
Giuseppe Franco
Thomas B. Preußer
Michaela Blott
Tulika Mitra
MQ
284
8
0
21 Nov 2023
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang
Jingwei Xu
Junyu Lai
Zixu Jiang
Taolue Chen
...
Xiaoxing Ma
Lijuan Yang
Zhou Xin
Shupeng Li
Penghao Zhao
LLMAG
KELM
367
99
0
21 Nov 2023
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Sunyi Zheng
Lin Yang
VLM
278
6
0
21 Nov 2023
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs
Hanpeng Hu
Junwei Su
Juntao Zhao
Size Zheng
Yibo Zhu
Yanghua Peng
Chuan Wu
327
7
0
16 Nov 2023
GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks
Shivanshu Gupta
Clemens Rosenbaum
Ethan R. Elenberg
LRM
216
9
0
16 Nov 2023
Never Lost in the Middle: Improving Large Language Models via Attention Strengthening Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Junqing He
Kunhao Pan
Xiaoqun Dong
Zhuoyang Song
Yibo Liu
...
Hao Wang
Qianguo Sun
Enming Zhang
Zejian Xie
Jiaxing Zhang
KELM
RALM
218
16
0
15 Nov 2023
Large Language Models are legal but they are not: Making the case for a powerful LegalLLM
Thanmay Jayakumar
Fauzan Farooqui
Luqman Farooqui
ELM
AILaw
ALM
247
25
0
15 Nov 2023
Predicting generalization performance with correctness discriminators
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yuekun Yao
Alexander Koller
368
1
0
15 Nov 2023
Memory-efficient Stochastic methods for Memory-based Transformers
Vishwajit Kumar Vishnu
C. Sekhar
113
0
0
14 Nov 2023
Argumentation Element Annotation Modeling using XLNet
Christopher M. Ormerod
Amy Burkhardt
Mackenzie Young
Susan Lottridge
125
6
0
10 Nov 2023
Large Human Language Models: A Need and the Challenges
Nikita Soni
H. Andrew Schwartz
João Sedoc
Niranjan Balasubramanian
ALM
AI4CE
265
15
0
09 Nov 2023
CLearViD: Curriculum Learning for Video Description
Cheng-Yu Chuang
Pooyan Fazli
148
1
0
08 Nov 2023
Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability
Jishnu Ray Chowdhury
Cornelia Caragea
227
5
0
08 Nov 2023
A Hierarchical Spatial Transformer for Massive Point Samples in Continuous Space
Wenchong He
Zhe Jiang
Tingsong Xiao
Zelin Xu
Shigang Chen
Ronald Fick
Miles Medina
Christine Angelini
256
18
0
08 Nov 2023
Multi-resolution Time-Series Transformer for Long-term Forecasting
Yitian Zhang
Liheng Ma
Soumyasundar Pal
Yingxue Zhang
Mark Coates
AI4TS
192
67
0
07 Nov 2023
p-Laplacian Transformer
Tuan Nguyen
Tam Nguyen
Vinh-Tiep Nguyen
Tan-Minh Nguyen
186
0
0
06 Nov 2023
Co-training and Co-distillation for Quality Improvement and Compression of Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hayeon Lee
Rui Hou
Jongpil Kim
Davis Liang
Hongbo Zhang
Sung Ju Hwang
Alexander Min
346
0
0
06 Nov 2023
Sentiment Analysis through LLM Negotiations
Xiaofei Sun
Xiaoya Li
Shengyu Zhang
Shuhe Wang
Leilei Gan
Jiwei Li
Tianwei Zhang
Guoyin Wang
188
29
0
03 Nov 2023
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Tao Liu
Chenpeng Du
Shuai Fan
Feilong Chen
Kai Yu
DiffM
VGen
296
15
0
03 Nov 2023
FlashDecoding++: Faster Large Language Model Inference on GPUs
Ke Hong
Guohao Dai
Jiaming Xu
Qiuli Mao
Xiuhong Li
Jun Liu
Kangdi Chen
Yuhan Dong
Yu Wang
554
93
0
02 Nov 2023
Task-Agnostic Low-Rank Adapters for Unseen English Dialects
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zedian Xiao
William B. Held
Yanchen Liu
Diyi Yang
258
11
0
02 Nov 2023
Network Contention-Aware Cluster Scheduling with Reinforcement Learning
International Conference on Parallel and Distributed Systems (ICPADS), 2023
Junyeol Ryu
Jeongyoon Eo
GNN
98
2
0
31 Oct 2023
ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout
Huiyao Shu
Ang Wang
Ziji Shi
Hanyu Zhao
Yong Li
Lu Lu
OffRL
151
3
0
30 Oct 2023
Stacking the Odds: Transformer-Based Ensemble for AI-Generated Text Detection
Australasian Language Technology Association Workshop (ALTA), 2023
Duke Nguyen
Khaing Myat Noe Naing
Aditya Joshi
212
6
0
29 Oct 2023
TorchDEQ: A Library for Deep Equilibrium Models
Zhengyang Geng
J. Zico Kolter
VLM
424
18
0
28 Oct 2023
Transformers as Graph-to-Graph Models
James Henderson
Alireza Mohammadshahi
Andrei Catalin Coman
Lesly Miculicich
GNN
198
7
0
27 Oct 2023
Sliceformer: Make Multi-head Attention as Simple as Sorting in Discriminative Tasks
Shen Yuan
Hongteng Xu
166
3
0
26 Oct 2023
CLEX: Continuous Length Extrapolation for Large Language Models
International Conference on Learning Representations (ICLR), 2023
Guanzheng Chen
Xin Li
Zaiqiao Meng
Shangsong Liang
Li Bing
268
36
0
25 Oct 2023
How Much Context Does My Attention-Based ASR System Need?
Interspeech (Interspeech), 2023
Robert Flynn
Anton Ragni
238
2
0
24 Oct 2023
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haofei Yu
Cunxiang Wang
Yue Zhang
Wei Bi
RALM
295
5
0
24 Oct 2023