Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
arXiv 2006.04768, 8 June 2020
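For context on the technique the title names: standard self-attention builds an n x n score matrix, quadratic in sequence length n, while Linformer projects the keys and values down along the sequence axis so the score matrix is only n x k for a fixed k. Below is a minimal single-head NumPy sketch of that idea; the function name, the projection matrices E and F, and the toy sizes are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def linformer_attention(X, Wq, Wk, Wv, E, F):
    """Single-head linear attention in the Linformer style (illustrative sketch).

    X: (n, d) token embeddings. E, F: (k, n) learned projections that compress
    the sequence axis of keys and values, so attention costs O(n*k) rather
    than O(n^2).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # each (n, d)
    K_proj, V_proj = E @ K, F @ V                  # each (k, d)
    scores = Q @ K_proj.T / np.sqrt(Q.shape[-1])   # (n, k) instead of (n, n)
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V_proj                        # (n, d)

# Toy usage with assumed sizes: n=128 tokens, d=64 dims, projected length k=16.
rng = np.random.default_rng(0)
n, d, k = 128, 64, 16
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
E = rng.normal(size=(k, n)) / np.sqrt(n)
F = rng.normal(size=(k, n)) / np.sqrt(n)
out = linformer_attention(X, Wq, Wk, Wv, E, F)
assert out.shape == (n, d)
```

With k held fixed, time and memory scale linearly in n, which is the property most of the citing papers listed below build on or compare against.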
Papers citing "Linformer: Self-Attention with Linear Complexity" (showing 50 of 648):

| Title | Authors | Tags | Date |
|---|---|---|---|
| Extending LLMs' Context Window with 100 Samples | Yikai Zhang, Junlong Li, Pengfei Liu | | 13 Jan 2024 |
| E^2-LLM: Efficient and Extreme Length Extension of Large Language Models | Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, ..., Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng | | 13 Jan 2024 |
| Transformers are Multi-State RNNs | Matanel Oren, Michael Hassid, Nir Yarden, Yossi Adi, Roy Schwartz | OffRL | 11 Jan 2024 |
| Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection | Wei Ye, Chaoya Jiang, Haiyang Xu, Chenhao Ye, Chenliang Li, Mingshi Yan, Shikun Zhang, Songhang Huang, Fei Huang | VLM | 11 Jan 2024 |
| Efficient Image Deblurring Networks based on Diffusion Models | Kang Chen, Yuanjie Liu | DiffM | 11 Jan 2024 |
| Towards Real-World Aerial Vision Guidance with Categorical 6D Pose Tracker | Jingtao Sun, Yaonan Wang, Danwei Wang | | 09 Jan 2024 |
| SeTformer is What You Need for Vision and Language | Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, Michael Felsberg | | 07 Jan 2024 |
| ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention | Chenhang He, Ruihuang Li, Guowen Zhang, Lei Zhang | | 01 Jan 2024 |
| Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems | Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia | | 23 Dec 2023 |
| Sign Language Production with Latent Motion Transformer | Pan Xie, Taiying Peng, Yao Du, Qipeng Zhang | SLR | 20 Dec 2023 |
| Cached Transformers: Improving Transformers with Differentiable Memory Cache | Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, Ping Luo | | 20 Dec 2023 |
| Efficiency-oriented approaches for self-supervised speech representation learning | Luis Lugo, Valentin Vielzeuf | SSL | 18 Dec 2023 |
| Linear Attention via Orthogonal Memory | Jun Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong | | 18 Dec 2023 |
| Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention | Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu | | 14 Dec 2023 |
| Graph Convolutions Enrich the Self-Attention in Transformers! | Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park | | 07 Dec 2023 |
| LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem | Yingqiang Ge, Yujie Ren, Wenyue Hua, Shuyuan Xu, Juntao Tan, Yongfeng Zhang | LLMAG | 06 Dec 2023 |
| DiffiT: Diffusion Vision Transformers for Image Generation | Ali Hatamizadeh, Jiaming Song, Guilin Liu, Jan Kautz, Arash Vahdat | | 04 Dec 2023 |
| Bootstrapping SparseFormers from Vision Foundation Models | Ziteng Gao, Zhan Tong, K. Lin, Joya Chen, Mike Zheng Shou | | 04 Dec 2023 |
| ImputeFormer: Low Rankness-Induced Transformers for Generalizable Spatiotemporal Imputation | Tong Nie, Guoyang Qin, Wei Ma, Yuewen Mei, Jiangming Sun | AI4TS, AI4CE | 04 Dec 2023 |
| Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series Forecasting Approach | Jinguo Cheng, Ke Li, Yuxuan Liang, Lijun Sun, Junchi Yan, Yuankai Wu | AI4TS | 04 Dec 2023 |
| Token Fusion: Bridging the Gap between Token Pruning and Token Merging | Minchul Kim, Shangqian Gao, Yen-Chang Hsu, Yilin Shen, Hongxia Jin | | 02 Dec 2023 |
| Diffusion Models Without Attention | Jing Nathan Yan, Jiatao Gu, Alexander M. Rush | | 30 Nov 2023 |
| QuadraNet: Improving High-Order Neural Interaction Efficiency with Hardware-Aware Quadratic Neural Networks | Chenhui Xu, Fuxun Yu, Zirui Xu, Chenchen Liu, Jinjun Xiong, Xiang Chen | | 29 Nov 2023 |
| On the Long Range Abilities of Transformers | Itamar Zimerman, Lior Wolf | | 28 Nov 2023 |
| One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space | Raghav Addanki, Chenyang Li, Zhao-quan Song, Chiwun Yang | | 24 Nov 2023 |
| Linear Log-Normal Attention with Unbiased Concentration | Yury Nahshan, Dor-Joseph Kampeas, E. Haleva | | 22 Nov 2023 |
| Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey | Yunpeng Huang, Jingwei Xu, Junyu Lai, Zixu Jiang, Taolue Chen, ..., Xiaoxing Ma, Lijuan Yang, Zhou Xin, Shupeng Li, Penghao Zhao | LLMAG, KELM | 21 Nov 2023 |
| Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis | Honglin Li, Yunlong Zhang, Chenglu Zhu, Jiatong Cai, Sunyi Zheng, Lin Yang | VLM | 21 Nov 2023 |
| Zero redundancy distributed learning with differential privacy | Zhiqi Bu, Justin Chiu, Ruixuan Liu, Sheng Zha, George Karypis | | 20 Nov 2023 |
| LATIS: Lambda Abstraction-based Thermal Image Super-resolution | Gargi Panda, Soumitra Kundu, Saumik Bhattacharya, Aurobinda Routray | | 18 Nov 2023 |
| Sparse Attention-Based Neural Networks for Code Classification | Ziyang Xiang, Zaixin Zhang, Qi Liu | | 11 Nov 2023 |
| FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores | Daniel Y. Fu, Hermann Kumbong, Eric N. D. Nguyen, Christopher Ré | VLM | 10 Nov 2023 |
| Window Attention is Bugged: How not to Interpolate Position Embeddings | Daniel Bolya, Chaitanya K. Ryali, Judy Hoffman, Christoph Feichtenhofer | | 09 Nov 2023 |
| Legal-HNet: Mixing Legal Long-Context Tokens with Hartley Transform | Daniele Giofré, Sneha Ghantasala | AILaw | 09 Nov 2023 |
| SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers | Xiangyong Lu, Masanori Suganuma, Takayuki Okatani | | 07 Nov 2023 |
| GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values | Farnoosh Javadi, Walid Ahmed, Habib Hajimolahoseini, Foozhan Ataiefard, Mohammad Hassanpour, Saina Asani, Austin Wen, Omar Mohamed Awad, Kangling Liu, Yang Liu | VLM | 06 Nov 2023 |
| The Expressibility of Polynomial based Attention Scheme | Zhao-quan Song, Guangyi Xu, Junze Yin | | 30 Oct 2023 |
| ViR: Towards Efficient Vision Retention Backbones | Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz | GNN | 30 Oct 2023 |
| Sliceformer: Make Multi-head Attention as Simple as Sorting in Discriminative Tasks | Shen Yuan, Hongteng Xu | | 26 Oct 2023 |
| TRAMS: Training-free Memory Selection for Long-range Language Modeling | Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi | RALM | 24 Oct 2023 |
| Meta learning with language models: Challenges and opportunities in the classification of imbalanced text | Apostol T. Vassilev, Honglan Jin, Munawar Hasan | | 23 Oct 2023 |
| Manifold-Preserving Transformers are Effective for Short-Long Range Encoding | Ayan Sengupta, Md. Shad Akhtar, Tanmoy Chakraborty | | 22 Oct 2023 |
| Learning to (Learn at Test Time) | Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Oluwasanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen | SSL | 20 Oct 2023 |
| Improved Operator Learning by Orthogonal Attention | Zipeng Xiao, Zhongkai Hao, Bokai Lin, Zhijie Deng, Hang Su | | 19 Oct 2023 |
| Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer | Qingru Zhang, Dhananjay Ram, Cole Hawkins, Sheng Zha, Tuo Zhao | | 19 Oct 2023 |
| Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences | Yanming Kang, Giang Tran, H. Sterck | | 18 Oct 2023 |
| PELA: Learning Parameter-Efficient Models with Low-Rank Approximation | Yangyang Guo, Guangzhi Wang, Mohan S. Kankanhalli | | 16 Oct 2023 |
| OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer | Junjie Gao, Qiujie Dong, Ruian Wang, Shuangmin Chen, Shiqing Xin, Changhe Tu, Wenping Wang | | 15 Oct 2023 |
| Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo | LLMAG | 12 Oct 2023 |
| MemGPT: Towards LLMs as Operating Systems | Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez | RALM | 12 Oct 2023 |