arXiv: 2006.04768
Linformer: Self-Attention with Linear Complexity
8 June 2020
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
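For context, the paper's core idea is to project the (n × d) key and value matrices down to (k × d) with learned projections along the sequence axis, so self-attention costs O(nk) instead of O(n²). The following is a minimal single-head NumPy sketch of that computation; the function name linformer_attention and the (k, n) projection shapes are illustrative assumptions, not the authors' reference code.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    # Q, K, V: (n, d); E, F: (k, n). Cost is O(n*k*d) instead of O(n^2*d).
    K_proj = E @ K                                  # (k, d): compress keys along the sequence axis
    V_proj = F @ V                                  # (k, d): compress values along the sequence axis
    scores = Q @ K_proj.T / np.sqrt(Q.shape[-1])    # (n, k) attention scores
    return softmax(scores, axis=-1) @ V_proj        # (n, d) output

# Toy usage with random inputs and random (untrained) projections.
n, d, k = 512, 64, 32
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
out = linformer_attention(Q, K, V, E, F)            # shape (n, d)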
Papers citing "Linformer: Self-Attention with Linear Complexity" (50 of 648 papers shown)
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue
Nikolaos Aletras
23
0
0
11 Oct 2023
Accelerating Vision Transformers Based on Heterogeneous Attention Patterns
Deli Yu
Teng Xi
Jianwei Li
Baopu Li
Gang Zhang
Haocheng Feng
Junyu Han
Jingtuo Liu
Errui Ding
Jingdong Wang
ViT
26
0
0
11 Oct 2023
Scaling Laws of RoPE-based Extrapolation
Xiaoran Liu
Hang Yan
Shuo Zhang
Chen An
Xipeng Qiu
Dahua Lin
23
6
0
08 Oct 2023
Single Stage Warped Cloth Learning and Semantic-Contextual Attention Feature Fusion for Virtual TryOn
Sanhita Pathak
V. Kaushik
Brejesh Lall
DiffM
23
2
0
08 Oct 2023
PriViT: Vision Transformers for Fast Private Inference
Naren Dhyani
Jianqiao Mo
Minsu Cho
Ameya Joshi
Siddharth Garg
Brandon Reagen
Chinmay Hegde
20
4
0
06 Oct 2023
How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
Josh Alman
Zhao Song
24
31
0
06 Oct 2023
Retrieval meets Long Context Large Language Models
Peng-Tao Xu
Wei Ping
Xianchao Wu
Lawrence C. McAfee
Chen Zhu
Zihan Liu
Sandeep Subramanian
Evelina Bakhturina
M. Shoeybi
Bryan Catanzaro
RALM
LRM
14
79
0
04 Oct 2023
Memory-efficient particle filter recurrent neural network for object localization
Roman Korkin
Ivan V. Oseledets
Aleksandr Katrutsa
16
1
0
02 Oct 2023
SeisT: A foundational deep learning model for earthquake monitoring tasks
Sen Li
Xu Yang
Anye Cao
Changbin Wang
Yaoqi Liu
Yapeng Liu
Qiang Niu
28
3
0
02 Oct 2023
Win-Win: Training High-Resolution Vision Transformers from Two Windows
Vincent Leroy
Jérôme Revaud
Thomas Lucas
Philippe Weinzaepfel
ViT
32
2
0
01 Oct 2023
Efficient Streaming Language Models with Attention Sinks
Michel Lang
Yuandong Tian
Beidi Chen
Song Han
Mike Lewis
AI4TS
RALM
25
639
0
29 Sep 2023
A Survey on Deep Learning Techniques for Action Anticipation
Zeyun Zhong
Manuel Martin
Michael Voit
Juergen Gall
Jürgen Beyerer
24
7
0
29 Sep 2023
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
Chengming Zhang
Baixi Sun
Xiaodong Yu
Zhen Xie
Weijian Zheng
K. Iskra
Pete Beckman
Dingwen Tao
9
4
0
29 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald
26
15
0
28 Sep 2023
Channel Vision Transformers: An Image Is Worth 1 x 16 x 16 Words
Yu Bao
Srinivasan Sivanandan
Theofanis Karaletsos
ViT
19
22
0
28 Sep 2023
Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation
Zihan Liu
Zewei Sun
Shanbo Cheng
Shujian Huang
Mingxuan Wang
18
1
0
25 Sep 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM
ALM
23
34
0
23 Sep 2023
SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling
Bokyeong Yoon
Yoonsang Han
Gordon Euhyun Moon
22
0
0
22 Sep 2023
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Yukang Chen
Shengju Qian
Haotian Tang
Xin Lai
Zhijian Liu
Song Han
Jiaya Jia
35
151
0
21 Sep 2023
Boolformer: Symbolic Regression of Logic Functions with Transformers
Stéphane d'Ascoli
Samy Bengio
Josh Susskind
Emmanuel Abbe
19
5
0
21 Sep 2023
Interpret Vision Transformers as ConvNets with Dynamic Convolutions
Chong Zhou
Chen Change Loy
Bo Dai
ViT
25
1
0
19 Sep 2023
MUSTANG: Multi-Stain Self-Attention Graph Multiple Instance Learning Pipeline for Histopathology Whole Slide Images
Amaya Gallagher-Syed
Luca Rossi
F. Rivellese
C. Pitzalis
M. Lewis
Michael Barnes
Gregory Slabaugh
22
0
0
19 Sep 2023
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Yang Li
Liangzhen Lai
Shangguan Yuan
Forrest N. Iandola
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
26
2
0
14 Sep 2023
Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors
J. Pata
Eric Wulff
Farouk Mokhtar
D. Southwick
Mengke Zhang
M. Girone
Javier Duarte
22
1
0
13 Sep 2023
Uncovering mesa-optimization algorithms in Transformers
J. Oswald
Eyvind Niklasson
Maximilian Schlegel
Seijin Kobayashi
Nicolas Zucchet
...
Mark Sandler
Blaise Agüera y Arcas
Max Vladymyrov
Razvan Pascanu
João Sacramento
22
53
0
11 Sep 2023
Long-Range Transformer Architectures for Document Understanding
Thibault Douzon
S. Duffner
Christophe Garcia
Jérémy Espinas
VLM
16
2
0
11 Sep 2023
Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning
Sungjun Cho
Seunghyuk Cho
Sungwoo Park
Hankook Lee
Ho Hin Lee
Moontae Lee
22
6
0
08 Sep 2023
Compressing Vision Transformers for Low-Resource Visual Learning
Eric Youn
J. SaiMitheran
Sanjana Prabhu
Siyuan Chen
ViT
21
2
0
05 Sep 2023
Language Models for Novelty Detection in System Call Traces
Quentin Fournier
Daniel Aloise
Leandro R. Costa
AI4TS
22
4
0
05 Sep 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Yushi Bai
Xin Lv
Jiajie Zhang
Hong Lyu
Jiankai Tang
...
Aohan Zeng
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
LLMAG
RALM
26
492
0
28 Aug 2023
MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing
Yuwei Qiu
Kaihao Zhang
Chenxi Wang
Wenhan Luo
Hongdong Li
Zhi Jin
ViT
29
83
0
27 Aug 2023
Text Matching Improves Sequential Recommendation by Reducing Popularity Biases
Zhenghao Liu
Senkun Mei
Chenyan Xiong
Xiaohua Li
Shi Yu
Zhiyuan Liu
Yu Gu
Ge Yu
25
20
0
27 Aug 2023
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
Matthew Dutson
Yin Li
M. Gupta
ViT
30
8
0
25 Aug 2023
Transforming the Output of Generative Pre-trained Transformer: The Influence of the PGI Framework on Attention Dynamics
Aline Ioste
19
1
0
25 Aug 2023
Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers
Jiawen Xie
Pengyu Cheng
Xiao Liang
Yong Dai
Nan Du
32
7
0
25 Aug 2023
Easy attention: A simple attention mechanism for temporal predictions with transformers
Marcial Sanchis-Agudo
Yuning Wang
Roger Arnau
L. Guastoni
Jasmin Lim
Karthik Duraisamy
Ricardo Vinuesa
AI4TS
9
0
0
24 Aug 2023
Enhancing Graph Transformers with Hierarchical Distance Structural Encoding
Yuan Luo
Hongkang Li
Lei Shi
Xiao-Ming Wu
23
7
0
22 Aug 2023
A Lightweight Transformer for Faster and Robust EBSD Data Collection
Harry Dong
S. Donegan
M. Shah
Yuejie Chi
24
2
0
18 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
37
3
0
18 Aug 2023
Memory-and-Anticipation Transformer for Online Action Understanding
Jiahao Wang
Guo Chen
Yifei Huang
Liming Wang
Tong Lu
OffRL
54
37
0
15 Aug 2023
Optimizing a Transformer-based network for a deep learning seismic processing workflow
R. Harsuko
T. Alkhalifah
25
9
0
09 Aug 2023
Sparse Binary Transformers for Multivariate Time Series Modeling
Matt Gorbett
Hossein Shirazi
I. Ray
AI4TS
25
13
0
09 Aug 2023
RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling
Herman Sugiharto
Aradea
H. Mubarok
14
0
0
07 Aug 2023
ConvFormer: Revisiting Transformer for Sequential User Modeling
Hao Wang
Jianxun Lian
M. Wu
Haoxuan Li
Jiajun Fan
Wanyue Xu
Chaozhuo Li
Xing Xie
17
3
0
05 Aug 2023
DeDrift: Robust Similarity Search under Content Drift
Dmitry Baranchuk
Matthijs Douze
Yash Upadhyay
I. Z. Yalniz
22
8
0
05 Aug 2023
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment
Kun Yuan
Zishang Kong
Chuanchuan Zheng
Ming-Ting Sun
Xingsen Wen
ViT
27
14
0
31 Jul 2023
RGB-D-Fusion: Image Conditioned Depth Diffusion of Humanoid Subjects
Sascha Kirch
Valeria Olyunina
Jan Ondřej
Rafael Pagés
Sergio Martín
Clara Pérez-Molina
12
2
0
29 Jul 2023
Improving Social Media Popularity Prediction with Multiple Post Dependencies
Zhizhen Zhang
Xiao-Zhu Xie
Meng Yang
Ye Tian
Yong-jia Jiang
Yong Cui
21
5
0
28 Jul 2023
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
T. Kajitsuka
Issei Sato
29
16
0
26 Jul 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
30
1,129
0
17 Jul 2023