Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
arXiv: 2009.06097 · 13 September 2020
Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, S. Sun, Yu Cheng, Jingjing Liu
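For context on the technique the title names, here is a minimal sketch of clustering-based sparse attention: token hidden states are grouped by k-means, and full softmax attention is computed only within each cluster, so cost scales with cluster size rather than quadratically with sequence length. The function names, the single-head NumPy setup, and the toy dimensions are illustrative assumptions, not the paper's exact architecture.

# A minimal sketch of clustering-based sparse attention in the spirit of
# Cluster-Former. This is an illustrative assumption of the general idea,
# not the paper's exact layer design.
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain k-means over row vectors x: (n, d) -> cluster ids of shape (n,)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
        ids = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster went empty.
        for j in range(k):
            if np.any(ids == j):
                centroids[j] = x[ids == j].mean(axis=0)
    return ids

def clustered_attention(h, wq, wk, wv, k=4):
    """Self-attention restricted to k-means clusters of the hidden states h."""
    q, kk, v = h @ wq, h @ wk, h @ wv
    ids = kmeans(h, k)
    out = np.zeros_like(v)
    scale = np.sqrt(q.shape[-1])
    for j in range(k):
        idx = np.where(ids == j)[0]
        if idx.size == 0:
            continue
        # Full softmax attention, but only among tokens of cluster j.
        scores = q[idx] @ kk[idx].T / scale
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]
    return out

# Toy usage: 128 "tokens" with 16-dim hidden states, 4 clusters.
rng = np.random.default_rng(1)
h = rng.standard_normal((128, 16))
wq, wk, wv = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
print(clustered_attention(h, wq, wk, wv, k=4).shape)  # (128, 16)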
Papers citing "Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding" (7 of 7 papers shown):
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo, Valentin Vielzeuf · SSL · 18 Dec 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald · 28 Sep 2023
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong · 3DV · 14 Oct 2022
Efficient Long Sequence Encoding via Synchronization
Xiangyang Mou, Mo Yu, Bingsheng Yao, Lifu Huang · 15 Mar 2022
Poolingformer: Long Document Modeling with Pooling Attention
Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen · 10 May 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · VLM · 28 Jul 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier · MoE · 12 Mar 2020