arXiv: 2309.16354
Transformer-VQ: Linear-Time Transformers via Vector Quantization
28 September 2023
Papers citing "Transformer-VQ: Linear-Time Transformers via Vector Quantization" (18 of 18 papers shown):
"Multi-Sense Embeddings for Language Models and Knowledge Distillation." Qitong Wang, Mohammed J. Zaki, Georgios Kollias, Vasileios Kalantzis. 08 Apr 2025. [KELM]
"Parallel Sequence Modeling via Generalized Spatial Propagation Network." Hongjun Wang, Wonmin Byeon, Jiarui Xu, Jinwei Gu, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu. 21 Jan 2025.
"Context-Aware Token Selection and Packing for Enhanced Vision Transformer." Tianyi Zhang, B. Li, Jae-sun Seo, Yu Cao. 31 Oct 2024.
"Residual vector quantization for KV cache compression in large language model." Ankur Kumar. 21 Oct 2024. [MQ]
"Scalable Autoregressive Image Generation with Mamba." Haopeng Li, Jinyue Yang, Kexin Wang, Xuerui Qiu, Yuhong Chou, Xin Li, Guoqi Li. 22 Aug 2024. [Mamba]
"Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences." Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li. 12 Jun 2024.
"LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory." Zicheng Liu, Li Wang, Siyuan Li, Zedong Wang, Haitao Lin, Stan Z. Li. 17 Apr 2024. [VLM]
"Model Compression and Efficient Inference for Large Language Models: A Survey." Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He. 15 Feb 2024. [MQ]
"Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings." Yichen Jiang, Xiang Zhou, Mohit Bansal. 09 Feb 2024.
"Gated Linear Attention Transformers with Hardware-Efficient Training." Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim. 11 Dec 2023.
"Autoregressive Image Generation using Residual Quantization." Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook-Shin Han. 03 Mar 2022. [VGen]
"Transformer Quality in Linear Time." Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le. 21 Feb 2022.
"H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences." Zhenhai Zhu, Radu Soricut. 25 Jul 2021.
"Combiner: Full Attention Transformer with Sparse Computation Cost." Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai. 12 Jul 2021.
"Zero-Shot Text-to-Image Generation." Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever. 24 Feb 2021. [VLM]
"Efficient Content-Based Sparse Attention with Routing Transformers." Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier. 12 Mar 2020. [MoE]
"Scaling Laws for Neural Language Models." Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020.
"Pixel Recurrent Neural Networks." Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu. 25 Jan 2016. [SSeg, GAN]