Leveraging redundancy in attention with Reuse Transformers

13 October 2021
Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar
arXiv: 2110.06821 (abs / PDF / HTML)
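
For readers skimming the citation list, the sketch below illustrates the idea named in the paper's title: because attention maps tend to be similar across layers, a later transformer layer can reuse the attention map computed by an earlier layer instead of recomputing its own query-key scores. This is only a minimal single-head PyTorch sketch, not the paper's exact architecture; the class name ReuseSelfAttention, the reused_attn argument, and the specific layer pairing are assumptions made for this example.

    # Illustrative sketch of cross-layer attention reuse (not the paper's exact method).
    import math
    import torch
    import torch.nn as nn

    class ReuseSelfAttention(nn.Module):
        """Single-head self-attention that can optionally reuse a
        precomputed attention map instead of recomputing QK^T scores."""

        def __init__(self, dim: int):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.v = nn.Linear(dim, dim)
            self.scale = 1.0 / math.sqrt(dim)

        def forward(self, x, reused_attn=None):
            v = self.v(x)
            if reused_attn is None:
                # Standard path: compute fresh attention probabilities.
                scores = self.q(x) @ self.k(x).transpose(-2, -1) * self.scale
                attn = scores.softmax(dim=-1)
            else:
                # Reuse path: skip QK^T and apply the attention map
                # borrowed from an earlier layer.
                attn = reused_attn
            return attn @ v, attn

    # Example: layer 0 computes attention once; layer 1 reuses it.
    x = torch.randn(2, 16, 64)                     # (batch, tokens, dim)
    layer0, layer1 = ReuseSelfAttention(64), ReuseSelfAttention(64)
    out0, attn0 = layer0(x)                        # fresh attention map
    out1, _ = layer1(out0, reused_attn=attn0)      # reused attention map
    print(out1.shape)                              # torch.Size([2, 16, 64])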

Papers citing "Leveraging redundancy in attention with Reuse Transformers"

All 21 citing papers are listed below.

Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps
Jiashun Cheng, Aochuan Chen, Nuo Chen, Ziqi Gao, Yuhan Li, Jia Li, Fugee Tsung
20 Jun 2025

AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs
Di He, Ajay Jaiswal, Songjun Tu, Li Shen, Ganzhao Yuan, Shiwei Liu, L. Yin
17 Jun 2025

Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement
Yuxin Ren, Maxwell D Collins, Miao Hu, Huanrui Yang
24 May 2025

MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning
Jiancheng Zhao, Xingda Yu, Zhen Yang
27 Mar 2025

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang
14 Oct 2024

Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, Grace Li Zhang
02 Oct 2024

EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
Hossein Rajabzadeh, A. Jafari, Aman Sharma, Benyamin Jami, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
22 Sep 2024

EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Jianzong Wang, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao
30 Apr 2024

MLP Can Be A Good Transformer Learner
Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang
08 Apr 2024

Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers
Yiwei Lu, Yaoliang Yu, Xinlin Li, Vahid Partovi Nia
27 Feb 2024

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao M. Anwer, Michael Felsberg, Timothy Baldwin, Eric P. Xing, Fahad Shahbaz Khan
26 Feb 2024

Head-wise Shareable Attention for Large Language Models
Zouying Cao, Yifei Yang, Hai Zhao
19 Feb 2024

Dynamic Layer Tying for Parameter-Efficient Transformers
Tamir David Hay, Lior Wolf
23 Jan 2024

Fast Sampling Through The Reuse Of Attention Maps In Diffusion Models
Rosco Hunter, Łukasz Dudziak, Mohamed S. Abdelfattah, Abhinav Mehrotra, Sourav Bhattacharya, Hongkai Wen
13 Dec 2023

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, ..., Mykola Pechenizkiy, Yi Liang, Michael Bendersky, Zhangyang Wang, Shiwei Liu
08 Oct 2023

Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Kangwook Jang, Sungnyun Kim, Se-Young Yun, Hoi-Rim Kim
19 May 2023

AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems
Yuan Feng, Hyeran Jeon, F. Blagojevic, Cyril Guyot, Qing Li, Dong Li
23 Jan 2023

Husformer: A Multi-Modal Transformer for Multi-Modal Human State Recognition
Ruiqi Wang, Wonse Jo, Dezhong Zhao, Weizheng Wang, B. Yang, Guohua Chen, Byung-Cheol Min
30 Sep 2022

On The Computational Complexity of Self-Attention
Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, Chinmay Hegde
11 Sep 2022

Visualizing and Understanding Patch Interactions in Vision Transformer
Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei
11 Mar 2022

Q-ViT: Fully Differentiable Quantization for Vision Transformer
Zhexin Li, Tong Yang, Peisong Wang, Jian Cheng
19 Jan 2022