Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning

Findings of EMNLP, 2020
17 September 2020
Bingbing Li
Zhenglun Kong
Tianyun Zhang
Ji Li
Hao Sun
Hang Liu
Caiwen Ding
arXiv: 2009.08065
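The technique named in the title, block-structured pruning, removes weights in fixed-size contiguous blocks rather than one at a time, so the surviving nonzeros form a regular pattern that FPGA and mobile kernels can exploit; that regularity is the "hardware-friendly" property the title refers to. As a rough illustration, here is a minimal NumPy sketch of generic one-shot block-structured magnitude pruning; the function name `block_prune`, the default block size, and the magnitude criterion are illustrative assumptions, not necessarily the procedure proposed in the paper.

```python
import numpy as np

def block_prune(weight: np.ndarray, block: int = 16, sparsity: float = 0.5) -> np.ndarray:
    """Zero the lowest-norm (block x block) tiles of a 2-D weight matrix.

    Illustrative one-shot magnitude pruning, not the paper's exact algorithm.
    """
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0, "pad the matrix to a block multiple first"
    # View the matrix as a grid of tiles and score each tile by its Frobenius norm.
    tiles = weight.reshape(rows // block, block, cols // block, block)
    norms = np.linalg.norm(tiles, axis=(1, 3))  # shape: (rows//block, cols//block)
    # Zero the `sparsity` fraction of tiles with the smallest norms.
    k = int(norms.size * sparsity)
    if k == 0:
        return weight.copy()
    threshold = np.partition(norms.ravel(), k - 1)[k - 1]
    mask = (norms > threshold).astype(weight.dtype)  # keep tiles strictly above the cutoff
    return (tiles * mask[:, None, :, None]).reshape(rows, cols)

# Example: prune a hypothetical 768x768 projection matrix to ~75% block sparsity.
w = np.random.randn(768, 768).astype(np.float32)
w_sparse = block_prune(w, block=16, sparsity=0.75)
print(f"zeroed fraction: {(w_sparse == 0).mean():.2f}")  # ~0.75
```

Because entire tiles are zeroed, the mask can be stored per block instead of per weight, which keeps memory access regular on accelerators.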

Papers citing "Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning"

35 papers
TSLA: A Task-Specific Learning Adaptation for Semantic Segmentation on Autonomous Vehicles Platform
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2025
Jun Liu
Zhenglun Kong
Pu Zhao
Weihao Zeng
Hao Tang
...
Wenbin Zhang
Geng Yuan
Wei Niu
Xue Lin
Yanzhi Wang
17 Aug 2025
RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Jun Liu
Zhenglun Kong
Zhaoyang Han
Changdi Yang
Xuan Shen
...
Wei Niu
Wenbin Zhang
Xue Lin
Dong Huang
Yanzhi Wang
08 Jan 2025
Pruning Foundation Models for High Accuracy without Retraining
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Pu Zhao
Fei Sun
Xuan Shen
Pinrui Yu
Zhenglun Kong
Yanzhi Wang
Xue Lin
21 Oct 2024
STAT: Shrinking Transformers After Training
Megan Flynn
Alexander Wang
Dean Edward Alvarez
Christopher De Sa
Anil Damle
29 May 2024
A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
Mohammed Nowaz Rabbani Chowdhury
Meng Wang
Kaoutar El Maghraoui
Naigang Wang
Pin-Yu Chen
Christopher Carothers
26 May 2024
Accelerating ViT Inference on FPGA through Static and Dynamic Pruning
Dhruv Parikh
Shouyi Li
Bingyi Zhang
Rajgopal Kannan
Carl E. Busart
Viktor Prasanna
21 Mar 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
15 Feb 2024
Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
Bingbing Li
Geng Yuan
Zigeng Wang
Shaoyi Huang
Hongwu Peng
Rohit Das
Wujie Wen
Hang Liu
Caiwen Ding
22 Jan 2024
Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Yun-Wei Chu
Dong-Jun Han
Christopher G. Brinton
15 Jan 2024
Can persistent homology whiten Transformer-based black-box models? A case study on BERT compression
Luis Balderas
Miguel Lastra
José M. Benítez
17 Dec 2023
Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup
Xinjian Zhao
Yao-Min Zhao
Jiajia Liu
Jingdong Chen
Chenyi Zhuang
Jinjie Gu
Ruocheng Guo
Xiangyu Zhao
10 Dec 2023
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Huiyin Xue
Nikolaos Aletras
11 Oct 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
07 Aug 2023
A Survey of Techniques for Optimizing Transformer Inference
Journal of Systems Architecture (JSA), 2023
Krishna Teja Chitty-Venkata
Sparsh Mittal
M. Emani
V. Vishwanath
Arun Somani
16 Jul 2023
Weight-Inherited Distillation for Task-Agnostic BERT Compression
Taiqiang Wu
Cheng-An Hou
Shanshan Lao
Jiayi Li
Ngai Wong
Zhe Zhao
Yujiu Yang
16 May 2023
What Matters In The Structured Pruning of Generative Language Models?
Michael Santacroce
Zixin Wen
Yelong Shen
Yuan-Fang Li
07 Feb 2023
Exploring Attention Map Reuse for Efficient Transformer Neural Networks
Kyuhong Shim
Jungwook Choi
Wonyong Sung
29 Jan 2023
SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Raphael Tang
K. Kumar
Gefei Yang
Akshat Pandey
Yajie Mao
Vladislav Belyaev
Madhuri Emmadi
Craig Murray
Ferhan Ture
Jimmy J. Lin
21 Nov 2022
Efficiently Scaling Transformer Inference
Conference on Machine Learning and Systems (MLSys), 2022
Reiner Pope
Sholto Douglas
Aakanksha Chowdhery
Jacob Devlin
James Bradbury
Anselm Levskaya
Jonathan Heek
Kefan Xiao
Shivani Agrawal
J. Dean
09 Nov 2022
Bridging Fairness and Environmental Sustainability in Natural Language Processing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Marius Hessenthaler
Emma Strubell
Dirk Hovy
Anne Lauscher
08 Nov 2022
Aerial Manipulation Using a Novel Unmanned Aerial Vehicle Cyber-Physical System
Caiwu Ding
Hongwu Peng
Lu Lu
Caiwen Ding
27 Oct 2022
An Automatic and Efficient BERT Pruning for Edge AI Systems
IEEE International Symposium on Quality Electronic Design (ISQED), 2022
Shaoyi Huang
Ning Liu
Yueying Liang
Hongwu Peng
Hongjia Li
Dongkuan Xu
Mimi Xie
Caiwen Ding
21 Jun 2022
CoCoPIE XGen: A Full-Stack AI-Oriented Optimizing Framework
Xiaofeng Li
Bin Ren
Xipeng Shen
Yanzhi Wang
21 Jun 2022
Differentially Private Model Compression
Neural Information Processing Systems (NeurIPS), 2022
Fatemehsadat Mireshghallah
A. Backurs
Huseyin A. Inan
Lukas Wutschitz
Janardhan Kulkarni
03 Jun 2022
A Fast Post-Training Pruning Framework for Transformers
Neural Information Processing Systems (NeurIPS), 2022
Woosuk Kwon
Sehoon Kim
Michael W. Mahoney
Joseph Hassoun
Kurt Keutzer
A. Gholami
29 Mar 2022
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning
European Conference on Computer Vision (ECCV), 2021
Zhenglun Kong
Zhaoyang Han
Xiaolong Ma
Xin Meng
Mengshu Sun
...
Geng Yuan
Bin Ren
Minghai Qin
Hao Tang
Yanzhi Wang
27 Dec 2021
From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression
Runxin Xu
Fuli Luo
Chengyu Wang
Baobao Chang
Yanjie Liang
Songfang Huang
Fei Huang
14 Dec 2021
Sparse is Enough in Scaling Transformers
Sebastian Jaszczur
Aakanksha Chowdhery
Afroz Mohiuddin
Lukasz Kaiser
Wojciech Gajewski
Henryk Michalewski
Jonni Kanerva
24 Nov 2021
Pruning Self-attentions into Convolutional Layers in Single Path
Haoyu He
Jianfei Cai
Jing Liu
Zizheng Pan
Jing Zhang
Dacheng Tao
Bohan Zhuang
23 Nov 2021
Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization
Panjie Qi
E. Sha
Qingfeng Zhuge
Hongwu Peng
Shaoyi Huang
Zhenglun Kong
Yuhong Song
Bingbing Li
19 Oct 2021
Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm
Shaoyi Huang
Dongkuan Xu
Ian En-Hsu Yen
Yijue Wang
Sung-En Chang
...
Shiyang Chen
Mimi Xie
Sanguthevar Rajasekaran
Hang Liu
Caiwen Ding
15 Oct 2021
Binary Complex Neural Network Acceleration on FPGA
IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP), 2021
Hongwu Peng
Shangli Zhou
Scott Weitze
Jiaxin Li
Sahidul Islam
...
Wei Zhang
M. Song
Mimi Xie
Hang Liu
Caiwen Ding
10 Aug 2021
Learned Token Pruning for Transformers
Sehoon Kim
Sheng Shen
D. Thorsley
A. Gholami
Woosuk Kwon
Joseph Hassoun
Kurt Keutzer
02 Jul 2021
Dancing along Battery: Enabling Transformer with Run-time Reconfigurability on Mobile Devices
Design Automation Conference (DAC), 2021
Yuhong Song
Weiwen Jiang
Bingbing Li
Panjie Qi
Qingfeng Zhuge
E. Sha
Sakyasingha Dasgupta
Yiyu Shi
Caiwen Ding
12 Feb 2021
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Transactions of the Association for Computational Linguistics (TACL), 2020
Prakhar Ganesh
Yao Chen
Xin Lou
Mohammad Ali Khan
Yifan Yang
Hassan Sajjad
Preslav Nakov
Deming Chen
Marianne Winslett
27 Feb 2020