Rethinking the Value of Transformer Components

7 November 2020

Wenxuan Wang

Zhaopeng Tu

Papers citing "Rethinking the Value of Transformer Components"

26 / 26 papers shown

PATCH: Learnable Tile-level Hybrid Sparsity for LLMs

Younes Hourri

Mohammad Mozaffari

M. Dehnavi

259

24 Dec 2025

FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning

468

24 Nov 2025

A Comprehensive Review of Reinforcement Learning for Autonomous Driving in the CARLA Simulator

Elahe Delavari

Feeza Khan Khanzada

Jaerock Kwon

252

10 Sep 2025

AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs

436

17 Jun 2025

TensorShield: Safeguarding On-Device Inference by Shielding Critical DNN Tensors with TEE

266

28 May 2025

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

Neural Information Processing Systems (NeurIPS), 2024

Haiquan Lu

Yefan Zhou

Shiwei Liu

Zhangyang Wang

Michael W. Mahoney

Yaoqing Yang

185

14 Oct 2024

StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

Suzhen Wang

Yifeng Ma

Yu Ding

Zhipeng Hu

Changjie Fan

Tangjie Lv

Zhidong Deng

Xin Yu

289

14 Sep 2024

Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023

Sunit Bhattacharya

Ondrej Bojar

230

24 Oct 2023

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

International Conference on Machine Learning (ICML), 2023

...

682

164

08 Oct 2023

An Empirical Study of CLIP for Text-based Person Search

AAAI Conference on Artificial Intelligence (AAAI), 2023

Min Zhang

461

102

19 Aug 2023

Transformer-based models and hardware acceleration analysis in autonomous driving: A survey

213

21 Apr 2023

Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization

Computer Vision and Pattern Recognition (CVPR), 2023

Jianping Zhang

Yizhan Huang

Weibin Wu

Michael R. Lyu

AAML ViT

322

28 Mar 2023

Improving the Transferability of Adversarial Samples by Path-Augmented Method

Computer Vision and Pattern Recognition (CVPR), 2023

Jianping Zhang

Michael R. Lyu

239

28 Mar 2023

StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

AAAI Conference on Artificial Intelligence (AAAI), 2023

Changjie Fan

403

132

03 Jan 2023

ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation

International Conference on Computational Linguistics (COLING), 2022

269

08 Oct 2022

Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

663

947

13 Jun 2022

Understanding and Mitigating the Uncertainty in Zero-Shot Translation

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

Shuo Wang

Michael R. Lyu

233

20 May 2022

Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering

IEEE Transactions on Multimedia (IEEE TMM), 2022

244

24 Mar 2022

Training-free Transformer Architecture Search

Computer Vision and Pattern Recognition (CVPR), 2022

Yonghong Tian

Jie Chen

Rongrong Ji

ViT

215

23 Mar 2022

Kformer: Knowledge Injection in Transformer Feed-Forward Layers

Natural Language Processing and Chinese Computing (NLPCC), 2022

Huajun Chen

Ningyu Zhang

KELM MedIm

410

15 Jan 2022

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts

Zhengyan Zhang

Yankai Lin

Zhiyuan Liu

Peng Li

Maosong Sun

Jie Zhou

MoE

606

176

05 Oct 2021

Towards Efficient Post-training Quantization of Pre-trained Language Models

Haoli Bai

Lu Hou

Lifeng Shang

Xin Jiang

Irwin King

Michael R. Lyu

250

30 Sep 2021

Bag of Tricks for Optimizing Transformer Efficiency

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

Ye Lin

Yanyang Li

Tong Xiao

Jingbo Zhu

169

09 Sep 2021

Transformer-F: A Transformer network with effective methods for learning universal sentence representation

Yu Shi

203

02 Jul 2021

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning

International Conference on Learning Representations (ICLR), 2020

Liang Ding

264

29 Dec 2020

Context-Aware Cross-Attention for Non-Autoregressive Translation

International Conference on Computational Linguistics (COLING), 2020

Liang Ding

216

02 Nov 2020