Rethinking the Value of Transformer Components

7 November 2020
Wenxuan Wang
Zhaopeng Tu

Papers citing "Rethinking the Value of Transformer Components"

26 / 26 papers shown
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
Younes Hourri
Mohammad Mozaffari
M. Dehnavi
237
0
0
24 Dec 2025
FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning
Xin Yuan
S. Li
Jiateng Wei
Chengrui Zhu
Yanming Wu
Qingpeng Li
Jiajun Lv
Xiaoke Lan
Jun Chen
Yong-Jin Liu
OffRL
432
0
0
24 Nov 2025
A Comprehensive Review of Reinforcement Learning for Autonomous Driving in the CARLA Simulator
Elahe Delavari
Feeza Khan Khanzada
Jaerock Kwon
190
4
0
10 Sep 2025
AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs
Di He
Ajay Jaiswal
Songjun Tu
Li Shen
Ganzhao Yuan
Shiwei Liu
L. Yin
411
1
0
17 Jun 2025
TensorShield: Safeguarding On-Device Inference by Shielding Critical DNN Tensors with TEE
Tong Sun
Bowen Jiang
Hailong Lin
Borui Li
Yixiao Teng
Yi Gao
Wei Dong
FedML
243
6
0
28 May 2025
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Haiquan Lu
Yefan Zhou
Shiwei Liu
Zhangyang Wang
Michael W. Mahoney
Yaoqing Yang
169
27
0
14 Oct 2024
StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Suzhen Wang
Yifeng Ma
Yu Ding
Zhipeng Hu
Changjie Fan
Tangjie Lv
Zhidong Deng
Xin Yu
275
20
0
14 Sep 2024
Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Sunit Bhattacharya
Ondrej Bojar
198
17
0
24 Oct 2023
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
International Conference on Machine Learning (ICML), 2023
Lu Yin
You Wu
Zhenyu Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zinan Lin
Shiwei Liu
619
156
0
08 Oct 2023
An Empirical Study of CLIP for Text-based Person Search
AAAI Conference on Artificial Intelligence (AAAI), 2023
Min Cao
Yang Bai
Ziyin Zeng
Mang Ye
Min Zhang
VLM
423
98
0
19 Aug 2023
Transformer-based models and hardware acceleration analysis in autonomous driving: A survey
J. Zhong
Zheng Liu
Xiangshan Chen
ViT
191
20
0
21 Apr 2023
Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization
Computer Vision and Pattern Recognition (CVPR), 2023
Jianping Zhang
Yizhan Huang
Weibin Wu
Michael R. Lyu
AAML
ViT
283
81
0
28 Mar 2023
Improving the Transferability of Adversarial Samples by Path-Augmented Method
Computer Vision and Pattern Recognition (CVPR), 2023
Jianping Zhang
Shu Yang
Wenxuan Wang
Yichen Li
Weibin Wu
Xiaosen Wang
Yuxin Su
Michael R. Lyu
AAML
219
75
0
28 Mar 2023
StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yifeng Ma
Suzhe Wang
Zhipeng Hu
Changjie Fan
Tangjie Lv
Yu-qiong Ding
Zhidong Deng
Xin Yu
359
123
0
03 Jan 2023
ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation
International Conference on Computational Linguistics (COLING), 2022
Cunxiao Du
Zhaopeng Tu
Longyue Wang
Jing Jiang
251
11
0
08 Oct 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
588
903
0
13 Jun 2022
Understanding and Mitigating the Uncertainty in Zero-Shot Translation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Wenxuan Wang
Wenxiang Jiao
Shuo Wang
Zhaopeng Tu
Michael R. Lyu
UQLM
227
11
0
20 May 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
IEEE Transactions on Multimedia (IEEE TMM), 2022
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
205
5
0
24 Mar 2022
Training-free Transformer Architecture Search
Computer Vision and Pattern Recognition (CVPR), 2022
Qinqin Zhou
Kekai Sheng
Xiawu Zheng
Ke Li
Xing Sun
Yonghong Tian
Jie Chen
Rongrong Ji
ViT
197
57
0
23 Mar 2022
Kformer: Knowledge Injection in Transformer Feed-Forward Layers
Natural Language Processing and Chinese Computing (NLPCC), 2022
Yunzhi Yao
Shaohan Huang
Li Dong
Furu Wei
Huajun Chen
Ningyu Zhang
KELM
MedIm
363
48
0
15 Jan 2022
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
MoE
519
169
0
05 Oct 2021
Towards Efficient Post-training Quantization of Pre-trained Language Models
Haoli Bai
Lu Hou
Lifeng Shang
Xin Jiang
Irwin King
Michael R. Lyu
MQ
241
52
0
30 Sep 2021
Bag of Tricks for Optimizing Transformer Efficiency
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Ye Lin
Yanyang Li
Tong Xiao
Jingbo Zhu
136
7
0
09 Sep 2021
Transformer-F: A Transformer network with effective methods for learning universal sentence representation
Yu Shi
167
1
0
02 Jul 2021
Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning
International Conference on Learning Representations (ICLR), 2020
Xuebo Liu
Longyue Wang
Yang Li
Liang Ding
Lidia S. Chao
Zhaopeng Tu
AI4CE
238
37
0
29 Dec 2020
Context-Aware Cross-Attention for Non-Autoregressive Translation
International Conference on Computational Linguistics (COLING), 2020
Liang Ding
Longyue Wang
Di Wu
Dacheng Tao
Zhaopeng Tu
183
41
0
02 Nov 2020