2011.03803
Rethinking the Value of Transformer Components
7 November 2020
Wenxuan Wang
Zhaopeng Tu
Papers citing
"Rethinking the Value of Transformer Components"
26 / 26 papers shown
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
Younes Hourri
Mohammad Mozaffari
M. Dehnavi
259
0
0
24 Dec 2025
FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning
Xin Yuan
S. Li
Jiateng Wei
Chengrui Zhu
Yanming Wu
Qingpeng Li
Jiajun Lv
Xiaoke Lan
Jun Chen
Yong-Jin Liu
OffRL
468
0
0
24 Nov 2025
A Comprehensive Review of Reinforcement Learning for Autonomous Driving in the CARLA Simulator
Elahe Delavari
Feeza Khan Khanzada
Jaerock Kwon
252
4
0
10 Sep 2025
AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs
Di He
Ajay Jaiswal
Songjun Tu
Li Shen
Ganzhao Yuan
Shiwei Liu
L. Yin
436
3
0
17 Jun 2025
TensorShield: Safeguarding On-Device Inference by Shielding Critical DNN Tensors with TEE
Tong Sun
Bowen Jiang
Hailong Lin
Borui Li
Yixiao Teng
Yi Gao
Wei Dong
FedML
266
7
0
28 May 2025
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Haiquan Lu
Yefan Zhou
Shiwei Liu
Zhangyang Wang
Michael W. Mahoney
Yaoqing Yang
185
33
0
14 Oct 2024
StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Suzhen Wang
Yifeng Ma
Yu Ding
Zhipeng Hu
Changjie Fan
Tangjie Lv
Zhidong Deng
Xin Yu
289
23
0
14 Sep 2024
Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Sunit Bhattacharya
Ondrej Bojar
230
17
0
24 Oct 2023
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
International Conference on Machine Learning (ICML), 2023
Lu Yin
You Wu
Zhenyu Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zinan Lin
Shiwei Liu
682
164
0
08 Oct 2023
An Empirical Study of CLIP for Text-based Person Search
AAAI Conference on Artificial Intelligence (AAAI), 2023
Min Cao
Yang Bai
Ziyin Zeng
Mang Ye
Min Zhang
VLM
461
102
0
19 Aug 2023
Transformer-based models and hardware acceleration analysis in autonomous driving: A survey
J. Zhong
Zheng Liu
Xiangshan Chen
ViT
213
20
0
21 Apr 2023
Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization
Computer Vision and Pattern Recognition (CVPR), 2023
Jianping Zhang
Yizhan Huang
Weibin Wu
Michael R. Lyu
AAML
ViT
322
83
0
28 Mar 2023
Improving the Transferability of Adversarial Samples by Path-Augmented Method
Computer Vision and Pattern Recognition (CVPR), 2023
Jianping Zhang
Shu Yang
Wenxuan Wang
Yichen Li
Weibin Wu
Xiaosen Wang
Yuxin Su
Michael R. Lyu
AAML
239
79
0
28 Mar 2023
StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yifeng Ma
Suzhen Wang
Zhipeng Hu
Changjie Fan
Tangjie Lv
Yu-qiong Ding
Zhidong Deng
Xin Yu
403
132
0
03 Jan 2023
ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation
International Conference on Computational Linguistics (COLING), 2022
Cunxiao Du
Zhaopeng Tu
Longyue Wang
Jing Jiang
269
11
0
08 Oct 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
663
947
0
13 Jun 2022
Understanding and Mitigating the Uncertainty in Zero-Shot Translation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Wenxuan Wang
Wenxiang Jiao
Shuo Wang
Zhaopeng Tu
Michael R. Lyu
UQLM
233
11
0
20 May 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
IEEE transactions on multimedia (IEEE TMM), 2022
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
244
5
0
24 Mar 2022
Training-free Transformer Architecture Search
Computer Vision and Pattern Recognition (CVPR), 2022
Qinqin Zhou
Kekai Sheng
Xiawu Zheng
Ke Li
Xing Sun
Yonghong Tian
Jie Chen
Rongrong Ji
ViT
215
57
0
23 Mar 2022
Kformer: Knowledge Injection in Transformer Feed-Forward Layers
Natural Language Processing and Chinese Computing (NLPCC), 2022
Yunzhi Yao
Shaohan Huang
Li Dong
Furu Wei
Huajun Chen
Ningyu Zhang
KELM
MedIm
410
50
0
15 Jan 2022
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
MoE
606
176
0
05 Oct 2021
Towards Efficient Post-training Quantization of Pre-trained Language Models
Haoli Bai
Lu Hou
Lifeng Shang
Xin Jiang
Irwin King
Michael R. Lyu
MQ
250
53
0
30 Sep 2021
Bag of Tricks for Optimizing Transformer Efficiency
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Ye Lin
Yanyang Li
Tong Xiao
Jingbo Zhu
169
7
0
09 Sep 2021
Transformer-F: A Transformer network with effective methods for learning universal sentence representation
Yu Shi
203
1
0
02 Jul 2021
Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning
International Conference on Learning Representations (ICLR), 2020
Xuebo Liu
Longyue Wang
Yang Li
Liang Ding
Lidia S. Chao
Zhaopeng Tu
AI4CE
264
38
0
29 Dec 2020
Context-Aware Cross-Attention for Non-Autoregressive Translation
International Conference on Computational Linguistics (COLING), 2020
Liang Ding
Longyue Wang
Di Wu
Dacheng Tao
Zhaopeng Tu
216
42
0
02 Nov 2020