Differentiable Subset Pruning of Transformer Heads

10 August 2021

Papers citing "Differentiable Subset Pruning of Transformer Heads"

8 / 8 papers shown

Title
Controllable Context Sensitivity and the Knob Behind It Julian Minder Kevin Du Niklas Stoehr Giovanni Monea Chris Wendler Robert West Ryan Cotterell KELM 39 3 0 11 Nov 2024
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models Amit Dhurandhar Tejaswini Pedapati Ronny Luss Soham Dan Aurélie C. Lozano Payel Das Georgios Kollias 22 3 0 28 Feb 2024
Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning Chong Li Shaonan Wang Yunhao Zhang Jiajun Zhang Chengqing Zong 25 4 0 16 Oct 2023
Token Sparsification for Faster Medical Image Segmentation Lei Zhou Huidong Liu Joseph Bae Junjun He Dimitris Samaras Prateek Prasanna MedIm 13 3 0 11 Mar 2023
Probing via Prompting Jiaoda Li Ryan Cotterell Mrinmaya Sachan 24 13 0 04 Jul 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention Zuzana Jelčicová Marian Verhelst 24 5 0 20 Mar 2022
Pruning Self-attentions into Convolutional Layers in Single Path Haoyu He Jianfei Cai Jing Liu Zizheng Pan Jing Zhang Dacheng Tao Bohan Zhuang ViT 29 40 0 23 Nov 2021
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning Y. Gal Zoubin Ghahramani UQCV BDL 247 9,109 0 06 Jun 2015