Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning
arXiv: 2203.05016, 9 March 2022
Guyue Huang, Haoran Li, Minghai Qin, Fei Sun, Yufei Ding, Yuan Xie
Papers citing "Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning" (6 papers shown)
Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs
Seungmin Yu, Xiaodie Yi, Hayun Lee, Dongkun Shin
30 Jul 2024

Realizing Unaligned Block-wise Pruning for DNN Acceleration on Mobile Devices
Hayun Lee, Dongkun Shin
29 Jul 2024

Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models
Jan Finkbeiner, Thomas Gmeinder, M. Pupilli, A. Titterton, Emre Neftci
07 Nov 2023

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, S. Song
19 Sep 2023

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
26 Sep 2016