Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning
arXiv: 2203.05016, 9 March 2022
Guyue Huang, Haoran Li, Minghai Qin, Fei Sun, Yufei Ding, Yuan Xie
Papers citing "Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning" (6 papers shown)
Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs
Seungmin Yu, Xiaodie Yi, Hayun Lee, Dongkun Shin
30 Jul 2024

Realizing Unaligned Block-wise Pruning for DNN Acceleration on Mobile Devices
Hayun Lee, Dongkun Shin
29 Jul 2024

Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models
Jan Finkbeiner, Thomas Gmeinder, M. Pupilli, A. Titterton, Emre Neftci
07 Nov 2023

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, S. Song
19 Sep 2023

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
26 Sep 2016