1410.0759
cuDNN: Efficient Primitives for Deep Learning
3 October 2014
Sharan Chetlur
Cliff Woolley
Philippe Vandermersch
Jonathan M. Cohen
J. Tran
Bryan Catanzaro
Evan Shelhamer
Papers citing
"cuDNN: Efficient Primitives for Deep Learning"
50 / 231 papers shown
Title
Advancing Weight and Channel Sparsification with Enhanced Saliency
Xinglong Sun
Maying Shen
Hongxu Yin
Lei Mao
Pavlo Molchanov
Jose M. Alvarez
54
1
0
05 Feb 2025
TipSegNet: Fingertip Segmentation in Contactless Fingerprint Imaging
L. Ruzicka
Bernhard Kohn
Clemens Heitzinger
52
0
0
10 Jan 2025
UnifiedNN: Efficient Neural Network Training on the Cloud
Xingyu Lou
Arthi Padmanabhan
Spyridon Mastorakis
FedML
46
0
0
02 Aug 2024
Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies
Johannes Pekkilä
Oskar Lappi
Fredrik Robertsén
Maarit J. Korpi-Lagg
20
0
0
13 Jun 2024
Deep Learning for Low-Latency, Quantum-Ready RF Sensing
P. Gokhale
Caitlin Carnahan
William Clark
Teague Tomesh
Frederic T. Chong
31
1
0
27 Apr 2024
Addressing the speed-accuracy simulation trade-off for adaptive spiking neurons
Luke Taylor
Andrew J. King
N. Harper
30
6
0
19 Nov 2023
On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series
Rita Kuznetsova
Alizée Pace
Manuel Burger
Hugo Yèche
Gunnar Rätsch
AI4TS
39
5
0
15 Nov 2023
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Ruihang Lai
Junru Shao
Siyuan Feng
Steven Lyubomirsky
Bohan Hou
...
Sunghyun Park
Prakalp Srivastava
Jared Roesch
T. Mowry
Tianqi Chen
47
9
0
01 Nov 2023
Reduce Computational Complexity for Convolutional Layers by Skipping Zeros
Zhiyi Zhang
Pengfei Zhang
Zhuopin Xu
Qi Wang
26
1
0
28 Jun 2023
Integrated multi-operand optical neurons for scalable and hardware-efficient deep learning
Chenghao Feng
Jiaqi Gu
Hanqing Zhu
R. Tang
Shupeng Ning
M. Hlaing
J. Midkiff
Sourabh Jain
David Z. Pan
Ray T. Chen
28
8
0
31 May 2023
SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving
Minjae Lee
Seongmin Park
Hyung-Se Kim
Minyong Yoon
Jangwhan Lee
Junwon Choi
Nam Sung Kim
Mingu Kang
Jungwook Choi
3DPC
26
4
0
12 May 2023
TorchBench: Benchmarking PyTorch with High API Surface Coverage
Yueming Hao
Xu Zhao
Bin Bao
David Berard
William Constable
Adnan Aziz
Xu Liu
38
5
0
27 Apr 2023
Training Neural Networks for Execution on Approximate Hardware
Tianmu Li
Shurui Li
Puneet Gupta
35
1
0
08 Apr 2023
Tensor Slicing and Optimization for Multicore NPUs
R. Sousa
M. Pereira
Yongin Kwon
Taeho Kim
Namsoon Jung
Chang Soo Kim
Michael Frank
Guido Araujo
24
5
0
06 Apr 2023
Locality-constrained autoregressive cum conditional normalizing flow for lattice field theory simulations
R. DineshP.
AI4CE
22
0
0
04 Apr 2023
NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools
Peiran Yao
Matej Kosmajac
Abeer Waheed
Kostyantyn Guzhva
Natalie Hervieux
Denilson Barbosa
14
2
0
02 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
101
0
27 Feb 2023
Fixflow: A Framework to Evaluate Fixed-point Arithmetic in Light-Weight CNN Inference
Farhad Taheri
Siavash Bayat Sarmadi
H. Mosanaei-Boorani
Reza Taheri
MQ
23
1
0
19 Feb 2023
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs
Geonhwa Jeong
S. Damani
Abhimanyu Bambhaniya
Eric Qin
C. Hughes
S. Subramoney
Hyesoon Kim
T. Krishna
MoE
46
24
0
17 Feb 2023
Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems
Jieyang Chen
Xin Liang
Kai Zhao
H. Sabzi
L. Bhuyan
Zizhong Chen
17
4
0
09 Jan 2023
Efficient On-device Training via Gradient Filtering
Yuedong Yang
Guihong Li
R. Marculescu
39
18
0
01 Jan 2023
An overview of open source Deep Learning-based libraries for Neuroscience
Louis Fabrice Tshimanga
Manfredo Atzori
Federico Del Pup
M. Corbetta
OOD
43
4
0
19 Dec 2022
Vision Transformer Computation and Resilience for Dynamic Inference
Kavya Sreedhar
Jason Clemons
Rangharajan Venkatesan
S. Keckler
M. Horowitz
29
2
0
06 Dec 2022
Fast convolution kernels on pascal GPU with high memory efficiency
Qiong Chang
Masaki Onishi
T. Maruyama
3DV
17
6
0
01 Dec 2022
ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining
C. Peltekis
D. Filippas
G. Dimitrakopoulos
C. Nicopoulos
D. Pnevmatikatos
21
5
0
22 Nov 2022
HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks
Zining Zhang
Bingsheng He
Zhenjie Zhang
14
5
0
21 Nov 2022
ParticleGrid: Enabling Deep Learning using 3D Representation of Materials
Shehtab Zaman
E. Ferguson
Cécile Pereira
D. Akhiyarov
Mauricio Araya-Polo
Kenneth Chiu
DiffM
AI4CE
29
2
0
15 Nov 2022
Pruning Very Deep Neural Network Channels for Efficient Inference
Yihui He
35
1
0
14 Nov 2022
BiViT: Extremely Compressed Binary Vision Transformer
Yefei He
Zhenyu Lou
Luoming Zhang
Jing Liu
Weijia Wu
Hong Zhou
Bohan Zhuang
ViT
MQ
20
28
0
14 Nov 2022
NEON: Enabling Efficient Support for Nonlinear Operations in Resistive RAM-based Neural Network Accelerators
Aditya Manglik
Minesh Patel
Haiyu Mao
Behzad Salami
Jisung Park
Lois Orosa
O. Mutlu
20
1
0
10 Nov 2022
TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition
Lizhi Xiang
Miao Yin
Chengming Zhang
Aravind Sukumaran-Rajam
P. Sadayappan
Bo Yuan
Dingwen Tao
3DV
27
8
0
07 Nov 2022
Teal: Learning-Accelerated Optimization of WAN Traffic Engineering
Zhiying Xu
Francis Y. Yan
Rachee Singh
Justin T. Chiu
Alexander M. Rush
Minlan Yu
23
44
0
25 Oct 2022
ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations
Zhiying Xu
Jiafan Xu
H. Peng
Wei Wang
Xiaoliang Wang
...
Haipeng Dai
Yixu Xu
Hao Cheng
Kun Wang
Guihai Chen
35
0
0
22 Oct 2022
ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference
Jing Gong
Hassaan Saadat
Hasindu Gamaarachchi
Haris Javaid
X. Hu
S. Parameswaran
33
12
0
09 Sep 2022
Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey
Dalin Zhang
Kaixuan Chen
Yan Zhao
B. Yang
Li-Ping Yao
Christian S. Jensen
48
3
0
22 Aug 2022
OLLIE: Derivation-based Tensor Program Optimizer
Liyan Zheng
Haojie Wang
Jidong Zhai
Muyan Hu
Zixuan Ma
Tuowei Wang
Shizhi Tang
Lei Xie
Kezhao Huang
Zhihao Jia
46
3
0
02 Aug 2022
Towards Efficient Communications in Federated Learning: A Contemporary Survey
Zihao Zhao
Yuzhu Mao
Yang Liu
Linqi Song
Ouyang Ye
Xinlei Chen
Wenbo Ding
FedML
59
60
0
02 Aug 2022
An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System
Juan Gómez Luna
Yu-Yin Guo
Sylvan Brocard
Julien Legriel
Remy Cimadomo
Geraldo F. Oliveira
Gagandeep Singh
O. Mutlu
VLM
33
15
0
16 Jul 2022
RT-DNAS: Real-time Constrained Differentiable Neural Architecture Search for 3D Cardiac Cine MRI Segmentation
Qing Lu
Xiaowei Xu
Shunjie Dong
Callie Hao
Lei Yang
Cheng Zhuo
Yiyu Shi
MedIm
24
4
0
08 Jun 2022
HW-Aware Initialization of DNN Auto-Tuning to Improve Exploration Time and Robustness
D. Rieber
Moritz Reiber
Oliver Bringmann
Holger Fröning
24
4
0
31 May 2022
Tensor Program Optimization with Probabilistic Programs
Junru Shao
Xiyou Zhou
Siyuan Feng
Bohan Hou
Ruihang Lai
Hongyi Jin
Wuwei Lin
Masahiro Masuda
Cody Hao Yu
Tianqi Chen
37
29
0
26 May 2022
Structured Pruning is All You Need for Pruning CNNs at Initialization
Yaohui Cai
Weizhe Hua
Hongzheng Chen
G. E. Suh
Christopher De Sa
Zhiru Zhang
CVBM
47
14
0
04 Mar 2022
A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning
Gerald Schwiebert
C. Weber
Leyuan Qu
Henrique Siqueira
S. Wermter
32
12
0
27 Feb 2022
Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation
Jiawei Liu
Yuxiang Wei
Sen Yang
Yinlin Deng
Lingming Zhang
33
41
0
21 Feb 2022
EF-Train: Enable Efficient On-device CNN Training on FPGA Through Data Reshaping for Online Adaptation or Personalization
Yue Tang
Xinyi Zhang
Peipei Zhou
Jingtong Hu
21
17
0
18 Feb 2022
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Lois Orosa
Skanda Koppula
Yaman Umuroglu
Konstantinos Kanellopoulos
Juan Gómez Luna
Michaela Blott
K. Vissers
O. Mutlu
46
4
0
04 Feb 2022
Towards Training Reproducible Deep Learning Models
Boyuan Chen
Mingzhi Wen
Yong Shi
Dayi Lin
Gopi Krishnan Rajbahadur
Zhen Ming Jiang
SyDa
17
37
0
04 Feb 2022
Accelerating DNN Training with Structured Data Gradient Pruning
Bradley McDanel
Helia Dinh
J. Magallanes
17
7
0
01 Feb 2022
Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation
Perry Gibson
José Cano
29
12
0
14 Jan 2022
Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction
Marcin Pietroń
Dominik Zurek
30
13
0
28 Dec 2021