ResearchTrend.AI

DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis

2 April 2019
Sangkug Lym
Donghyuk Lee
Mike O'Connor
Niladrish Chatterjee
M. Erez

Papers citing "DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis"

7 / 7 papers shown
ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs
Guyue Huang
Yang Bai
Liu Liu
Yuke Wang
Bei Yu
Yufei Ding
Yuan Xie
29 Oct 2022
Inference Latency Prediction at the Edge
Zhuojin Li
Marco Paolieri
L. Golubchik
06 Oct 2022
Building a Performance Model for Deep Learning Recommendation Model Training on GPUs
Zhongyi Lin
Louis Feng
E. K. Ardestani
Jaewon Lee
J. Lundell
Changkyu Kim
A. Kejariwal
John Douglas Owens
19 Jan 2022
Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators
Yangjie Zhou
Mengtian Yang
Cong Guo
Jingwen Leng
Yun Liang
Quan Chen
Minyi Guo
Yuhao Zhu
08 Oct 2021
Training Energy-Efficient Deep Spiking Neural Networks with Single-Spike Hybrid Input Encoding
Gourav Datta
Souvik Kundu
Peter A. Beerel
26 Jul 2021
FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads
Zhen Zheng
Pengzhan Zhao
Guoping Long
Feiwen Zhu
Kai Zhu
Wenyi Zhao
Lansong Diao
Jun Yang
Wei Lin
23 Sep 2020
FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training
Sangkug Lym
M. Erez
27 Apr 2020