Cited By: "Caffe con Troll: Shallow Ideas to Speed Up Deep Learning" (arXiv:1504.04343)
Stefan Hadjis, Firas Abuzaid, Ce Zhang, Christopher Ré
16 April 2015
Papers citing "Caffe con Troll: Shallow Ideas to Speed Up Deep Learning" (20 of 20 papers shown):
1. SAFFIRA: a Framework for Assessing the Reliability of Systolic-Array-Based DNN Accelerators
   Mahdi Taheri, Masoud Daneshtalab, J. Raik, M. Jenihhin, Salvatore Pappalardo, Paul Jiménez, Bastien Deveautour, A. Bosio. 05 Mar 2024.

2. Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs
   Alexandros Kouris, Stylianos I. Venieris, Stefanos Laskaridis, Nicholas D. Lane. 27 Sep 2022.

3. High Performance Convolution Using Sparsity and Patterns for Inference in Deep Convolutional Neural Networks
   Hossam Amer, Ahmed H. Salamah, A. Sajedi, En-Hui Yang. 16 Apr 2021.

4. Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset
   Ilan Price, Jared Tanner. 12 Feb 2021.

5. LCP: A Low-Communication Parallelization Method for Fast Neural Network Inference in Image Recognition
   Ramyad Hadidi, Bahar Asgari, Jiashen Cao, Younmin Bae, Da Eun Shim, Hyojong Kim, Sung-Kyu Lim, Michael S. Ryoo, Hyesoon Kim. 13 Mar 2020.

6. Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications
   Chinthaka Gamanayake, Lahiru Jayasinghe, Benny Kai Kiat Ng, Chau Yuen. 05 Mar 2020.

7. Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption
   P. Rosciszewski. 20 Sep 2018.

8. Deep Learning Approximation: Zero-Shot Neural Network Speedup
   Michele Pratusevich. 15 Jun 2018.

9. Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs
   Xuhao Chen. 28 Feb 2018.

10. Cuttlefish: A Lightweight Primitive for Adaptive Query Processing
    Tomer Kaftan, Magdalena Balazinska, Alvin Cheung, J. Gehrke. 26 Feb 2018.

11. Stochastic Gradient Descent on Highly-Parallel Architectures
    Yujing Ma, Florin Rusu, Martin Torres. 24 Feb 2018.

12. Distributed Training Large-Scale Deep Architectures
    Shang-Xuan Zou, Chun-Yen Chen, Jui-Lin Wu, Chun-Nan Chou, Chia-Chin Tsao, Kuan-Chieh Tung, Ting-Wei Lin, Cheng-Lung Sung, Edward Y. Chang. 10 Aug 2017.

13. MEC: Memory-efficient Convolution for Deep Neural Network
    Minsik Cho, D. Brand. 21 Jun 2017.

14. Using Convolutional Neural Networks in Robots with Limited Computational Resources: Detecting NAO Robots while Playing Soccer
    Nicolás Cruz, Kenzo Lobos-Tsunekawa, Javier Ruiz-del-Solar. 20 Jun 2017.

15. On-the-fly Operation Batching in Dynamic Computation Graphs
    Graham Neubig, Yoav Goldberg, Chris Dyer. 22 May 2017.

16. Faster CNNs with Direct Sparse Convolutions and Guided Pruning
    Jongsoo Park, Sheng Li, W. Wen, P. T. P. Tang, Hai Helen Li, Yiran Chen, Pradeep Dubey. 04 Aug 2016.

17. Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
    Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré. 14 Jun 2016.

18. A Systematic Approach to Blocking Convolutional Neural Networks
    Xuan S. Yang, Jing Pu, Blaine Rister, Nikhil Bhagdikar, Stephen Richardson, Shahar Kvatinsky, Jonathan Ragan-Kelley, A. Pedram, M. Horowitz. 14 Jun 2016.

19. SparkNet: Training Deep Networks in Spark
    Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan. 19 Nov 2015.

20. Automatic differentiation in machine learning: a survey
    A. G. Baydin, Barak A. Pearlmutter, Alexey Radul, J. Siskind. 20 Feb 2015.