Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

30 July 2018
Xianyan Jia
Shutao Song
W. He
Yangzihao Wang
Haidong Rong
Feihu Zhou
Liqiang Xie
Zhenyu Guo
Yuanzhou Yang
Li Yu
Tiegang Chen
Guangxiao Hu
Shaoshuai Shi
Xiaowen Chu
ArXiv (abs) · PDF · HTML

Papers citing "Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes"

50 / 109 papers shown
ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism
Venmugil Elango
106
0
0
20 Mar 2025
Importance Sampling via Score-based Generative Models
Heasung Kim
Taekyun Lee
Hyeji Kim
Gustavo de Veciana
MedIm DiffM
202
2
0
07 Feb 2025
How to set AdamW's weight decay as you scale model and dataset size
Xi Wang
Laurence Aitchison
147
11
0
22 May 2024
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh
Junguk Hong
Chaemin Lim
Seong-Yeol Park
Jeehyun Kim
Hanjun Kim
Youngsok Kim
Jinho Lee
77
8
0
13 Apr 2024
Guaranteed Approximation Bounds for Mixed-Precision Neural Operators
Renbo Tu
Colin White
Jean Kossaifi
Boris Bonev
Nikola B. Kovachki
Gennady Pekhimenko
Kamyar Azizzadenesheli
Anima Anandkumar
64
11
0
27 Jul 2023
FFCV: Accelerating Training by Removing Data Bottlenecks
Guillaume Leclerc
Andrew Ilyas
Logan Engstrom
Sung Min Park
Hadi Salman
Aleksander Madry
61
70
0
21 Jun 2023
DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
Lin Zhang
Shaoshuai Shi
Xiaowen Chu
Wei Wang
Yue Liu
Chengjian Liu
61
11
0
24 Feb 2023
RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs
A. M. Ribeiro-dos-Santos
João Dinis Ferreira
O. Mutlu
G. Falcão
MQ
84
2
0
15 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
Xin Ma
Chang-Shu Liu
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
129
9
0
31 Dec 2022
Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training
Mingliang Xu
Gongrui Nan
Yuxin Zhang
Chia-Wen Lin
Rongrong Ji
MQ
48
3
0
12 Nov 2022
Large-batch Optimization for Dense Visual Predictions
Zeyue Xue
Jianming Liang
Guanglu Song
Zhuofan Zong
Liang Chen
Yu Liu
Ping Luo
VLM
96
9
0
20 Oct 2022
Towards Efficient Communications in Federated Learning: A Contemporary Survey
Zihao Zhao
Yuzhu Mao
Yang Liu
Linqi Song
Ouyang Ye
Xinlei Chen
Wenbo Ding
FedML
95
63
0
02 Aug 2022
Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang
Shaoshuai Shi
Wei Wang
Yue Liu
65
10
0
30 Jun 2022
One Hyper-Initializer for All Network Architectures in Medical Image Analysis
Fangxin Shang
Yehui Yang
Dalu Yang
Junde Wu
Xiaorong Wang
Yanwu Xu
AI4CE
68
2
0
08 Jun 2022
Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks
Bum Jun Kim
Hyeyeon Choi
Hyeonah Jang
Dong Gu Lee
Wonseok Jeong
Sang Woo Kim
45
4
0
15 May 2022
Sign Bit is Enough: A Learning Synchronization Framework for Multi-hop All-reduce with Ultimate Compression
Feijie Wu
Shiqi He
Song Guo
Zhihao Qu
Yining Qi
W. Zhuang
Jie Zhang
59
9
0
14 Apr 2022
Auto-scaling Vision Transformers without Training
Wuyang Chen
Wei-Ping Huang
Xianzhi Du
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
66
25
0
24 Feb 2022
TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Weiyang Wang
Moein Khazraee
Zhizhen Zhong
M. Ghobadi
Zhihao Jia
Dheevatsa Mudigere
Ying Zhang
A. Kewitsch
118
92
0
01 Feb 2022
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
83
15
0
01 Nov 2021
BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman I. Hosny
Marina Neseem
Sherief Reda
MQ
89
4
0
29 Oct 2021
Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning
Ningning Xie
Tamara Norman
Dominik Grewe
Dimitrios Vytiniotis
71
17
0
20 Oct 2021
EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks
Shengwei Li
Zhiquan Lai
Dongsheng Li
Yiming Zhang
Xiangyu Ye
Yabo Duan
FedML
54
3
0
18 Oct 2021
Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Saeed Rashidi
William Won
Sudarshan Srinivasan
Srinivas Sridharan
T. Krishna
GNN
83
34
0
09 Oct 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping
Micah Goldblum
Phillip E. Pope
Michael Moeller
Tom Goldstein
170
76
0
29 Sep 2021
Complexity-aware Adaptive Training and Inference for Edge-Cloud Distributed AI Systems
Yinghan Long
I. Chakraborty
G. Srinivasan
Kaushik Roy
48
15
0
14 Sep 2021
Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
Shaoshuai Shi
Lin Zhang
Yue Liu
123
9
0
14 Jul 2021
ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
Chen Dun
Cameron R. Wolfe
C. Jermaine
Anastasios Kyrillidis
87
21
0
02 Jul 2021
Dive into Deep Learning
Aston Zhang
Zachary Chase Lipton
Mu Li
Alexander J. Smola
VLM
87
570
0
21 Jun 2021
Concurrent Adversarial Learning for Large-Batch Training
Yong Liu
Xiangning Chen
Minhao Cheng
Cho-Jui Hsieh
Yang You
ODL
77
13
0
01 Jun 2021
Tesseract: Parallelize the Tensor Parallelism Efficiently
Boxiang Wang
Qifan Xu
Zhengda Bian
Yang You
VLM GNN
23
34
0
30 May 2021
Itsy Bitsy SpiderNet: Fully Connected Residual Network for Fraud Detection
S. Afanasiev
A. Smirnova
D. Kotereva
53
2
0
17 May 2021
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen
Jiamin Ni
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
...
Naigang Wang
Swagath Venkataramani
Vijayalakshmi Srinivasan
Wei Zhang
K. Gopalakrishnan
79
67
0
21 Apr 2021
On-device Federated Learning with Flower
Akhil Mathur
Daniel J. Beutel
Pedro Porto Buarque de Gusmão
Javier Fernandez-Marques
Taner Topal
Xinchi Qiu
Titouan Parcollet
Yan Gao
Nicholas D. Lane
FedML
94
38
0
07 Apr 2021
Large Batch Simulation for Deep Reinforcement Learning
Brennan Shacklett
Erik Wijmans
Aleksei Petrenko
Manolis Savva
Dhruv Batra
V. Koltun
Kayvon Fatahalian
3DV OffRL AI4CE
88
26
0
12 Mar 2021
GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training
Krishnateja Killamsetty
D. Sivasubramanian
Ganesh Ramakrishnan
A. De
Rishabh K. Iyer
OOD
157
207
0
27 Feb 2021
GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent
Heesu Kim
Hanmin Park
Taehyun Kim
Kwanheum Cho
Eojin Lee
Soojung Ryu
Hyuk-Jae Lee
Kiyoung Choi
Jinho Lee
66
36
0
15 Feb 2021
Large-Scale Training System for 100-Million Classification at Alibaba
Liuyihan Song
Pan Pan
Kang Zhao
Hao Yang
Yiming Chen
Yingya Zhang
Yinghui Xu
Rong Jin
84
24
0
09 Feb 2021
Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators
Hamzah Abdel-Aziz
Ali Shafiee
J. Shin
A. Pedram
Joseph Hassoun
MQ
67
11
0
27 Jan 2021
Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability
Sangho Yeo
Minho Bae
Minjoong Jeong
Oh-Kyoung Kwon
Sangyoon Oh
50
3
0
30 Dec 2020
FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
Y. Fu
Haoran You
Yang Zhao
Yue Wang
Chaojian Li
K. Gopalakrishnan
Zhangyang Wang
Yingyan Lin
MQ
81
32
0
24 Dec 2020
Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot
Junqi Yin
Mallikarjun Shankar
23
1
0
16 Dec 2020
Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour
Arissa Wongpanich
Hieu H. Pham
J. Demmel
Mingxing Tan
Quoc V. Le
Yang You
Sameer Kumar
65
8
0
30 Oct 2020
A Closer Look at Codistillation for Distributed Training
Shagun Sodhani
Olivier Delalleau
Mahmoud Assran
Koustuv Sinha
Nicolas Ballas
Michael G. Rabbat
123
8
0
06 Oct 2020
VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware
Andrew Or
Haoyu Zhang
M. Freedman
73
10
0
20 Sep 2020
Communication-efficient Decentralized Machine Learning over Heterogeneous Networks
Pan Zhou
Qian Lin
Dumitrel Loghin
Beng Chin Ooi
Yuncheng Wu
Hongfang Yu
63
37
0
12 Sep 2020
Flower: A Friendly Federated Learning Research Framework
Daniel J. Beutel
Taner Topal
Akhil Mathur
Xinchi Qiu
Javier Fernandez-Marques
...
Lorenzo Sani
Kwing Hei Li
Titouan Parcollet
Pedro Porto Buarque de Gusmão
Nicholas D. Lane
FedML
142
822
0
28 Jul 2020
Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning
Yawen Wu
Zhepeng Wang
Yiyu Shi
Jiaxi Hu
74
46
0
07 Jul 2020
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Shiqing Fan
Yi Rong
Chen Meng
Zongyan Cao
Siyu Wang
...
Jun Yang
Lixue Xia
Lansong Diao
Xiaoyong Liu
Wei Lin
96
240
0
02 Jul 2020
The Limit of the Batch Size
Yang You
Yuhui Wang
Huan Zhang
Zhao-jie Zhang
J. Demmel
Cho-Jui Hsieh
121
15
0
15 Jun 2020
O(1) Communication for Distributed SGD through Two-Level Gradient Averaging
Subhadeep Bhattacharya
Weikuan Yu
Fahim Chowdhury
FedML
19
2
0
12 Jun 2020