ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Mixed Precision Training (arXiv:1710.03740)

10 October 2017
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
David García
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu

Papers citing "Mixed Precision Training"

Showing 50 of 265 citing papers. Each entry lists the title, authors, topic tags, site metrics, and publication date.
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
Joya Chen
Kai Xu
Yuhui Wang
Yifei Cheng
Angela Yao
19
7
0
28 Feb 2022
Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
Alexander Isenko
R. Mayer
Jeffrey Jedele
Hans-Arno Jacobsen
16
23
0
17 Feb 2022
pNLP-Mixer: an Efficient all-MLP Architecture for Language
Francesco Fusco
Damian Pascual
Peter W. J. Staar
Diego Antognini
26
29
0
09 Feb 2022
Backdoor Defense via Decoupling the Training Process
Kunzhe Huang
Yiming Li
Baoyuan Wu
Zhan Qin
Kui Ren
AAML
FedML
11
185
0
05 Feb 2022
Accelerating DNN Training with Structured Data Gradient Pruning
Bradley McDanel
Helia Dinh
J. Magallanes
4
7
0
01 Feb 2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
...
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
MoE
55
728
0
28 Jan 2022
Linguistically-driven Multi-task Pre-training for Low-resource Neural Machine Translation
Zhuoyuan Mao
Chenhui Chu
Sadao Kurohashi
9
6
0
20 Jan 2022
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Thomas Müller
Alex Evans
Christoph Schied
A. Keller
64
3,861
0
16 Jan 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
27
58
0
31 Dec 2021
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol
Prafulla Dhariwal
Aditya A. Ramesh
Pranav Shyam
Pamela Mishkin
Bob McGrew
Ilya Sutskever
Mark Chen
64
3,459
0
20 Dec 2021
Efficient Large Scale Language Modeling with Mixtures of Experts
Mikel Artetxe
Shruti Bhosale
Naman Goyal
Todor Mihaylov
Myle Ott
...
Jeff Wang
Luke Zettlemoyer
Mona T. Diab
Zornitsa Kozareva
Ves Stoyanov
MoE
50
188
0
20 Dec 2021
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression
Sian Jin
Chengming Zhang
Xintong Jiang
Yunhe Feng
Hui Guan
Guanpeng Li
S. Song
Dingwen Tao
18
23
0
18 Nov 2021
Scaling Law for Recommendation Models: Towards General-purpose User Representations
Kyuyong Shin
Hanock Kwak
KyungHyun Kim
Max Nihlén Ramström
Jisu Jeong
Jung-Woo Ha
S. Kim
ELM
26
38
0
15 Nov 2021
FILIP: Fine-grained Interactive Language-Image Pre-Training
Lewei Yao
Runhu Huang
Lu Hou
Guansong Lu
Minzhe Niu
Hang Xu
Xiaodan Liang
Zhenguo Li
Xin Jiang
Chunjing Xu
VLM
CLIP
28
612
0
09 Nov 2021
Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training
Minguk Kang
Woohyeon Shim
Minsu Cho
Jaesik Park
GAN
28
107
0
01 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
22
14
0
01 Nov 2021
MetaICL: Learning to Learn In Context
Sewon Min
M. Lewis
Luke Zettlemoyer
Hannaneh Hajishirzi
LRM
34
466
0
29 Oct 2021
BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman I. Hosny
Marina Neseem
Sherief Reda
MQ
25
4
0
29 Oct 2021
Whole Brain Segmentation with Full Volume Neural Network
Yeshu Li
Jianwei Cui
Yilun Sheng
Xiao Liang
Jingdong Wang
E. Chang
Yan Xu
27
11
0
29 Oct 2021
Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network
Ashutosh Pandey
Buye Xu
Anurag Kumar
Jacob Donley
P. Calamia
DeLiang Wang
18
4
0
22 Oct 2021
TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement
Ashutosh Pandey
Buye Xu
Anurag Kumar
Jacob Donley
P. Calamia
DeLiang Wang
KELM
17
39
0
20 Oct 2021
NormFormer: Improved Transformer Pretraining with Extra Normalization
Sam Shleifer
Jason Weston
Myle Ott
AI4CE
28
74
0
18 Oct 2021
TorchEsegeta: Framework for Interpretability and Explainability of Image-based Deep Learning Models
S. Chatterjee
Arnab Das
Chirag Mandal
Budhaditya Mukhopadhyay
Manish Vipinraj
Aniruddh Shukla
R. Rao
Chompunuch Sarasaen
Oliver Speck
A. Nürnberger
MedIm
21
14
0
16 Oct 2021
Differentially Private Fine-tuning of Language Models
Da Yu
Saurabh Naik
A. Backurs
Sivakanth Gopi
Huseyin A. Inan
...
Y. Lee
Andre Manoel
Lukas Wutschitz
Sergey Yekhanin
Huishuai Zhang
134
346
0
13 Oct 2021
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
Chongjian Ge
Youwei Liang
Yibing Song
Jianbo Jiao
Jue Wang
Ping Luo
ViT
16
36
0
11 Oct 2021
Optimized U-Net for Brain Tumor Segmentation
Michal Futrega
Alexandre Milesi
Michal Marcinkiewicz
Pablo Ribalta
SSeg
27
90
0
07 Oct 2021
8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
MQ
17
267
0
06 Oct 2021
Multilingual Translation via Grafting Pre-trained Language Models
Zewei Sun
Mingxuan Wang
Lei Li
AI4CE
183
22
0
11 Sep 2021
PPT: Pre-trained Prompt Tuning for Few-shot Learning
Yuxian Gu
Xu Han
Zhiyuan Liu
Minlie Huang
VLM
27
401
0
09 Sep 2021
A Protection Method of Trained CNN Model Using Feature Maps Transformed With Secret Key From Unauthorized Access
Maungmaung Aprilpyone
Hitoshi Kiya
14
5
0
01 Sep 2021
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
26
12
0
24 Aug 2021
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
Jiarui Fang
Zilin Zhu
Shenggui Li
Hui Su
Yang Yu
Jie Zhou
Yang You
VLM
21
24
0
12 Aug 2021
PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining
Machel Reid
Mikel Artetxe
VLM
42
26
0
04 Aug 2021
LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization
Weidong Guo
Mingjun Zhao
Lusheng Zhang
Di Niu
Jinwen Luo
Zhenhua Liu
Zhenyang Li
J. Tang
15
8
0
02 Aug 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
24
811
0
14 Jun 2021
Rethinking Training from Scratch for Object Detection
Yang Li
Hong Zhang
Yu Zhang
VLM
OnRL
ObjD
14
5
0
06 Jun 2021
Quantization and Deployment of Deep Neural Networks on Microcontrollers
Pierre-Emmanuel Novac
G. B. Hacene
Alain Pegatoquet
Benoit Miramond
Vincent Gripon
MQ
20
116
0
27 May 2021
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal
Alex Nichol
63
7,399
0
11 May 2021
Clean Images are Hard to Reblur: Exploiting the Ill-Posed Inverse Task for Dynamic Scene Deblurring
Seungjun Nah
Sanghyun Son
Jaerin Lee
Kyoung Mu Lee
59
17
0
26 Apr 2021
Differentiable Model Compression via Pseudo Quantization Noise
Alexandre Défossez
Yossi Adi
Gabriel Synnaeve
DiffM
MQ
10
46
0
20 Apr 2021
See through Gradients: Image Batch Recovery via GradInversion
Hongxu Yin
Arun Mallya
Arash Vahdat
J. Álvarez
Jan Kautz
Pavlo Molchanov
FedML
23
459
0
15 Apr 2021
Charged particle tracking via edge-classifying interaction networks
G. Dezoort
S. Thais
Javier Mauricio Duarte
Vesal Razavimaleki
M. Atkinson
I. Ojalvo
Mark S. Neubauer
P. Elmer
19
46
0
30 Mar 2021
QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection
Chenhongyi Yang
Zehao Huang
Naiyan Wang
ObjD
16
224
0
16 Mar 2021
EventGraD: Event-Triggered Communication in Parallel Machine Learning
Soumyadip Ghosh
B. Aquino
V. Gupta
FedML
14
8
0
12 Mar 2021
Contrastive Semi-supervised Learning for ASR
Alex Xiao
Christian Fuegen
Abdel-rahman Mohamed
14
20
0
09 Mar 2021
Dynamic Efficient Adversarial Training Guided by Gradient Magnitude
Fu Lee Wang
Yanghao Zhang
Yanbin Zheng
Wenjie Ruan
23
1
0
04 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
98
27,569
0
26 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,774
0
24 Feb 2021
Enabling Binary Neural Network Training on the Edge
Erwei Wang
James J. Davis
Daniele Moro
Piotr Zielinski
Jia Jie Lim
C. Coelho
S. Chatterjee
P. Cheung
G. Constantinides
MQ
15
24
0
08 Feb 2021
Deep Residual Learning in Spiking Neural Networks
Wei Fang
Zhaofei Yu
Yanqing Chen
Tiejun Huang
T. Masquelier
Yonghong Tian
121
475
0
08 Feb 2021