Mixed Precision Training


10 October 2017
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
David García
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
arXiv: 1710.03740

Papers citing "Mixed Precision Training"

50 / 306 papers shown
Better Schedules for Low Precision Training of Deep Neural Networks
Cameron R. Wolfe
Anastasios Kyrillidis
45
1
0
04 Mar 2024
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
Sunghyeon Woo
Baeseong Park
Byeongwook Kim
Minjung Jo
S. Kwon
Dongsuk Jeon
Dongsoo Lee
57
2
0
27 Feb 2024
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel
M. Armando
Salma Galaaoui
Romain Brégier
Philippe Weinzaepfel
Grégory Rogez
Thomas Lucas
3DH
37
18
0
22 Feb 2024
Flora: Low-Rank Adapters Are Secretly Gradient Compressors
Yongchang Hao
Yanshuai Cao
Lili Mou
11
39
0
05 Feb 2024
Nomic Embed: Training a Reproducible Long Context Text Embedder
Zach Nussbaum
John X. Morris
Brandon Duderstadt
Andriy Mulyar
19
94
0
02 Feb 2024
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
M. A. D. L. Balaguer
Vinamra Benara
Renato Luiz de Freitas Cunha
Roberto de M. Estevao Filho
Todd Hendry
...
Morris Sharp
B. Silva
Swati Sharma
Vijay Aski
Ranveer Chandra
FaML
30
81
0
16 Jan 2024
Knowledge Translation: A New Pathway for Model Compression
Wujie Sun
Defang Chen
Jiawei Chen
Yan Feng
Chun-Yen Chen
Can Wang
25
0
0
11 Jan 2024
Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing
Jaeill Kim
Duhun Hwang
Eunjung Lee
Jangwon Suh
Jimyeong Kim
Wonjong Rhee
28
0
0
11 Jan 2024
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Vahid Noroozi
Somshubra Majumdar
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
23
10
0
27 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
26
1
0
18 Dec 2023
LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu
Aurick Qiao
W. Neiswanger
Hongyi Wang
Bowen Tan
...
Zhiting Hu
Mark Schulze
Preslav Nakov
Timothy Baldwin
Eric P. Xing
40
70
0
11 Dec 2023
Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
Wu Lin
Felix Dangel
Runa Eschenhagen
Kirill Neklyudov
Agustinus Kristiadi
Richard E. Turner
Alireza Makhzani
22
3
0
09 Dec 2023
MIMIR: Masked Image Modeling for Mutual Information-based Adversarial Robustness
Xiaoyun Xu
Shujian Yu
Jingzheng Wu
S. Picek
AAML
35
0
0
08 Dec 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Lin Zhang
Chen Zhao
Bernard Ghanem
33
25
0
28 Nov 2023
LowResource at BLP-2023 Task 2: Leveraging BanglaBert for Low Resource Sentiment Analysis of Bangla Language
Aunabil Chakma
Masum Hasan
37
3
0
21 Nov 2023
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
Jiahao Li
Hao Tan
Kai Zhang
Zexiang Xu
Fujun Luan
Yinghao Xu
Yicong Hong
Kalyan Sunkavalli
Greg Shakhnarovich
Sai Bi
45
254
0
10 Nov 2023
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences
Yuanhe Tian
Ruyi Gan
Yan Song
Jiaxing Zhang
Yongdong Zhang
AI4MH
AI4CE
LM&MA
27
30
0
10 Nov 2023
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li
Xiaoxuan Liu
Banghua Zhu
Zhen Dong
Qingyi Gu
Kurt Keutzer
MQ
27
7
0
11 Oct 2023
Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI
Dustin Wright
Christian Igel
Gabrielle Samuel
Raghavendra Selvan
29
15
0
05 Sep 2023
kTrans: Knowledge-Aware Transformer for Binary Code Embedding
Wenyu Zhu
Hao Wang
Yuchen Zhou
Jiaming Wang
Zihan Sha
Zeyu Gao
Chao Zhang
32
10
0
24 Aug 2023
Towards General Text Embeddings with Multi-stage Contrastive Learning
Zehan Li
Xin Zhang
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
56
342
0
07 Aug 2023
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin
Ke Wu
Jie Li
Jun Yu Li
Wu-Jun Li
26
1
0
31 Jul 2023
U-CE: Uncertainty-aware Cross-Entropy for Semantic Segmentation
S. Landgraf
Markus Hillemann
Kira Wursthorn
Markus Ulrich
SSeg
UQCV
26
6
0
19 Jul 2023
Accelerating Distributed ML Training via Selective Synchronization
S. Tyagi
Martin Swany
FedML
24
3
0
16 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
20
41
0
12 Jul 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
Bang-ju Yang
Fenglin Liu
Zheng Li
Qingyu Yin
Chenyu You
Bing Yin
Yuexian Zou
VLM
33
5
0
05 Jul 2023
Breaking On-device Training Memory Wall: A Systematic Survey
Shitian Li
Chunlin Tian
Kahou Tam
Ruirui Ma
Li Li
21
2
0
17 Jun 2023
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Kai Lv
Yuqing Yang
Tengxiao Liu
Qi-jie Gao
Qipeng Guo
Xipeng Qiu
45
126
0
16 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
25
1
0
07 Jun 2023
NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks
Jean-Michel Attendu
Jean-Philippe Corbeil
23
15
0
05 Jun 2023
A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics
Hong-Yu Zhou
Yizhou Yu
Chengdi Wang
Shu Zhen Zhang
Yuanxu Gao
Jia-Yu Pan
Jun Shao
Guangming Lu
Kang Zhang
Weimin Li
MedIm
19
150
0
01 Jun 2023
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Shengran Hu
Jeff Clune
LM&Ro
OffRL
LRM
AI4CE
35
27
0
01 Jun 2023
GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training
S. Tyagi
Martin Swany
25
4
0
20 May 2023
Efficient ConvBN Blocks for Transfer Learning and Beyond
Kaichao You
Guo Qin
Anchang Bao
Mengsi Cao
Ping-Chia Huang
Jiulong Shan
Mingsheng Long
26
1
0
19 May 2023
mdctGAN: Taming transformer-based GAN for speech super-resolution with Modified DCT spectra
Chenhao Shuai
Chaohua Shi
Lu Gan
Hongqing Liu
25
8
0
18 May 2023
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin
Shuhan Zhou
Yanyang Li
Anxiang Ma
Tong Xiao
Jingbo Zhu
22
0
0
10 May 2023
ArgU: A Controllable Factual Argument Generator
Sougata Saha
R. Srihari
22
13
0
09 May 2023
TASTY: A Transformer based Approach to Space and Time complexity
K. Moudgalya
Ankit Ramakrishnan
Vamsikrishna Chemudupati
Xinghai Lu
14
3
0
06 May 2023
ComGAN: Toward GANs Exploiting Multiple Samples
Hae-Hwan Lee
GAN
18
0
0
24 Apr 2023
How Will It Drape Like? Capturing Fabric Mechanics from Depth Images
Carlos Rodriguez-Pardo
Melania Prieto-Martin
Dan Casas
Elena Garces
28
12
0
13 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
37
74
0
10 Apr 2023
HyperINR: A Fast and Predictive Hypernetwork for Implicit Neural Representations via Knowledge Distillation
Qi Wu
David Bauer
Yuyang Chen
Kwan-Liu Ma
31
14
0
09 Apr 2023
EnforceSNN: Enabling Resilient and Energy-Efficient Spiking Neural Network Inference considering Approximate DRAMs for Embedded Systems
Rachmad Vidya Wicaksana Putra
Muhammad Abdullah Hanif
Muhammad Shafique
24
11
0
08 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
28
40
0
07 Apr 2023
Effective Theory of Transformers at Initialization
Emily Dinan
Sho Yaida
Susan Zhang
20
14
0
04 Apr 2023
ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box
Yifu Zhang
Xing-Hui Wang
Xiaoqing Ye
Wei Zhang
Jincheng Lu
Xiao Tan
Errui Ding
Pei Sun
Jingdong Wang
VOT
29
20
0
27 Mar 2023
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa
Shreyas Saxena
Abhay Gupta
Sean Lie
28
3
0
21 Mar 2023
Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings
Ulf A. Hamster
Ji-Ung Lee
Alexander Geyken
Iryna Gurevych
21
0
0
13 Mar 2023
One Neuron Saved Is One Neuron Earned: On Parametric Efficiency of Quadratic Networks
Fenglei Fan
Hangcheng Dong
Zhongming Wu
Lecheng Ruan
T. Zeng
Yiming Cui
Jing-Xiao Liao
54
8
0
11 Mar 2023
Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent
Xiaonan Nie
Yi Liu
Fangcheng Fu
J. Xue
Dian Jiao
Xupeng Miao
Yangyu Tao
Bin Cui
MoE
24
16
0
06 Mar 2023