1710.03740
Cited By
Mixed Precision Training
10 October 2017
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
David García
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
Papers citing "Mixed Precision Training"
50 / 265 papers shown
Title
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
43
0
0
12 May 2025
CogniSNN: A First Exploration to Random Graph Architecture based Spiking Neural Networks with Enhanced Expandability and Neuroplasticity
Yongsheng Huang
Peibo Duan
Zhipeng Liu
Kai Sun
Changsheng Zhang
Bin Zhang
Mingkun Xu
GNN
45
0
0
09 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
K. Zhang
Lizhuang Ma
J. Wang
J. Wang
W. Zhang
MQ
51
0
0
01 May 2025
RayZer: A Self-supervised Large View Synthesis Model
Hanwen Jiang
Hao Tan
Peng Wang
Haian Jin
Yue Zhao
...
Kai Zhang
Fujun Luan
Kalyan Sunkavalli
Qixing Huang
Georgios Pavlakos
62
0
0
01 May 2025
Trends in AI Supercomputers
Konstantin Pilz
James Sanders
Robi Rahman
Lennart Heim
GNN
ELM
29
0
0
22 Apr 2025
Pychop: Emulating Low-Precision Arithmetic in Numerical Methods and Neural Networks
Erin Carson
Xinye Chen
49
0
0
10 Apr 2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano
Takumi Ito
Jun Suzuki
LRM
47
1
0
05 Apr 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
90
1
0
16 Mar 2025
MSConv: Multiplicative and Subtractive Convolution for Face Recognition
Si Zhou
Yain-Whar Si
Xiaochen Yuan
Xiaofan Li
Xiaoxiang Liu
Xinyuan Zhang
Cong Lin
Xueyuan Gong
CVBM
73
0
0
08 Mar 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu (Allen) Zhang
Gaojie Jin
X. Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
35
0
0
24 Feb 2025
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied
Thomas Adler
Vihang Patil
M. Beck
Korbinian Poppel
Johannes Brandstetter
G. Klambauer
Razvan Pascanu
Sepp Hochreiter
73
4
0
21 Feb 2025
An Efficient Row-Based Sparse Fine-Tuning
Cen-Jhih Li
Aditya Bhaskara
52
0
0
17 Feb 2025
GoRA: Gradient-driven Adaptive Low Rank Adaptation
Haonan He
Peng Ye
Yuchen Ren
Yuan Yuan
Lei Chen
AI4TS
AI4CE
134
0
0
13 Feb 2025
DejAIvu: Identifying and Explaining AI Art on the Web in Real-Time with Saliency Maps
Jocelyn Dzuong
83
0
0
12 Feb 2025
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse
45
0
0
10 Feb 2025
DAGNet: A Dual-View Attention-Guided Network for Efficient X-ray Security Inspection
Shilong Hong
Yanzhou Zhou
Weichao Xu
73
0
0
03 Feb 2025
Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models
Minghan Li
Eric Gaussier
Guodong Zhou
RALM
63
0
0
28 Jan 2025
Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation
Ahmad Süleyman
Göksel Biricik
43
2
0
15 Jan 2025
EmoNeXt: an Adapted ConvNeXt for Facial Emotion Recognition
Yassine El Boudouri
Amine Bohi
71
15
0
14 Jan 2025
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
Yunzhi Zhuge
Hongyu Gu
Lu Zhang
Jinqing Qi
Huchuan Lu
VOS
67
2
0
14 Jan 2025
Wonderland: Navigating 3D Scenes from a Single Image
Hanwen Liang
Junli Cao
Vidit Goel
Guocheng Qian
Sergei Korolev
Demetri Terzopoulos
Konstantinos N. Plataniotis
Sergey Tulyakov
Jian Ren
VGen
128
11
0
16 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
74
0
0
05 Dec 2024
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao
S. Sang
Tiancheng Zhi
Jing Liu
Qing Yan
Linjie Luo
Bo Yuan
VLM
83
1
0
26 Nov 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
154
3
0
20 Nov 2024
Hysteresis Activation Function for Efficient Inference
Moshe Kimhi
Idan Kashani
A. Mendelson
Chaim Baskin
LLMSV
23
0
0
15 Nov 2024
Navigating Extremes: Dynamic Sparsity in Large Output Spaces
Nasib Ullah
Erik Schultheis
Mike Lasby
Yani Andrew Ioannou
Rohit Babbar
33
0
0
05 Nov 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Y. Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
63
9
0
25 Oct 2024
CompAct: Compressed Activations for Memory-Efficient LLM Training
Yara Shamshoum
Nitzan Hodos
Yuval Sieradzki
Assaf Schuster
MQ
VLM
42
0
0
20 Oct 2024
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models
Shangda Wu
Yashan Wang
Ruibin Yuan
Zhancheng Guo
Xu Tan
...
Yuanliang Dong
Jiafeng Liu
Xiaobing Li
Feng Yu
Maosong Sun
28
3
0
17 Oct 2024
Breaking the Memory Wall for Heterogeneous Federated Learning via Model Splitting
Chunlin Tian
Li Li
Kahou Tam
Yebo Wu
Chengzhong Xu
FedML
24
1
0
12 Oct 2024
Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare
Pardis Sadat Zahraei
Zahra Shakeri
LM&MA
21
0
0
09 Oct 2024
On Importance of Pruning and Distillation for Efficient Low Resource NLP
Aishwarya Mirashi
Purva Lingayat
Srushti Sonavane
Tejas Padhiyar
Raviraj Joshi
Geetanjali Kale
21
1
0
21 Sep 2024
DeMansia: Mamba Never Forgets Any Tokens
Ricky Fang
Mamba
19
0
0
04 Aug 2024
u-μP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
51
9
0
24 Jul 2024
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Yuchen Yang
Yingdong Shi
Cheems Wang
Xiantong Zhen
Yuxuan Shi
Jun Xu
32
1
0
24 Jun 2024
ProTrain: Efficient LLM Training via Memory-Aware Techniques
Hanmei Yang
Jin Zhou
Yao Fu
Xiaoqun Wang
Ramine Roane
Hui Guan
Tongping Liu
VLM
28
0
0
12 Jun 2024
Sustainable self-supervised learning for speech representations
Luis Lugo
Valentin Vielzeuf
29
2
0
11 Jun 2024
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma
Ayan Chakraborty
Elizaveta Kostenok
Danila Mishin
Dongho Ha
...
Martin Jaggi
Ming Liu
Yunho Oh
Suvinay Subramanian
Amir Yazdanbakhsh
MQ
34
5
0
31 May 2024
Improving the Training of Rectified Flows
Sangyun Lee
Zinan Lin
Giulia Fanti
34
19
0
30 May 2024
CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems
Abbas Ghaddar
David Alfonso-Hermelo
Philippe Langlais
Mehdi Rezagholizadeh
Boxing Chen
Prasanna Parthasarathi
34
0
0
24 May 2024
TerDiT: Ternary Diffusion Models with Transformers
Xudong Lu
Aojun Zhou
Ziyi Lin
Qi Liu
Yuhui Xu
Renrui Zhang
Yafei Wen
Shuai Ren
Peng Gao
Junchi Yan
MQ
37
2
0
23 May 2024
SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems
Kailash Gogineni
Sai Santosh Dayapule
Juan Gómez Luna
Karthikeya Gogineni
Peng Wei
Tian-Shing Lan
Mohammad Sadrosadati
Onur Mutlu
Guru Venkataramani
42
10
0
07 May 2024
Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li
Junyu Liu
Hanrui Wang
Tianlong Chen
81
1
0
30 Apr 2024
A Multi-Level Framework for Accelerating Training Transformer Models
Longwei Zou
Han Zhang
Yangdong Deng
AI4CE
32
1
0
07 Apr 2024
Accurate Block Quantization in LLMs with Outliers
Nikita Trukhanov
I. Soloveychik
MQ
24
3
0
29 Mar 2024
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
Jaehyeok Shim
Kyungdon Joo
3DPC
3DV
35
1
0
08 Mar 2024
Better Schedules for Low Precision Training of Deep Neural Networks
Cameron R. Wolfe
Anastasios Kyrillidis
40
1
0
04 Mar 2024
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
Sunghyeon Woo
Baeseong Park
Byeongwook Kim
Minjung Jo
S. Kwon
Dongsuk Jeon
Dongsoo Lee
57
2
0
27 Feb 2024
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel
M. Armando
Salma Galaaoui
Romain Brégier
Philippe Weinzaepfel
Grégory Rogez
Thomas Lucas
3DH
33
18
0
22 Feb 2024
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
M. A. D. L. Balaguer
Vinamra Benara
Renato Luiz de Freitas Cunha
Roberto de M. Estevao Filho
Todd Hendry
...
Morris Sharp
B. Silva
Swati Sharma
Vijay Aski
Ranveer Chandra
FaML
25
79
0
16 Jan 2024