ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
arXiv:1712.05877 · 15 December 2017
Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko
Tags: MQ
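The cited paper's core scheme represents real values r with 8-bit integers q via the affine mapping r = S(q − Z), where the scale S is a positive real and the zero-point Z is an integer chosen so that real 0.0 is exactly representable (important, e.g., for zero-padding). A minimal pure-Python sketch of that mapping (function names are illustrative, not taken from the paper or its TFLite implementation):

```python
def choose_qparams(rmin, rmax, num_bits=8):
    """Pick scale S and zero-point Z for the affine scheme r = S * (q - Z).

    The range is nudged to contain 0 so that real 0.0 maps exactly to the
    integer zero-point, as the paper requires."""
    qmin, qmax = 0, (1 << num_bits) - 1
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = min(qmax, max(qmin, round(qmin - rmin / scale)))
    return scale, zero_point

def quantize(values, scale, zero_point, num_bits=8):
    """Real -> integer codes: q = clamp(round(r / S) + Z, qmin, qmax)."""
    qmin, qmax = 0, (1 << num_bits) - 1
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]

def dequantize(codes, scale, zero_point):
    """Integer codes -> real: r = S * (q - Z)."""
    return [scale * (q - zero_point) for q in codes]
```

For example, for the real range [-1.0, 2.0] this gives scale 3/255 and zero-point 85; quantizing [-1.0, 0.0, 2.0] yields codes [0, 85, 255], and dequantizing recovers the inputs to within half a quantization step.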

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

50 / 1,255 papers shown
• Robustness to distribution shifts of compressed networks for edge devices (22 Jan 2024). Lulan Shen, Ali Edalati, Brett H. Meyer, Warren Gross, James J. Clark.
• OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning (22 Jan 2024). Chu Myaet Thwal, Minh N. H. Nguyen, Ye Lin Tun, Seongjin Kim, My T. Thai, Choong Seon Hong.
• Dynamic Q&A of Clinical Documents with Large Language Models (19 Jan 2024). Ran Elgedawy, Ioana Danciu, Maria Mahbub, Sudarshan Srinivasan. [LM&MA]
• A2Q+: Improving Accumulator-Aware Weight Quantization (19 Jan 2024). Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig, Yaman Umuroglu. [MQ]
• Model Compression Techniques in Biometrics Applications: A Survey (18 Jan 2024). Eduarda Caldeira, Pedro C. Neto, Marco Huber, Naser Damer, Ana F. Sequeira.
• Enabling On-device Continual Learning with Binary Neural Networks (18 Jan 2024). Lorenzo Vorabbi, Davide Maltoni, Guido Borghi, Stefano Santi. [MQ]
• Efficient and Mathematically Robust Operations for Certified Neural Networks Inference (16 Jan 2024). Fabien Geyer, Johannes Freitag, Tobias Schulz, Sascha Uhrig.
• GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching (16 Jan 2024). Cong Guo, Rui Zhang, Jiale Xu, Jingwen Leng, Zihan Liu, ..., Minyi Guo, Hao Wu, Shouren Zhao, Junping Zhao, Ke Zhang. [VLM]
• Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning (15 Jan 2024). Manish Sharma, Jamison Heard, Eli Saber, Panos P. Markopoulos.
• CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory Architectures (15 Jan 2024). Rebecca Pelke, José Cubero-Cascante, Nils Bosbach, Felix Staudigl, Rainer Leupers, Jan Moritz Joseph.
• Memory-Efficient Fine-Tuning for Quantized Diffusion Model (09 Jan 2024). Hyogon Ryu, Seohyun Lim, Hyunjung Shim. [DiffM, MQ]
• Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression against Heterogeneous Attacks Toward AI Software Deployment (02 Jan 2024). Jie Zhu, Leye Wang, Xiao Han, Anmin Liu, Tao Xie. [AAML]
• A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models (31 Dec 2023). Vansh Sharma, Venkat Raman.
• Adaptive Depth Networks with Skippable Sub-Paths (27 Dec 2023). Woochul Kang.
• Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization (23 Dec 2023). K. Balaskas, Andreas Karatzas, Christos Sad, K. Siozios, Iraklis Anagnostopoulos, Georgios Zervakis, Jörg Henkel. [MQ]
• Towards Efficient Verification of Quantized Neural Networks (20 Dec 2023). Pei Huang, Haoze Wu, Yuting Yang, Ieva Daukantas, Min Wu, Yedi Zhang, Clark W. Barrett. [MQ]
• Optimizing Convolutional Neural Network Architecture (17 Dec 2023). Luis Balderas, Miguel Lastra, José M. Benítez. [CVBM]
• Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting (17 Dec 2023). Dawei Yang, Ning He, Xing Hu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang. [MQ]
• USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models (13 Dec 2023). Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, ..., Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal. [MQ]
• CBQ: Cross-Block Quantization for Large Language Models (13 Dec 2023). Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun-feng Zhang, Wei Li, ..., Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe Wang. [MQ]
• FP8-BERT: Post-Training Quantization for Transformer (10 Dec 2023). Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu. [MQ]
• Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge (09 Dec 2023). Xuan Shen, Peiyan Dong, Lei Lu, Zhenglun Kong, Zhengang Li, Ming Lin, Chao Wu, Yanzhi Wang. [MQ]
• SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM (06 Dec 2023). Jiayi Pan, Chengcan Wang, Kaifu Zheng, Yangguang Li, Zhenyu Wang, Bin Feng. [MQ]
• MoEC: Mixture of Experts Implicit Neural Compression (03 Dec 2023). Jianchen Zhao, Cheng-Ching Tseng, Ming Lu, Ruichuan An, Xiaobao Wei, He Sun, Shanghang Zhang.
• Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices (29 Nov 2023). Huancheng Chen, H. Vikalo. [FedML, MQ]
• A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures (29 Nov 2023). Fabrizio Ferrandi, S. Curzel, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, ..., Salvatore Filippone, F. L. Presti, Francesco Silvestri, P. Palazzari, Stefania Perri.
• Mirage: An RNS-Based Photonic Accelerator for DNN Training (29 Nov 2023). Cansu Demirkıran, Guowei Yang, D. Bunandar, Ajay Joshi.
• LayerCollapse: Adaptive compression of neural networks (29 Nov 2023). Soheil Zibakhsh Shabgahi, Mohammad Soheil Shariff, F. Koushanfar. [AI4CE]
• PIPE: Parallelized Inference Through Post-Training Quantization Ensembling of Residual Expansions (27 Nov 2023). Edouard Yvinec, Arnaud Dapogny, Kévin Bailly. [MQ]
• TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models (27 Nov 2023). Yushi Huang, Ruihao Gong, Jing Liu, Tianlong Chen, Xianglong Liu. [DiffM, MQ]
• Fast Inner-Product Algorithms and Architectures for Deep Neural Network Accelerators (20 Nov 2023). Trevor E. Pogue, N. Nicolici.
• Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review (20 Nov 2023). M. Lê, Pierre Wolinski, Julyan Arbel.
• LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms (19 Nov 2023). Young D. Kwon, Jagmohan Chauhan, Hong Jia, Stylianos I. Venieris, Cecilia Mascolo.
• Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing (18 Nov 2023). Cédric Gernigon, Silviu-Ioan Filip, Olivier Sentieys, Clément Coggiola, Mickael Bruno. [MQ]
• LightBTSeg: A lightweight breast tumor segmentation model using ultrasound images via dual-path joint knowledge distillation (18 Nov 2023). Hongjiang Guo, Shengwen Wang, Hao Dang, Kangle Xiao, Yaru Yang, Wenpei Liu, Tongtong Liu, Yiying Wan.
• Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis (17 Nov 2023). Simon Niedermayr, Josef Stumpfegger, Rüdiger Westermann. [3DGS]
• Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments (10 Nov 2023). Calvin Tanama, Kunyu Peng, Zdravko Marinov, Rainer Stiefelhagen, Alina Roitberg.
• Exploiting Neural-Network Statistics for Low-Power DNN Inference (09 Nov 2023). Lennart Bamberg, Ardalan Najafi, Alberto García-Ortiz.
• RepQ: Generalizing Quantization-Aware Training for Re-Parametrized Architectures (09 Nov 2023). Anastasiia Prutianova, Alexey Zaytsev, Chung-Kuei Lee, Fengyu Sun, Ivan Koryakovskiy. [MQ]
• Reducing the Side-Effects of Oscillations in Training of Quantized YOLO Networks (09 Nov 2023). Kartik Gupta, Akshay Asthana. [MQ]
• Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models (08 Nov 2023). Rocktim Jyoti Das, Mingjie Sun, Liqun Ma, Zhiqiang Shen. [VLM]
• A Lightweight Architecture for Real-Time Neuronal-Spike Classification (08 Nov 2023). Muhammad Ali Siddiqi, David Vrijenhoek, Lennart P. L. Landsmeer, Job van der Kleij, A. Gebregiorgis, V. Romano, R. Bishnoi, Said Hamdioui, Christos Strydis.
• AFPQ: Asymmetric Floating Point Quantization for LLMs (03 Nov 2023). Yijia Zhang, Sicheng Zhang, Shijie Cao, Dayou Du, Jianyu Wei, Ting Cao, Ningyi Xu. [MQ]
• Effective Quantization for Diffusion Models on CPUs (02 Nov 2023). Hanwen Chang, Haihao Shen, Yiyang Cai, Xinyu Ye, Zhenzhong Xu, Wenhua Cheng, Kaokao Lv, Weiwei Zhang, Yintong Lu, Heng Guo. [MQ]
• Fully Quantized Always-on Face Detector Considering Mobile Image Sensors (02 Nov 2023). Haechang Lee, Wongi Jeong, Dongil Ryu, Hyunwoo Je, Albert No, Kijeong Kim, Se Young Chun. [CVBM]
• Efficient LLM Inference on CPUs (01 Nov 2023). Haihao Shen, Hanwen Chang, Bo Dong, Yu Luo, Hengyu Meng. [MQ]
• Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (01 Nov 2023). Sanchit Gandhi, Patrick von Platen, Alexander M. Rush. [VLM]
• Compression with Exact Error Distribution for Federated Learning (31 Oct 2023). Mahmoud Hegazy, Rémi Leluc, Cheuk Ting Li, Aymeric Dieuleveut. [FedML]
• FlexTrain: A Dynamic Training Framework for Heterogeneous Devices Environments (31 Oct 2023). Mert Unsal, Ali Maatouk, Antonio De Domenico, Nicola Piovesan, Fadhel Ayed.
• Efficient IoT Inference via Context-Awareness (29 Oct 2023). Mohammad Mehdi Rastikerdar, Jin Huang, Shiwei Fang, Hui Guan, Deepak Ganesan.