ResearchTrend.AI

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (arXiv:1712.05877)
15 December 2017
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
    MQ
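The cited paper's core scheme represents each real value r by an integer q through an affine map r ≈ S·(q − Z), where S is a float scale and Z an integer zero-point, so that inference can run on integer arithmetic alone. A minimal NumPy sketch of that mapping (the range, names, and uint8 target here are illustrative, not the paper's code):

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine-quantize floats to uint8: q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover approximate floats: x_hat = scale * (q - zero_point)."""
    return scale * (q.astype(np.int32) - zero_point)

# Pick scale and zero-point so the integer range covers an observed
# float range [xmin, xmax]; zero_point makes real 0.0 exactly representable.
xmin, xmax = -1.0, 3.0
scale = (xmax - xmin) / 255.0
zero_point = int(round(-xmin / scale))

x = np.array([-1.0, 0.0, 1.5, 3.0])
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
```

Round-trip error is bounded by the scale, which is why training that simulates this quantization (as the paper proposes) can keep accuracy close to the float model.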

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

50 / 1,255 papers shown
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Chee Hong, Kyoung Mu Lee (SupR, MQ), 04 Apr 2024

On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models
Sean Farhat, Deming Chen, 04 Apr 2024

DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization
B. Ghavami, Amin Kamjoo, Lesley Shannon, S. Wilton (MQ), 03 Apr 2024
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem, Conor McCullough, Randy Hsin, Chas Leichner, Shan Li, ..., Andrew G. Howard, Lukasz Lew, Sherief Reda, Ville Rautio, Daniele Moro (MQ), 29 Mar 2024

Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers
Pingcheng Dong, Yonghao Tan, Dong Zhang, Tianwei Ni, Xuejiao Liu, ..., Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng (MQ), 28 Mar 2024
QNCD: Quantization Noise Correction for Diffusion Models
Huanpeng Chu, Wei Wu, Chengjie Zang, Kun Yuan (DiffM, MQ), 28 Mar 2024

Tiny Machine Learning: Progress and Futures
Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Song Han, 28 Mar 2024

Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models
Kartikeya Bhardwaj, N. Pandey, Sweta Priyadarshi, Kyunggeun Lee, Jun Ma, Harris Teague (MQ), 26 Mar 2024

Are Compressed Language Models Less Subgroup Robust?
Leonidas Gee, Andrea Zugarini, Novi Quadrianto, 26 Mar 2024
Systematic construction of continuous-time neural networks for linear dynamical systems
Chinmay Datar, Adwait Datar, Felix Dietrich, W. Schilders (AI4TS), 24 Mar 2024

Fine Tuning LLM for Enterprise: Practical Guidelines and Recommendations
J. MathavRaj, VM Kushala, Harikrishna Warrier, Yogesh Gupta, 23 Mar 2024

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu (MQ), 19 Mar 2024

Adversarial Fine-tuning of Compressed Neural Networks for Joint Improvement of Robustness and Efficiency
Hallgrimur Thorsteinsson, Valdemar J Henriksen, Tong Chen, Raghavendra Selvan (AAML), 14 Mar 2024
CoroNetGAN: Controlled Pruning of GANs via Hypernetworks
Aman Kumar, Khushboo Anand, Shubham Mandloi, Ashutosh Mishra, Avinash Thakur, Neeraj Kasera, Prathosh A P, 13 Mar 2024

LookupFFN: Making Transformers Compute-lite for CPU inference
Zhanpeng Zeng, Michael Davies, Pranav Pulijala, Karthikeyan Sankaralingam, Vikas Singh, 12 Mar 2024

QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin (MQ), 11 Mar 2024

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao (MQ), 08 Mar 2024
The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
Seyed Parsa Neshaei, Yasaman Boreshban, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel (MQ), 08 Mar 2024

Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
Kaiwen Cai, Zhekai Duan, Gaowen Liu, Charles Fleming, Chris Xiaoxuan Lu (VLM), 07 Mar 2024

EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
Hanlin Tang, Yifu Sun, Decheng Wu, Kai Liu, Jianchen Zhu, Zhanhui Kang (MQ), 05 Mar 2024

Better Schedules for Low Precision Training of Deep Neural Networks
Cameron R. Wolfe, Anastasios Kyrillidis, 04 Mar 2024
FlowPrecision: Advancing FPGA-Based Real-Time Fluid Flow Estimation with Linear Quantization
Tianheng Ling, Julian Hoever, Chao Qian, Gregor Schiele (MQ), 04 Mar 2024

BasedAI: A decentralized P2P network for Zero Knowledge Large Language Models (ZK-LLMs)
Sean Wellington, 01 Mar 2024

Resilience of Entropy Model in Distributed Neural Networks
Milin Zhang, Mohammad Abdi, Shahriar Rifat, Francesco Restuccia (AAML), 01 Mar 2024

Large Language Models and Games: A Survey and Roadmap
Roberto Gallotta, Graham Todd, Marvin Zammit, Sam Earle, Antonios Liapis, Julian Togelius, Georgios N. Yannakakis (LLMAG, LM&MA, AI4CE, LRM), 28 Feb 2024
FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan (MQ), 28 Feb 2024

Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers
Yiwei Lu, Yaoliang Yu, Xinlin Li, Vahid Partovi Nia (MQ), 27 Feb 2024

Adaptive quantization with mixed-precision based on low-cost proxy
Jing Chen, Qiao Yang, Senmao Tian, Shunli Zhang (MQ), 27 Feb 2024

A Comprehensive Evaluation of Quantization Strategies for Large Language Models
Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong (MQ), 26 Feb 2024
GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning
Han Zou, Qiyang Zhao, Lina Bariah, Yu Tian, M. Bennis, S. Lasaulce, 26 Feb 2024

Towards Accurate Post-training Quantization for Reparameterized Models
Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, YangWei Ying, Hong Zhou (MQ), 25 Feb 2024

Fine-Grained Self-Endorsement Improves Factuality and Reasoning
Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu (HILM, LRM), 23 Feb 2024

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, ..., Chen Chen, Zhiyuan Liu, Guanglin Li, Tao Yang, Maosong Sun, 21 Feb 2024
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen, 19 Feb 2024

Is It a Free Lunch for Removing Outliers during Pretraining?
Baohao Liao, Christof Monz (MQ), 19 Feb 2024

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He (MQ), 15 Feb 2024

Graph Inference Acceleration by Learning MLPs on Graphs without Supervision
Zehong Wang, Zheyuan Zhang, Chuxu Zhang, Yanfang Ye, 14 Feb 2024
Towards Meta-Pruning via Optimal Transport
Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh (MoMe), 12 Feb 2024

Successive Refinement in Large-Scale Computation: Advancing Model Inference Applications
H. Esfahanizadeh, Alejandro Cohen, S. Shamai, Muriel Médard, 11 Feb 2024

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu (MQ), 08 Feb 2024

ApiQ: Finetuning of 2-Bit Quantized Large Language Model
Baohao Liao, Christian Herold, Shahram Khadivi, Christof Monz (CLL, MQ), 07 Feb 2024
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi (MQ), 06 Feb 2024

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan (DiffM, MQ), 06 Feb 2024

Emergency Computing: An Adaptive Collaborative Inference Method Based on Hierarchical Reinforcement Learning
Weiqi Fu, Lianming Xu, Xin Wu, Li Wang, Aiguo Fei, 03 Feb 2024
HW-SW Optimization of DNNs for Privacy-preserving People Counting on Low-resolution Infrared Arrays
Matteo Risso, Chen Xie, Francesco Daghero, Alessio Burrello, Seyedmorteza Mollaei, Marco Castellano, Enrico Macii, M. Poncino, Daniele Jahier Pagliari, 02 Feb 2024

Effective Multi-Stage Training Model For Edge Computing Devices In Intrusion Detection
Thua Huynh Trong, Thanh Nguyen Hoang, 31 Jan 2024

Super Efficient Neural Network for Compression Artifacts Reduction and Super Resolution
Wen Ma, Qiuwen Lou, Arman Kazemi, Julian Faraone, Tariq Afzal (SupR), 26 Jan 2024

Marabou 2.0: A Versatile Formal Analyzer of Neural Networks
Haoze Wu, Omri Isac, Aleksandar Zeljić, Teruhiro Tagomori, M. Daggitt, ..., Min Wu, Min Zhang, Ekaterina Komendantskaya, Guy Katz, Clark W. Barrett, 25 Jan 2024
25 Jan 2024
CompactifAI: Extreme Compression of Large Language Models using
  Quantum-Inspired Tensor Networks
CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks
Andrei Tomut
S. Jahromi
Abhijoy Sarkar
Uygar Kurt
Sukhbinder Singh
...
Muhammad Ibrahim
Oussama Tahiri-Alaoui
John Malcolm
Samuel Mugel
Roman Orus
MQ
42
13
0
25 Jan 2024
AdCorDA: Classifier Refinement via Adversarial Correction and Domain
  Adaptation
AdCorDA: Classifier Refinement via Adversarial Correction and Domain Adaptation
Lulan Shen
Ali Edalati
Brett H. Meyer
Warren Gross
James J. Clark
20
0
0
24 Jan 2024