Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

15 December 2017
Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko [MQ]

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

50 of 1,255 citing papers shown
Low-Rank Quantization-Aware Training for LLMs
Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel (10 Jun 2024) [MQ]

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution
Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang (10 Jun 2024) [MQ]

Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
Bei Liu, Haoyu Wang, Yanmin Qian (08 Jun 2024) [MQ]

Navigating Efficiency in MobileViT through Gaussian Process on Global Architecture Factors
Ke Meng, Kai Chen (07 Jun 2024)

OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
Dujian Ding, Bicheng Xu, L. Lakshmanan (06 Jun 2024) [VLM]
Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania, Siddharth Singh, Shwai He, S. Feizi, A. Bhatele (04 Jun 2024)

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Widyadewi Soedarmadji, ..., Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang (04 Jun 2024) [MQ, VGen]

TinySV: Speaker Verification in TinyML with On-device Learning
Massimo Pavan, Gioele Mombelli, Francesco Sinacori, Manuel Roveri (03 Jun 2024)
P^2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer
Huihong Shi, Xin Cheng, Wendong Mao, Zhongfeng Wang (30 May 2024) [MQ]

Exploiting LLM Quantization
Kazuki Egashira, Mark Vero, Robin Staab, Jingxuan He, Martin Vechev (28 May 2024) [MQ]
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Tianchen Zhao, Xuefei Ning, Tongcheng Fang, En-hao Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu-Xiang Wang (28 May 2024) [MQ, DiffM]

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Xing Hu, Yuan Cheng, Dawei Yang, Zhihang Yuan, Jiangyong Yu, Chen Xu, Sifan Zhou (28 May 2024) [MQ]

Extreme Compression of Adaptive Neural Images
Leo Hoshikawa, Marcos V. Conde, Takeshi Ohashi, Atsushi Irie (27 May 2024)
DAGER: Exact Gradient Inversion for Large Language Models
Ivo Petrov, Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Muller, Martin Vechev (24 May 2024) [FedML]

Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
Kunjal Panchal, Nisarg Parikh, Sunav Choudhary, Lijun Zhang, Yuriy Brun, Hui Guan (24 May 2024)

BiSup: Bidirectional Quantization Error Suppression for Large Language Models
Minghui Zou, Ronghui Guo, Sai Zhang, Xiaowang Zhang, Zhiyong Feng (24 May 2024) [MQ]
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang, Hayun Kim, Younghoon Kim (23 May 2024)

Super Tiny Language Models
Dylan Hillier, Leon Guertler, Cheston Tan, Palaash Agrawal, Ruirui Chen, Bobby Cheng (23 May 2024)

OAC: Output-adaptive Calibration for Accurate Post-training Quantization
Ali Edalati, Alireza Ghaffari, M. Asgharian, Lu Hou, Boxing Chen, Vahid Partovi Nia (23 May 2024) [MQ]
Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation
Mykhail M. Uss, Ruslan Yermolenko, Olena Kolodiazhna, Oleksii Shashko, Ivan Safonov, Volodymyr Savin, Yoonjae Yeo, Seowon Ji, Jaeyun Jeong (22 May 2024) [MQ]

QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input
Senmao Tian, Haoyu Gao, Gangyi Hong, Shuyun Wang, JingJie Wang, Xin Yu, Shunli Zhang (22 May 2024) [MQ]

Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li, Yishuo Cai, Haowei Li, Feng Xue, Zhifeng Li, Yiming Li (21 May 2024) [MQ, AAML]
Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
Peiyu Liu, Zeming Gao, Wayne Xin Zhao, Yipeng Ma, Tao Wang, Ji-Rong Wen (21 May 2024) [MQ]

Can formal argumentative reasoning enhance LLMs performances?
Federico Castagna, I. Sassoon, Simon Parsons (16 May 2024) [LRM, LLMAG]

Selective Focus: Investigating Semantics Sensitivity in Post-training Quantization for Lane Detection
Yunqian Fan, Xiuying Wei, Ruihao Gong, Yuqing Ma, Xiangguo Zhang, Qi Zhang, Xianglong Liu (10 May 2024) [MQ]
Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes
Ruihao Gong, Yang Yong, Zining Wang, Jinyang Guo, Xiuying Wei, Yuqing Ma, Xianglong Liu (09 May 2024)

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
Huihong Shi, Haikuo Shao, Wendong Mao, Zhongfeng Wang (06 May 2024) [ViT, MQ]

Neural Graphics Texture Compression Supporting Random Access
Farzad Farhadzadeh, Qiqi Hou, Hoang Le, Amir Said, Randall Rauwendaal, Alex Bourd, Fatih Porikli (06 May 2024)
Collage: Light-Weight Low-Precision Strategy for LLM Training
Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith R. Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Anoop Deoras, Jun Huan (06 May 2024) [MQ]

PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv, Hong Chen, Jinyang Guo, Yifu Ding, Xianglong Liu (06 May 2024) [VLM, MQ]

Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection
Guillem Ramírez, Alexandra Birch, Ivan Titov (03 May 2024)

Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios
Nils L. Westhausen, Hendrik Kayser, Theresa Jansen, Bernd T. Meyer (03 May 2024)
03 May 2024
TinySeg: Model Optimizing Framework for Image Segmentation on Tiny
  Embedded Systems
TinySeg: Model Optimizing Framework for Image Segmentation on Tiny Embedded Systems
Byungchul Chae
Jiae Kim
Seonyeong Heo
VLM
25
0
0
03 May 2024
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression
  and Deployment Toolkit for Prototype Hardware Accelerator Design
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Jian Meng
Yuan Liao
Anupreetham Anupreetham
Ahmed Hassan
Shixing Yu
Han-Sok Suh
Xiaofeng Hu
Jae-sun Seo
MQ
49
1
0
02 May 2024
CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot
  Applications
CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications
J. Blumenkamp
Steven D. Morad
Jennifer Gielis
Amanda Prorok
31
4
0
02 May 2024
Wake Vision: A Large-scale, Diverse Dataset and Benchmark Suite for
  TinyML Person Detection
Wake Vision: A Large-scale, Diverse Dataset and Benchmark Suite for TinyML Person Detection
Colby R. Banbury
Emil Njor
Matthew P. Stewart
Pete Warden
M. Kudlur
Nat Jeffries
Xenofon Fafoutis
Vijay Janapa Reddi
VLM
42
0
0
01 May 2024
When Quantization Affects Confidence of Large Language Models?
Irina Proskurina, Luc Brun, Guillaume Metzler, Julien Velcin (01 May 2024) [MQ]

Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey
Dayou Du, Gu Gong, Xiaowen Chu (01 May 2024) [MQ]

Training-free Graph Neural Networks and the Power of Labels as Features
Ryoma Sato (30 Apr 2024)

EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision
Yufeng Yang, Adrian Kneip, Charlotte Frenkel (30 Apr 2024) [GNN]

Dynamical Mode Recognition of Coupled Flame Oscillators by Supervised and Unsupervised Learning Approaches
Weiming Xu, Tao Yang, Peng Zhang (27 Apr 2024)
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Rühle, L. Lakshmanan, Ahmed Hassan Awadallah (22 Apr 2024)

An empirical study of LLaMA3 quantization: from LLMs to MLLMs
Wei Huang, Xingyu Zheng, Xudong Ma, Haotong Qin, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno (22 Apr 2024) [MQ]

EncodeNet: A Framework for Boosting DNN Accuracy with Entropy-driven Generalized Converting Autoencoder
Hasanul Mahmud, Kevin Desai, P. Lama, Sushil Prasad (21 Apr 2024)

Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration
Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao (18 Apr 2024)
QGen: On the Ability to Generalize in Quantization Aware Training
Mohammadhossein Askarihemmat, Ahmadreza Jeddi, Reyhane Askari Hemmat, Ivan Lazarevich, Alexander Hoffman, Sudhakar Sah, Ehsan Saboori, Yvon Savaria, Jean-Pierre David (17 Apr 2024) [MQ]

Efficient and accurate neural field reconstruction using resistive memory
Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, ..., Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming-Yu Liu (15 Apr 2024)

Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Hui Li, Wen-Huang Cheng (08 Apr 2024)
Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators
Jan Klhufek, Miroslav Safar, Vojtěch Mrázek, Z. Vašíček, Lukás Sekanina (08 Apr 2024) [MQ]

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, Dan Alistarh, Rameswar Panda, Yoon Kim (04 Apr 2024) [MQ]