Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (arXiv:1712.05877)

15 December 2017
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
    MQ
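For context, the cited paper quantizes real-valued weights and activations to 8-bit integers through an affine mapping r ≈ S · (q − Z), where S is a real-valued scale and Z an integer zero-point, so that inference can be carried out with integer arithmetic only. The snippet below is a minimal NumPy sketch of that quantize/dequantize mapping; it illustrates the scheme rather than reproducing the paper's reference implementation, and the calibration range and parameter choices are assumptions made for the example.

import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine quantization: integer code q such that x ≈ scale * (q - zero_point)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover approximate real values from the integer codes."""
    return scale * (q.astype(np.int32) - zero_point)

# Assumed example: calibrate scale/zero-point from an observed range [-1.0, 3.0].
x_min, x_max = -1.0, 3.0
scale = (x_max - x_min) / 255.0
zero_point = int(round(-x_min / scale))   # chosen so that 0.0 maps exactly to an integer code

x = np.array([-1.0, 0.0, 0.5, 3.0])
q = quantize(x, scale, zero_point)
print(q)                                  # integer codes, e.g. [  0  64  96 255]
print(dequantize(q, scale, zero_point))   # values close to the original x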

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

50 / 1,255 papers shown
Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models
Aayush Saxena
Arit Kumar Bishwas
Ayush Ashok Mishra
Ryan Armstrong
19
1
0
22 Jul 2024
Inverted Activations
Georgii Sergeevich Novikov
Ivan V. Oseledets
21
0
0
22 Jul 2024
StreamTinyNet: video streaming analysis with spatial-temporal TinyML
Hazem Hesham Yousef Shalby
Massimo Pavan
Manuel Roveri
37
0
0
22 Jul 2024
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao
Jie Ou
Lei Wang
Fanhua Shang
Jaji Wu
MQ
45
0
0
22 Jul 2024
Toward Efficient Convolutional Neural Networks With Structured Ternary Patterns
Christos Kyrkou
34
0
0
20 Jul 2024
Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi-Pumped Soft SIMD Operations
Giorgos Armeniakos
Alexis Maras
S. Xydis
Dimitrios Soudris
MQ
21
3
0
19 Jul 2024
Mamba-PTQ: Outlier Channels in Recurrent Large Language Models
Alessandro Pierro
Steven Abreu
MQ
Mamba
43
6
0
17 Jul 2024
NITRO-D: Native Integer-only Training of Deep Convolutional Neural Networks
Alberto Pirillo
Luca Colombo
Manuel Roveri
MQ
29
0
0
16 Jul 2024
QVD: Post-training Quantization for Video Diffusion Models
Shilong Tian
Hong Chen
Chengtao Lv
Yu Liu
Jinyang Guo
Xianglong Liu
Shengxi Li
Hao Yang
Tao Xie
VGen
MQ
46
2
0
16 Jul 2024
On-Device Training of Fully Quantized Deep Neural Networks on Cortex-M Microcontrollers
M. Deutel
Frank Hannig
Christopher Mutschler
Jürgen Teich
MQ
25
0
0
15 Jul 2024
A Bag of Tricks for Scaling CPU-based Deep FFMs to more than 300m Predictions per Second
Blaž Škrlj
Benjamin Ben-Shalom
Grega Gaspersic
Adi Schwartz
Ramzi Hoseisi
Naama Ziporin
Davorin Kopic
Andraz Tori
37
0
0
14 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
V. Cevher
Yida Wang
George Karypis
39
3
0
12 Jul 2024
Optimization of DNN-based speaker verification model through efficient quantization technique
Yeona Hong
Woo-Jin Chung
Hong-Goo Kang
MQ
26
1
0
12 Jul 2024
Real-Time Anomaly Detection and Reactive Planning with Large Language Models
Rohan Sinha
Amine Elhafsi
Christopher Agia
Matthew Foutter
Edward Schmerling
Marco Pavone
OffRL
LRM
43
26
0
11 Jul 2024
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
Aditya Annavajjala
Alind Khare
Animesh Agrawal
Igor Fedorov
Hugo Latapie
Myungjin Lee
Alexey Tumanov
CLL
42
0
0
08 Jul 2024
On the Limitations of Compute Thresholds as a Governance Strategy
Sara Hooker
55
14
0
08 Jul 2024
OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks
Jingyang Xiang
Zuohui Chen
Siqi Li
Qing Wu
Yong-Jin Liu
26
1
0
07 Jul 2024
ZOBNN: Zero-Overhead Dependable Design of Binary Neural Networks with Deliberately Quantized Parameters
B. Ghavami
M. Shahidzadeh
Lesley Shannon
S. Wilton
43
0
0
06 Jul 2024
The Impact of Quantization and Pruning on Deep Reinforcement Learning Models
Heng Lu
Mehdi Alemi
Reza Rawassizadeh
34
1
0
05 Jul 2024
Resource-Efficient Speech Quality Prediction through Quantization Aware Training and Binary Activation Maps
Mattias Nilsson
Riccardo Miccini
Clément Laroche
Tobias Piechowiak
Friedemann Zenke
MQ
26
0
0
05 Jul 2024
ISQuant: apply squant to the real deployment
Dezan Zhao
MQ
19
0
0
05 Jul 2024
Gaussian Eigen Models for Human Heads
Wojciech Zielonka
Timo Bolkart
Thabo Beeler
Justus Thies
3DGS
49
5
0
05 Jul 2024
Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe Yao
Feng Tian
Jun Chen
Haonan Lin
Guang Dai
Yong Liu
Jingdong Wang
DiffM
MQ
43
5
0
04 Jul 2024
Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection
Xiaokun Luan
Xiyue Zhang
Jingyi Wang
Meng Sun
AAML
20
0
0
04 Jul 2024
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Janghwan Lee
Seongmin Park
S. Hong
Minsoo Kim
Du-Seong Chang
Jungwook Choi
29
4
0
03 Jul 2024
ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
Yanfeng Jiang
Ning Sun
Xueshuo Xie
Fei Yang
Tao Li
MQ
36
2
0
03 Jul 2024
CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications
Yupeng Cao
Zhiyuan Yao
Zhi Chen
Zhiyang Deng
26
1
0
02 Jul 2024
A Comprehensive Survey on Diffusion Models and Their Applications
M. Ahsan
S. Raman
Yingtao Liu
Zahed Siddique
MedIm
DiffM
41
1
0
01 Jul 2024
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti
Matteo Risso
Alessio Burrello
Enrico Macii
M. Poncino
Daniele Jahier Pagliari
MQ
50
2
0
01 Jul 2024
VcLLM: Video Codecs are Secretly Tensor Codecs
Ceyu Xu
Yongji Wu
Xinyu Yang
Beidi Chen
Matthew Lentz
Danyang Zhuo
Lisa Wu Wills
45
0
0
29 Jun 2024
SCOPE: Stochastic Cartographic Occupancy Prediction Engine for Uncertainty-Aware Dynamic Navigation
Zhanteng Xie
P. Dames
39
1
0
28 Jun 2024
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Jinguang Wang
Yuexi Yin
Haifeng Sun
Qi Qi
Jingyu Wang
Zirui Zhuang
Tingting Yang
Jianxin Liao
38
2
0
27 Jun 2024
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
Lei Chen
Yuan Meng
Chen Tang
Xinzhu Ma
Jingyan Jiang
Xin Wang
Zhi Wang
Wenwu Zhu
MQ
26
23
0
25 Jun 2024
TRAWL: Tensor Reduced and Approximated Weights for Large Language Models
Yiran Luo
Het Patel
Yu Fu
Dawon Ahn
Jia Chen
Yue Dong
Evangelos E. Papalexakis
38
1
0
25 Jun 2024
Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings
Andrea Posada
Daniel Rueckert
Felix Meissen
Philip Muller
LM&MA
ELM
31
0
0
24 Jun 2024
Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
Yifei Gao
Jie Ou
Lei Wang
Yuting Xiao
Zhiyuan Xiang
Ruiting Dai
Jun Cheng
MQ
36
3
0
24 Jun 2024
MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication
Shubhabrata Mukherjee
Cory Beard
Sejun Song
43
0
0
22 Jun 2024
Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
Seungwoo Son
Wonpyo Park
Woohyun Han
Kyuyeun Kim
Jaeho Lee
MQ
32
10
0
17 Jun 2024
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Lin Wang
Zhichao Wang
Xiaoying Tang
37
1
0
17 Jun 2024
InstructCMP: Length Control in Sentence Compression through Instruction-based Large Language Models
Juseon-Do
Jingun Kwon
Hidetaka Kamigaito
Manabu Okumura
26
2
0
16 Jun 2024
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner
Ilja Baumann
K. Riedhammer
Tobias Bocklet
MQ
30
1
0
16 Jun 2024
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
Jungi Lee
Wonbeom Lee
Jaewoong Sim
MQ
29
14
0
16 Jun 2024
Memory Faults in Activation-sparse Quantized Deep Neural Networks: Analysis and Mitigation using Sharpness-aware Training
Akul Malhotra
S. Gupta
13
0
0
15 Jun 2024
PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation
Injoon Hwang
Haewon Park
Youngwan Lee
Jooyoung Yang
SunJae Maeng
AI4CE
16
0
0
13 Jun 2024
ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
Jing Liu
Ruihao Gong
Mingyang Zhang
Yefei He
Jianfei Cai
Bohan Zhuang
MoE
37
0
0
13 Jun 2024
Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference
Christopher Wolters
Xiaoxuan Yang
Ulf Schlichtmann
Toyotaro Suzumura
36
11
0
12 Jun 2024
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
66
1
0
12 Jun 2024
Markov Constraint as Large Language Model Surrogate
Alexandre Bonlarron
Jean-Charles Régin
32
1
0
11 Jun 2024
Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs
K. Jeziorek
Piotr Wzorek
Krzysztof Blachut
Andrea Pinna
T. Kryjak
GNN
34
4
0
11 Jun 2024
TernaryLLM: Ternarized Large Language Model
Tianqi Chen
Zhe Li
Weixiang Xu
Zeyu Zhu
Dong Li
Lu Tian
E. Barsoum
Peisong Wang
Jian Cheng
34
7
0
11 Jun 2024