
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (arXiv:1712.05877)

15 December 2017
Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko (MQ)

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

Showing 50 of 1,255 citing papers, most recent first.
Data Generation for Hardware-Friendly Post-Training Quantization
Lior Dikstein, Ariel Lapid, Arnon Netzer, H. Habi (MQ), 29 Oct 2024

SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity
K. Wang, Jieru Zhao, Shuo Yang, Wenchao Ding, M. Guo, 28 Oct 2024

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster (KELM), 28 Oct 2024

Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
Graziano A. Manduzio, Federico A. Galatolo, M. G. Cimino, Enzo Pasquale Scilingo, Lorenzo Cominelli (LRM), 24 Oct 2024

Quantum Large Language Models via Tensor Network Disentanglers
Borja Aizpurua, S. Jahromi, Sukhbinder Singh, Roman Orus, 22 Oct 2024

Remote Timing Attacks on Efficient Language Model Inference
Nicholas Carlini, Milad Nasr, 22 Oct 2024

Stacking Small Language Models for Generalizability
Laurence Liang (LRM), 21 Oct 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei, 17 Oct 2024

Large Language Models as Narrative-Driven Recommenders
Lukas Eberhard, Thorsten Ruprechter, Denis Helic (LRM), 17 Oct 2024

Error Diffusion: Post Training Quantization with Block-Scaled Number Formats for Neural Networks
Alireza Khodamoradi, K. Denolf, Eric Dellinger (MQ), 15 Oct 2024

Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
Ruhai Lin, Rui-Jie Zhu, Jason Eshraghian, 12 Oct 2024

FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, ..., Lu Hou, Chun Yuan, Xin Jiang, W. Liu, Jun Yao (MQ), 12 Oct 2024

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai, Li-Wei Li, Yiran Chen, 08 Oct 2024

Synthesizing Interpretable Control Policies through Large Language Model Guided Search
Carlo Bosio, Mark W. Mueller, 07 Oct 2024
Continuous Approximations for Improving Quantization Aware Training of LLMs
He Li, Jianhang Hong, Yuanzhuo Wu, Snehal Adbol, Zonglin Li (MQ), 06 Oct 2024

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang, Jia Wei, Pengle Zhang, Jun-Jie Zhu, Jun Zhu, Jianfei Chen (VLM, MQ), 03 Oct 2024

Constraint Guided Model Quantization of Neural Networks
Quinten Van Baelen, P. Karsmakers (MQ), 30 Sep 2024

InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
Mengze Hong, Chen Jason Zhang, Lingxiao Yang, Yuanfeng Song, Di Jiang, 29 Sep 2024

MicroFlow: An Efficient Rust-Based Inference Engine for TinyML
Matteo Carnelos, Francesco Pasti, Nicola Bellotto, 28 Sep 2024

Analog In-Memory Computing Attention Mechanism for Fast and Energy-Efficient Large Language Models
Nathan Leroux, Paul-Philipp Manea, Chirag Sudarshan, Jan Finkbeiner, Sebastian Siegel, J. Strachan, Emre Neftci, 28 Sep 2024

A method of using RSVD in residual calculation of LowBit GEMM
Hongyaoxing Gu (MQ), 27 Sep 2024

Efficient Noise Mitigation for Enhancing Inference Accuracy in DNNs on Mixed-Signal Accelerators
Seyedarmin Azizi, Mohammad Erfan Sadeghi, M. Kamal, Massoud Pedram, 27 Sep 2024
P4Q: Learning to Prompt for Quantization in Visual-language Models
H. Sun, Runqi Wang, Yanjing Li, Xianbin Cao, Xiaolong Jiang, Yao Hu, Baochang Zhang (MQ, VLM), 26 Sep 2024

Towards Sub-millisecond Latency Real-Time Speech Enhancement Models on Hearables
Artem Dementyev, Chandan K. A. Reddy, Scott Wisdom, Navin Chatlani, J. Hershey, R. Lyon, 26 Sep 2024

SPAQ-DL-SLAM: Towards Optimizing Deep Learning-based SLAM for Resource-Constrained Embedded Platforms
Niraj Pudasaini, Muhammad Abdullah Hanif, Muhammad Shafique, 22 Sep 2024

Bilateral Sharpness-Aware Minimization for Flatter Minima
Jiaxin Deng, Junbiao Pang, Baochang Zhang, Qingming Huang (AAML), 20 Sep 2024

Less Memory Means smaller GPUs: Backpropagation with Compressed Activations
Daniel Barley, Holger Froning, 18 Sep 2024

Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview
Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao (MQ), 18 Sep 2024

Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Andrew G. Howard (MQ), 14 Sep 2024
Efficient and Reliable Vector Similarity Search Using Asymmetric Encoding with NAND-Flash for Many-Class Few-Shot Learning
Hao-Wei Chiang, Chi-Tse Huang, Hsiang-Yun Cheng, P. Tseng, Ming-Hsiu Lee, An-Yeu Wu, 12 Sep 2024

Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL
Mohammad Reshadati, 04 Sep 2024

Foundations of Large Language Model Compression -- Part 1: Weight Quantization
Sean I. Young (MQ), 03 Sep 2024

Evaluating the Performance of Large Language Models in Competitive Programming: A Multi-Year, Multi-Grade Analysis
Adrian Marius Dumitran, Adrian Catalin Badea, Stefan-Gabriel Muscalu (ELM, LRM), 31 Aug 2024

1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit
Chang Gao, J. Chen, Kang Zhao, Jiaqi Wang, Liping Jing (MQ), 26 Aug 2024

Infrared Domain Adaptation with Zero-Shot Quantization
Burak Sevsay, Erdem Akagündüz (VLM, MQ), 25 Aug 2024

Practical token pruning for foundation models in few-shot conversational virtual assistant systems
Haode Qi, Cheng Qian, Jian Ni, Pratyush Singh, Reza Fazeli, Gengyu Wang, Zhongzheng Shu, Eric Wayne, Juergen Bross, 21 Aug 2024

PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars
Sumanth Prabhu, 16 Aug 2024
FourierKAN outperforms MLP on Text Classification Head Fine-tuning
Abdullah Al Imran, Md Farhan Ishmam (VLM), 16 Aug 2024

Computer Vision Model Compression Techniques for Embedded Systems: A Survey
Alexandre Lopes, Fernando Pereira dos Santos, D. Oliveira, Mauricio Schiezaro, Hélio Pedrini, 15 Aug 2024

Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator
Federico Nicolás Peccia, Svetlana Pavlitska, Tobias Fleck, Oliver Bringmann, 14 Aug 2024

Large Investment Model
Jian Guo, H. Shum (AIFin), 12 Aug 2024

Combining Neural Architecture Search and Automatic Code Optimization: A Survey
Inas Bachiri, Hadjer Benmeziane, Smail Niar, Riyadh Baghdadi, Hamza Ouarnoughi, Abdelkrime Aries, 07 Aug 2024

A Metric Driven Approach to Mixed Precision Training
M. Rasquinha, Gil Tabak, 06 Aug 2024

An approach to optimize inference of the DIART speaker diarization pipeline
Roman Aperdannier, Sigurd Schacht, Alexander Piazza, 05 Aug 2024

Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
Róisín Luo, Alexandru Drimbarean, Walsh Simon, Colm O'Riordan (MQ), 01 Aug 2024
TinyChirp: Bird Song Recognition Using TinyML Models on Low-power Wireless Acoustic Sensors
Zhaolan Huang, Adrien Tousnakhoff, Polina Kozyr, Roman Rehausen, Felix Biessmann, Robert Lachlan, C. Adjih, Emmanuel Baccelli, 31 Jul 2024

Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference
Claudio Angione, Yue Zhao, Harry Yang, Ahmad Farhan, Fielding Johnston, James Buban, Patrick Colangelo, 29 Jul 2024

MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi, Hyeyoon Lee, Dain Kwon, Sunjong Park, Kyuyeun Kim, Noseong Park, Jinho Lee (MQ), 29 Jul 2024

Temporal Feature Matters: A Framework for Diffusion Model Quantization
Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao (DiffM, MQ), 28 Jul 2024

Mixed Non-linear Quantization for Vision Transformers
Gihwan Kim, Jemin Lee, Sihyeong Park, Yongin Kwon, Hyungshin Kim (MQ), 26 Jul 2024