Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.05877
Cited By
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
15 December 2017
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
50 / 1,255 papers shown
Title
Data Generation for Hardware-Friendly Post-Training Quantization
Lior Dikstein
Ariel Lapid
Arnon Netzer
H. Habi
MQ
145
0
0
29 Oct 2024
SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity
K. Wang
Jieru Zhao
Shuo Yang
Wenchao Ding
M. Guo
25
0
0
28 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
76
5
0
28 Oct 2024
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
Graziano A. Manduzio
Federico A. Galatolo
M. G. Cimino
Enzo Pasquale Scilingo
Lorenzo Cominelli
LRM
24
1
0
24 Oct 2024
Quantum Large Language Models via Tensor Network Disentanglers
Borja Aizpurua
S. Jahromi
Sukhbinder Singh
Roman Orus
33
3
0
22 Oct 2024
Remote Timing Attacks on Efficient Language Model Inference
Nicholas Carlini
Milad Nasr
22
2
0
22 Oct 2024
Stacking Small Language Models for Generalizability
Laurence Liang
LRM
16
0
0
21 Oct 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo
Druv Pai
Yu Bai
Jiantao Jiao
Michael I. Jordan
Song Mei
29
9
0
17 Oct 2024
Large Language Models as Narrative-Driven Recommenders
Lukas Eberhard
Thorsten Ruprechter
Denis Helic
LRM
24
0
0
17 Oct 2024
Error Diffusion: Post Training Quantization with Block-Scaled Number Formats for Neural Networks
Alireza Khodamoradi
K. Denolf
Eric Dellinger
MQ
32
0
0
15 Oct 2024
Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
Ruhai Lin
Rui-Jie Zhu
Jason Eshraghian
32
1
0
12 Oct 2024
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun
Ruikang Liu
Haoli Bai
Han Bao
Kang Zhao
...
Lu Hou
Chun Yuan
Xin Jiang
W. Liu
Jun Yao
MQ
71
4
0
12 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
31
7
0
08 Oct 2024
Synthesizing Interpretable Control Policies through Large Language Model Guided Search
Carlo Bosio
Mark W. Mueller
26
0
0
07 Oct 2024
Continuous Approximations for Improving Quantization Aware Training of LLMs
He Li
Jianhang Hong
Yuanzhuo Wu
Snehal Adbol
Zonglin Li
MQ
21
1
0
06 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
82
18
0
03 Oct 2024
Constraint Guided Model Quantization of Neural Networks
Quinten Van Baelen
P. Karsmakers
MQ
23
0
0
30 Sep 2024
InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
Mengze Hong
Chen Jason Zhang
Lingxiao Yang
Yuanfeng Song
Di Jiang
39
2
0
29 Sep 2024
MicroFlow: An Efficient Rust-Based Inference Engine for TinyML
Matteo Carnelos
Francesco Pasti
Nicola Bellotto
18
1
0
28 Sep 2024
Analog In-Memory Computing Attention Mechanism for Fast and Energy-Efficient Large Language Models
Nathan Leroux
Paul-Philipp Manea
Chirag Sudarshan
Jan Finkbeiner
Sebastian Siegel
J. Strachan
Emre Neftci
28
1
0
28 Sep 2024
A method of using RSVD in residual calculation of LowBit GEMM
Hongyaoxing Gu
MQ
35
0
0
27 Sep 2024
Efficient Noise Mitigation for Enhancing Inference Accuracy in DNNs on Mixed-Signal Accelerators
Seyedarmin Azizi
Mohammad Erfan Sadeghi
M. Kamal
Massoud Pedram
17
2
0
27 Sep 2024
P4Q: Learning to Prompt for Quantization in Visual-language Models
H. Sun
Runqi Wang
Yanjing Li
Xianbin Cao
Xiaolong Jiang
Yao Hu
Baochang Zhang
MQ
VLM
42
0
0
26 Sep 2024
Towards Sub-millisecond Latency Real-Time Speech Enhancement Models on Hearables
Artem Dementyev
Chandan K. A. Reddy
Scott Wisdom
Navin Chatlani
J. Hershey
R. Lyon
18
0
0
26 Sep 2024
SPAQ-DL-SLAM: Towards Optimizing Deep Learning-based SLAM for Resource-Constrained Embedded Platforms
Niraj Pudasaini
Muhammad Abdullah Hanif
Muhammad Shafique
26
0
0
22 Sep 2024
Bilateral Sharpness-Aware Minimization for Flatter Minima
Jiaxin Deng
Junbiao Pang
Baochang Zhang
Qingming Huang
AAML
110
0
0
20 Sep 2024
Less Memory Means smaller GPUs: Backpropagation with Compressed Activations
Daniel Barley
Holger Froning
35
0
0
18 Sep 2024
Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview
Yanshu Wang
Tong Yang
Xiyan Liang
Guoan Wang
Hanning Lu
Xu Zhe
Yaoming Li
Li Weitao
MQ
34
3
0
18 Sep 2024
Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye
Grace Chu
Yanfeng Liu
Yichi Zhang
Lukasz Lew
Andrew G. Howard
MQ
27
2
0
14 Sep 2024
Efficient and Reliable Vector Similarity Search Using Asymmetric Encoding with NAND-Flash for Many-Class Few-Shot Learning
Hao-Wei Chiang
Chi-Tse Huang
Hsiang-Yun Cheng
P. Tseng
Ming-Hsiu Lee
An-Yeu
Wu
16
0
0
12 Sep 2024
Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL
Mohammad Reshadati
35
0
0
04 Sep 2024
Foundations of Large Language Model Compression -- Part 1: Weight Quantization
Sean I. Young
MQ
40
1
0
03 Sep 2024
Evaluating the Performance of Large Language Models in Competitive Programming: A Multi-Year, Multi-Grade Analysis
Adrian Marius Dumitran
Adrian Catalin Badea
Stefan-Gabriel Muscalu
ELM
LRM
28
1
0
31 Aug 2024
1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit
Chang Gao
J. Chen
Kang Zhao
Jiaqi Wang
Liping Jing
MQ
38
2
0
26 Aug 2024
Infrared Domain Adaptation with Zero-Shot Quantization
Burak Sevsay
Erdem Akagündüz
VLM
MQ
30
1
0
25 Aug 2024
Practical token pruning for foundation models in few-shot conversational virtual assistant systems
Haode Qi
Cheng Qian
Jian Ni
Pratyush Singh
Reza Fazeli
Gengyu Wang
Zhongzheng Shu
Eric Wayne
Juergen Bross
20
0
0
21 Aug 2024
PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars
Sumanth Prabhu
34
1
0
16 Aug 2024
FourierKAN outperforms MLP on Text Classification Head Fine-tuning
Abdullah Al Imran
Md Farhan Ishmam
VLM
16
1
0
16 Aug 2024
Computer Vision Model Compression Techniques for Embedded Systems: A Survey
Alexandre Lopes
Fernando Pereira dos Santos
D. Oliveira
Mauricio Schiezaro
Hélio Pedrini
28
5
0
15 Aug 2024
Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator
Federico Nicolás Peccia
Svetlana Pavlitska
Tobias Fleck
Oliver Bringmann
20
0
0
14 Aug 2024
Large Investment Model
Jian Guo
H. Shum
AIFin
27
0
0
12 Aug 2024
Combining Neural Architecture Search and Automatic Code Optimization: A Survey
Inas Bachiri
Hadjer Benmeziane
Smail Niar
Riyadh Baghdadi
Hamza Ouarnoughi
Abdelkrime Aries
40
0
0
07 Aug 2024
A Metric Driven Approach to Mixed Precision Training
M. Rasquinha
Gil Tabak
25
0
0
06 Aug 2024
An approach to optimize inference of the DIART speaker diarization pipeline
Roman Aperdannier
Sigurd Schacht
Alexander Piazza
37
0
0
05 Aug 2024
Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
Róisín Luo
Alexandru Drimbarean
Walsh Simon
Colm O'Riordan
MQ
29
0
0
01 Aug 2024
TinyChirp: Bird Song Recognition Using TinyML Models on Low-power Wireless Acoustic Sensors
Zhaolan Huang
Adrien Tousnakhoff
Polina Kozyr
Roman Rehausen
Felix Biessmann
Robert Lachlan
C. Adjih
Emmanuel Baccelli
34
1
0
31 Jul 2024
Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference
Claudio Angione
Yue Zhao
Harry Yang
Ahmad Farhan
Fielding Johnston
James Buban
Patrick Colangelo
42
1
0
29 Jul 2024
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi
Hyeyoon Lee
Dain Kwon
Sunjong Park
Kyuyeun Kim
Noseong Park
Jinho Lee
Jinho Lee
MQ
40
1
0
29 Jul 2024
Temporal Feature Matters: A Framework for Diffusion Model Quantization
Yushi Huang
Ruihao Gong
Xianglong Liu
Jing Liu
Yuhang Li
Jiwen Lu
Dacheng Tao
DiffM
MQ
49
0
0
28 Jul 2024
Mixed Non-linear Quantization for Vision Transformers
Gihwan Kim
Jemin Lee
Sihyeong Park
Yongin Kwon
Hyungshin Kim
MQ
35
0
0
26 Jul 2024
Previous
1
2
3
4
5
6
...
24
25
26
Next