The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
arXiv:2402.17764 · 27 February 2024
Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Lifeng Dong, Ruiping Wang, Jilong Xue, Furu Wei
[MQ]
Papers citing "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" (50 of 137 shown)
- Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
  Tollef Emil Jørgensen (13 May 2025) [MQ]
- Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
  Stanislas Laborde, Martin Cousseau, Antoun Yaacoub, Lionel Prevost (12 May 2025) [MQ]
- Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference
  Haolin Zhang, Jeff Huang (09 May 2025)
- PROM: Prioritize Reduction of Multiplications Over Lower Bit-Widths for Efficient CNNs
  Lukas Meiner, Jens Mehnert, A. P. Condurache (06 May 2025) [MQ]
- Practical Boolean Backpropagation
  Simon Golbert (01 May 2025)
- ICQuant: Index Coding enables Low-bit LLM Quantization
  Xinlin Li, Osama A. Hanna, Christina Fragouli, Suhas Diggavi (01 May 2025) [MQ]
- Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models
  Lucas Maisonnave, Cyril Moineau, Olivier Bichler, Fabrice Rastello (30 Apr 2025) [MQ]
- DYNAMAX: Dynamic computing for Transformers and Mamba based architectures
  Miguel Nogales, Matteo Gambella, Manuel Roveri (29 Apr 2025)
- BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
  Hongyu Wang, Shuming Ma, Furu Wei (25 Apr 2025) [MQ]
- Compute-Optimal LLMs Provably Generalize Better With Scale
  Marc Finzi, Sanyam Kapoor, Diego Granziol, Anming Gu, Christopher De Sa, J. Zico Kolter, Andrew Gordon Wilson (21 Apr 2025)
- StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
  Yeona Hong, Hyewon Han, Woo-Jin Chung, Hong-Goo Kang (21 Apr 2025) [MQ]
- Gradual Binary Search and Dimension Expansion: A general method for activation quantization in LLMs
  Lucas Maisonnave, Cyril Moineau, Olivier Bichler, Fabrice Rastello (18 Apr 2025) [MQ]
- Enhancing Contrastive Demonstration Selection with Semantic Diversity for Robust In-Context Machine Translation
  Owen Patterson, Chee Ng (12 Apr 2025)
- ML For Hardware Design Interpretability: Challenges and Opportunities
  Raymond Baartmans, Andrew Ensinger, Victor Agostinelli, Lizhong Chen (11 Apr 2025)
- Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression
  Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin, Mohit Bansal (10 Apr 2025) [MQ]
- Ternarization of Vision Language Models for use on edge devices
  Ben Crulis, Cyril de Runz, Barthélémy Serres, Gilles Venturini (07 Apr 2025) [VLM]
- Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning
  Sanghwan Bae, Jiwoo Hong, Min Young Lee, Hanbyul Kim, Jeongyeon Nam, Donghyun Kwak (04 Apr 2025) [OffRL, LRM]
- LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi
  Mahsa Ardakani, Jinendra Malekar, Ramtin Zand (02 Apr 2025) [MQ]
- When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
  Nan Zhang, Yusen Zhang, Prasenjit Mitra, Rui Zhang (02 Apr 2025) [MQ, LRM]
- PIM-LLM: A High-Throughput Hybrid PIM Architecture for 1-bit LLMs
  Jinendra Malekar, Peyton S. Chandarana, Md Hasibul Amin, Mohammed E. Elbtity, Ramtin Zand (31 Mar 2025)
- STADE: Standard Deviation as a Pruning Metric
  Diego Coello de Portugal Mecke, Haya Alyoussef, Ilia Koloiarov, Maximilian Stubbemann, Lars Schmidt-Thieme (28 Mar 2025)
- Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
  Keda Tao, Haoxuan You, Yang Sui, Can Qin, H. Wang (20 Mar 2025) [VLM, MQ]
- FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers
  Ruichen Chen, Keith G. Mills, Di Niu (19 Mar 2025) [MQ]
- PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices
  Yangyijian Liu, Jun Yu Li, Wu-Jun Li (15 Mar 2025)
- Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge
  Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito (12 Mar 2025)
- Slim attention: cut your context memory in half without loss of accuracy -- K-cache is all you need for MHA
  Nils Graef, Andrew Wasielewski (07 Mar 2025)
- ToFu: Visual Tokens Reduction via Fusion for Multi-modal, Multi-patch, Multi-image Task
  Vittorio Pippi, Matthieu Guillaumin, S. Cascianelli, Rita Cucchiara, M. Jaritz, Loris Bazzani (06 Mar 2025)
- TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition
  Oliver Grainge, Michael Milford, I. Bodala, Sarvapali D. Ramchurn, Shoaib Ehsan (04 Mar 2025) [ViT]
- Towards Lossless Implicit Neural Representation via Bit Plane Decomposition
  Woo Kyoung Han, Byeonghun Lee, Hyunmin Cho, Sunghoon Im, Kyong Hwan Jin (28 Feb 2025) [MQ]
- Binary Neural Networks for Large Language Model: A Survey
  Liangdong Liu, Zhitong Zheng, Cong Wang, Tianhuang Su, Z. Yang (26 Feb 2025) [MQ]
- Hardware-Friendly Static Quantization Method for Video Diffusion Transformers
  Sanghyun Yi, Qingfeng Liu, Mostafa El-Khamy (20 Feb 2025) [MQ, VGen]
- PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
  J. Zhao, Miao Zhang, M. Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang (18 Feb 2025) [MQ]
- Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
  J. Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei (17 Feb 2025)
- Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models?
  Jacob Nielsen, Peter Schneider-Kamp, Lukas Galke (17 Feb 2025) [MQ]
- Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
  Eric Aubinais, Philippe Formont, Pablo Piantanida, Elisabeth Gassiat (10 Feb 2025)
- ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
  Zechun Liu, Changsheng Zhao, Hanxian Huang, Sijia Chen, Jing Zhang, ..., Yuandong Tian, Bilge Soran, Raghuraman Krishnamoorthi, Tijmen Blankevoort, Vikas Chandra (04 Feb 2025) [MQ]
- Optimizing Large Language Model Training Using FP4 Quantization
  Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, Baining Guo, Zhengjun Zha, Peng Cheng (28 Jan 2025) [MQ]
- It's complicated. The relationship of algorithmic fairness and non-discrimination regulations in the EU AI Act
  Kristof Meding (22 Jan 2025) [FaML]
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs
  Han Guo, William Brandon, Radostin Cholakov, Jonathan Ragan-Kelley, Eric P. Xing, Yoon Kim (20 Jan 2025) [MQ]
- LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
  Guoyu Li, Shengyu Ye, C. L. P. Chen, Yang Wang, Fan Yang, Ting Cao, Cheng Liu, Mohamed M. Sabry, Mao Yang (18 Jan 2025) [MQ]
- iServe: An Intent-based Serving System for LLMs
  Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar (08 Jan 2025) [VLM]
- Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
  Zhen Li, Yupeng Su, Runming Yang, C. Xie, Z. Wang, Zhongwei Xie, Ngai Wong, Hongxia Yang (06 Jan 2025) [MQ, LRM]
- Scaling Laws for Floating Point Quantization Training
  X. Sun, Shuaipeng Li, Ruobing Xie, Weidong Han, Kan Wu, ..., Yangyu Tao, Zhanhui Kang, C. Xu, Di Wang, Jie Jiang (05 Jan 2025) [MQ, AIFin]
- A novel framework for MCDM based on Z numbers and soft likelihood function
  Yuanpeng He (26 Dec 2024)
- TinyLLM: A Framework for Training and Deploying Language Models at the Edge Computers
  Savitha Viswanadh Kandala, Pramuka Medaranga, Ambuj Varshney (19 Dec 2024)
- Code LLMs: A Taxonomy-based Survey
  Nishat Raihan, Christian D. Newman, Marcos Zampieri (11 Dec 2024)
- Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
  Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu (26 Nov 2024) [MQ]
- PIM-AI: A Novel Architecture for High-Efficiency LLM Inference
  Cristobal Ortega, Yann Falevoz, Renaud Ayrignac (26 Nov 2024)
- MH-MoE: Multi-Head Mixture-of-Experts
  Shaohan Huang, Xun Wu, Shuming Ma, Furu Wei (25 Nov 2024) [MoE]
- MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
  Yu Zhang, M. Wang, Lancheng Zou, Wulong Liu, Hui-Ling Zhen, M. Yuan, Bei Yu (25 Nov 2024) [MQ]