OneBit: Towards Extremely Low-bit Large Language Models
17 February 2024
Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che
MQ
ArXiv (abs) · PDF · HTML · HuggingFace (25 upvotes) · GitHub (204★)

Papers citing "OneBit: Towards Extremely Low-bit Large Language Models"

28 / 28 papers shown
R2Q: Towards Robust 2-Bit Large Language Models via Residual Refinement Quantization
Jiayi Chen, Jieqi Shi, Jing Huo, Chen Wu
MQ · 21 Nov 2025

MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity
Vladimír Macko, Vladimír Boža
17 Nov 2025

Energy-Efficient and Dequantization-Free Q-LLMs: A Spiking Neural Network Approach to Salient Value Mitigation
Chenyu Wang, Zhanglu Yan, Zhi Zhou, Xu Chen, Weng-Fai Wong
MQ · 22 Oct 2025

PT²-LLM: Post-Training Ternarization for Large Language Models
Xianglong Yan, Chengzhu Bao, Zhiteng Li, Tianao Zhang, Kaicheng Yang, Haotong Qin, Ruobing Xie, Xingwu Sun, Yulun Zhang
MQ · 27 Sep 2025

SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size
Junhao Xia, Ming Zhao, Limin Xiao, Xiujun Zhang
MQ · 27 Sep 2025

APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2025
Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang
26 Aug 2025

Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models
Zhijun Tu, Hanting Chen, Siqi Liu, Chuanjian Liu, Jian Li, Jie Hu, Yunhe Wang
MQ · 09 Aug 2025

SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving
Xiangchen Li, Dimitrios Spatharakis, Saeid Ghafouri, Jiakun Fan, Deepu John, Bo Ji, Dimitrios S. Nikolopoulos
11 Jun 2025

MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team: Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, ..., Zhiyuan Liu, Guoyang Zeng, Chao Jia, Dahai Li, Maosong Sun
MLLM · 09 Jun 2025

MANBench: Is Your Multimodal Model Smarter than Human?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Han Zhou, Qitong Xu, Yiheng Dong, Xin Yang
04 Jun 2025

LittleBit: Ultra Low-Bit Quantization via Latent Factorization
Banseok Lee, Dongkyu Kim, Youngcheon You, Youngmin Kim
MQ · 30 May 2025

Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Ba-Hien Tran, Van Minh Nguyen
MQ · 28 May 2025

Addition is almost all you need: Compressing large language models with double binary factorization
Vladimír Boža, Vladimír Macko
MQ · 16 May 2025

Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
Modeling Decisions for Artificial Intelligence (MDAI), 2025
Deyu Cao, Samin Aref
MQ · 14 Apr 2025

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
MQ · 13 Apr 2025

When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
Nan Zhang, Eugene Kwek, Yusen Zhang, Ngoc-Hieu Nguyen, Prasenjit Mitra, Rui Zhang
MQ · LRM · 02 Apr 2025

Dynamic Low-Rank Sparse Adaptation for Large Language Models
International Conference on Learning Representations (ICLR), 2025
Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Wenshu Fan, Aiyue Chen, Yiwu Yao, Rongrong Ji
21 Feb 2025

Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, Yaowei Wang, Min Zhang, Liqiang Nie
MQ · 18 Feb 2025

Progressive Binarization with Semi-Structured Pruning for LLMs
Xinyu Yan, Tianao Zhang, Zhiteng Li, Yulun Zhang
MQ · 03 Feb 2025

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
International Symposium on High-Performance Computer Architecture (HPCA), 2024
Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst
24 Nov 2024

Bi-Mamba: Towards Accurate 1-Bit State Space Models
Shengkun Tang, Liqun Ma, Haoyang Li, Mingjie Sun, Zhiqiang Shen
Mamba · 18 Nov 2024

Inverted Activations
Georgii Sergeevich Novikov, Ivan Oseledets
22 Jul 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Shiyang Feng, Kaipeng Zhang, Ping Luo
MQ · 10 Jul 2024

FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
Liqun Ma, Mingjie Sun, Zhiqiang Shen
09 Jul 2024

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang
MQ · 25 Jun 2024

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
International Conference on Learning Representations (ICLR), 2024
Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang
MQ · 12 Mar 2024

A Survey on Trustworthy Edge Intelligence: From Security and Reliability To Transparency and Sustainability
IEEE Communications Surveys and Tutorials (COMST), 2023
Xiaojie Wang, Beibei Wang, Yu Wu, Zhaolong Ning, Song Guo, Feng Yu
27 Oct 2023

A Survey on Model Compression for Large Language Models
Transactions of the Association for Computational Linguistics (TACL), 2023
Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang
15 Aug 2023