ResearchTrend.AI

BinaryBERT: Pushing the Limit of BERT Quantization (arXiv: 2012.15701)

Annual Meeting of the Association for Computational Linguistics (ACL), 2020
31 December 2020
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King

Papers citing "BinaryBERT: Pushing the Limit of BERT Quantization"

Showing 50 of 152 citing papers.
SingleQuant: Efficient Quantization of Large Language Models in a Single Pass
Jinying Xiao, Bin Ji, Shasha Li, Xiaodong Liu, Ma Jun, Ye Zhong, Wei Li, Xuan Xie, Qingbo Wu, Jie Yu
27 Nov 2025

T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization
Hyunwoo Oh, KyungIn Nam, Rajat Bhattacharjya, Hanning Chen, Tamoghno Das, Sanggeon Yun, Suyeon Jang, Andrew Ding, Nikil Dutt, Mohsen Imani
17 Nov 2025

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference
Divya J. Bajpai, M. Hanawal
26 Oct 2025

Entropy Meets Importance: A Unified Head Importance-Entropy Score for Stable and Efficient Transformer Pruning
Minsik Choi, Hyegang Son, Changhoon Kim, Young Geun Kim
10 Oct 2025

PrunedLoRA: Robust Gradient-Based Structured Pruning for Low-rank Adaptation in Fine-tuning
Xin Yu, Cong Xie, Ziyu Zhao, Tiantian Fan, Lingzhou Xue, Zhi-Li Zhang
30 Sep 2025

EGGS-PTP: An Expander-Graph Guided Structured Post-training Pruning Method for Large Language Models
Omar Bazarbachi, Zijun Sun, Yanning Shen
13 Aug 2025

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study
Yiran Huang, Lukas Thede, Goran Frehse, Wenjia Xu, Zeynep Akata
28 Jul 2025

Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Ba-Hien Tran, Van Minh Nguyen
28 May 2025

ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
Zhendong Mi, Zhenglun Kong, Geng Yuan, Shaoyi Huang
28 May 2025

HMI: Hierarchical Knowledge Management for Efficient Multi-Tenant Inference in Pretrained Language Models (The VLDB Journal, 2025)
Junxuan Zhang, Jiadong Wang, Haoyang Li, Lidan Shou, Ke Chen, Gang Chen, Qin Xie, Guiming Xie, Xuejian Gong
24 Apr 2025

COBRA: Algorithm-Architecture Co-optimized Binary Transformer Accelerator for Edge Inference
Ye Qiao, Zhiheng Cheng, Yian Wang, Yifan Zhang, Yunzhe Deng, Sitao Huang
22 Apr 2025

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, C. Yuan, Lu Hou
07 Apr 2025

PARQ: Piecewise-Affine Regularized Quantization
Lisa Jin, Jianhao Ma, Zechun Liu, Andrey Gromov, Aaron Defazio, Lin Xiao
19 Mar 2025

SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
Xun Liang, Ding Chen, Zhen Tao, Pengnian Qi, Chenyang Xi, Hanyu Wang, Jihao Zhao, Feiyu Xiong, Shichao Song, Zhiyu Li
10 Mar 2025

MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
Jinguang Wang, Jiangming Wang, Haifeng Sun, Tingting Yang, Zirui Zhuang, Wanyi Ning, Yuexi Yin, Q. Qi, Jianxin Liao
07 Mar 2025

Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability
Ashhadul Islam, S. Belhaouari, Amine Bermak
24 Feb 2025

SCALES: Boost Binary Neural Network for Image Super-Resolution with Efficient Scalings (DATE, 2023)
Renjie Wei, Shuwen Zhang, Zechun Liu, Meng Li, R. Huang, Runsheng Wang
24 Feb 2025

GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning (ACL, 2025)
Sifan Zhou, Shuo Wang, Zhihang Yuan, Mingjia Shi, Yuzhang Shang, Dawei Yang
18 Feb 2025

LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
Zikai Zhou, Qizheng Zhang, Hermann Kumbong, Kunle Olukotun
12 Feb 2025

BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts (ICLR, 2025)
Divya J. Bajpai, M. Hanawal
02 Feb 2025

HadamRNN: Binary and Sparse Ternary Orthogonal RNNs (ICLR, 2025)
Armand Foucault, Franck Mamalet, François Malgouyres
28 Jan 2025

QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models (NAACL, 2024)
Changhai Zhou, Yuhua Zhou, Shijie Han, Qian Qiao, Hongguang Li
16 Dec 2024

Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu
26 Nov 2024

Shrinking the Giant: Quasi-Weightless Transformers for Low Energy Inference
Shashank Nag, Alan T. L. Bacellar, Zachary Susskind, Anshul Jha, Logan Liberty, ..., Krishnan Kailas, P. Lima, Neeraja J. Yadwadkar, F. M. G. França, L. John
04 Nov 2024

FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, ..., Lu Hou, Chun Yuan, Xin Jiang, Wen Liu, Jun Yao
12 Oct 2024

Preserving Empirical Probabilities in BERT for Small-sample Clinical Entity Recognition
Abdul Rehman, Jiangning Zhang, Xiaosong Yang
05 Sep 2024

Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model
Kaiwen Tang, Zhanglu Yan, Weng-Fai Wong
04 Sep 2024

1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit
Chang Gao, Jianfei Chen, Kang Zhao, Jiaqi Wang, Liping Jing
26 Aug 2024

MoDeGPT: Modular Decomposition for Large Language Model Compression (ICLR, 2024)
Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu
19 Aug 2024

Accelerating Large Language Model Inference with Self-Supervised Early Exits
Florian Valade
30 Jul 2024

Retrieval-Augmented Generation for Natural Language Processing: A Survey
Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, ..., Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue
18 Jul 2024

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment
Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang
16 Jul 2024

Croppable Knowledge Graph Embedding
Yushan Zhu, Wen Zhang, Zhiqiang Liu, Yin Hua, Lei Liang, H. Chen
03 Jul 2024

OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao
27 Jun 2024

A Complete Survey on LLM-based AI Chatbots
Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, Chaoning Zhang
17 Jun 2024

AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers
Emil Biju, Anirudh Sriram, Mert Pilanci
13 Jun 2024

VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning
Oshin Dutta, Ritvik Gupta, Sumeet Agarwal
07 Jun 2024

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
Xingrun Xing, Zheng Zhang, Ziyi Ni, Shitao Xiao, Yiming Ju, Siqi Fan, Yequan Wang, Jiajun Zhang, Guoqi Li
05 Jun 2024

Scalable MatMul-free Language Modeling
Rui-Jie Zhu, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, P. Zhou, Nhan Duy Truong
04 Jun 2024

FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi
28 May 2024

BOLD: Boolean Logic Deep Learning
Van Minh Nguyen, Cristian Ocampo, Aymen Askri, Louis Leconte, Ba-Hien Tran
25 May 2024

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini
12 Apr 2024

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu
19 Mar 2024

FBPT: A Fully Binary Point Transformer (ICRA, 2024)
Zhixing Hou, Yuzhang Shang, Yan Yan
15 Mar 2024

C^3: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding
Taixi Lu, Haoyu Wang, Huajie Shao, Jing Gao, Huaxiu Yao
25 Feb 2024

Head-wise Shareable Attention for Large Language Models
Zouying Cao, Yifei Yang, Hai Zhao
19 Feb 2024

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
15 Feb 2024

A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao
05 Feb 2024

A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park, Jaehyeon Choi, Sojin Lee, U. Kang
27 Jan 2024

BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge (ISCAS, 2024)
Yuhao Ji, Chao Fang, Zhongfeng Wang
22 Jan 2024