ResearchTrend.AI

BinaryBERT: Pushing the Limit of BERT Quantization (arXiv: 2012.15701)

Annual Meeting of the Association for Computational Linguistics (ACL), 2020
31 December 2020
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King

Papers citing "BinaryBERT: Pushing the Limit of BERT Quantization"

Showing 50 of 152 citing papers.
SingleQuant: Efficient Quantization of Large Language Models in a Single Pass
Jinying Xiao, Bin Ji, Shasha Li, Xiaodong Liu, Ma Jun, Ye Zhong, Wei Li, Xuan Xie, Qingbo Wu, Jie Yu
27 Nov 2025

T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization
Hyunwoo Oh, KyungIn Nam, Rajat Bhattacharjya, Hanning Chen, Tamoghno Das, Sanggeon Yun, Suyeon Jang, Andrew Ding, Nikil Dutt, Mohsen Imani
17 Nov 2025

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference
Divya J. Bajpai, M. Hanawal
26 Oct 2025

Entropy Meets Importance: A Unified Head Importance-Entropy Score for Stable and Efficient Transformer Pruning
Minsik Choi, Hyegang Son, Changhoon Kim, Young Geun Kim
10 Oct 2025

PrunedLoRA: Robust Gradient-Based Structured Pruning for Low-rank Adaptation in Fine-tuning
Xin Yu, Cong Xie, Ziyu Zhao, Tiantian Fan, Lingzhou Xue, Zhi-Li Zhang
30 Sep 2025

EGGS-PTP: An Expander-Graph Guided Structured Post-training Pruning Method for Large Language Models
Omar Bazarbachi, Zijun Sun, Yanning Shen
13 Aug 2025

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study
Yiran Huang, Lukas Thede, Goran Frehse, Wenjia Xu, Zeynep Akata
28 Jul 2025

Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Ba-Hien Tran, Van Minh Nguyen
28 May 2025

ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
Zhendong Mi, Zhenglun Kong, Geng Yuan, Shaoyi Huang
28 May 2025

HMI: Hierarchical Knowledge Management for Efficient Multi-Tenant Inference in Pretrained Language Models (The VLDB Journal, 2025)
Junxuan Zhang, Jiadong Wang, Haoyang Li, Lidan Shou, Ke Chen, Gang Chen, Qin Xie, Guiming Xie, Xuejian Gong
24 Apr 2025

COBRA: Algorithm-Architecture Co-optimized Binary Transformer Accelerator for Edge Inference
Ye Qiao, Zhiheng Cheng, Yian Wang, Yifan Zhang, Yunzhe Deng, Sitao Huang
22 Apr 2025

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, C. Yuan, Lu Hou
07 Apr 2025

PARQ: Piecewise-Affine Regularized Quantization
Lisa Jin, Jianhao Ma, Zechun Liu, Andrey Gromov, Aaron Defazio, Lin Xiao
19 Mar 2025

SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
Xun Liang, Ding Chen, Zhen Tao, Pengnian Qi, Chenyang Xi, Hanyu Wang, Jihao Zhao, Feiyu Xiong, Shichao Song, Zhiyu Li
10 Mar 2025

MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
Jinguang Wang, Jiangming Wang, Haifeng Sun, Tingting Yang, Zirui Zhuang, Wanyi Ning, Yuexi Yin, Q. Qi, Jianxin Liao
07 Mar 2025

Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability
Ashhadul Islam, S. Belhaouari, Amine Bermak
24 Feb 2025

SCALES: Boost Binary Neural Network for Image Super-Resolution with Efficient Scalings (DATE, 2023)
Renjie Wei, Shuwen Zhang, Zechun Liu, Meng Li, R. Huang, Runsheng Wang
24 Feb 2025

GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning (ACL, 2025)
Sifan Zhou, Shuo Wang, Zhihang Yuan, Mingjia Shi, Yuzhang Shang, Dawei Yang
18 Feb 2025

LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
Zikai Zhou, Qizheng Zhang, Hermann Kumbong, Kunle Olukotun
12 Feb 2025

BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts (ICLR, 2025)
Divya J. Bajpai, M. Hanawal
02 Feb 2025

HadamRNN: Binary and Sparse Ternary Orthogonal RNNs (ICLR, 2025)
Armand Foucault, Franck Mamalet, François Malgouyres
28 Jan 2025

QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models (NAACL, 2024)
Changhai Zhou, Yuhua Zhou, Shijie Han, Qian Qiao, Hongguang Li
16 Dec 2024

Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu
26 Nov 2024

Shrinking the Giant: Quasi-Weightless Transformers for Low Energy Inference
Shashank Nag, Alan T. L. Bacellar, Zachary Susskind, Anshul Jha, Logan Liberty, ..., Krishnan Kailas, P. Lima, Neeraja J. Yadwadkar, F. M. G. França, L. John
04 Nov 2024

FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, ..., Lu Hou, Chun Yuan, Xin Jiang, Wen Liu, Jun Yao
12 Oct 2024

Preserving Empirical Probabilities in BERT for Small-sample Clinical Entity Recognition
Abdul Rehman, Jiangning Zhang, Xiaosong Yang
05 Sep 2024

Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model
Kaiwen Tang, Zhanglu Yan, Weng-Fai Wong
04 Sep 2024

1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit
Chang Gao, Jianfei Chen, Kang Zhao, Jiaqi Wang, Liping Jing
26 Aug 2024

MoDeGPT: Modular Decomposition for Large Language Model Compression (ICLR, 2024)
Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu
19 Aug 2024

Accelerating Large Language Model Inference with Self-Supervised Early Exits
Florian Valade
30 Jul 2024

Retrieval-Augmented Generation for Natural Language Processing: A Survey
Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, ..., Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue
18 Jul 2024

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment
Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang
16 Jul 2024

Croppable Knowledge Graph Embedding
Yushan Zhu, Wen Zhang, Zhiqiang Liu, Yin Hua, Lei Liang, H. Chen
03 Jul 2024

OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao
27 Jun 2024

A Complete Survey on LLM-based AI Chatbots
Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, Chaoning Zhang
17 Jun 2024

AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers
Emil Biju, Anirudh Sriram, Mert Pilanci
13 Jun 2024

VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning
Oshin Dutta, Ritvik Gupta, Sumeet Agarwal
07 Jun 2024

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
Xingrun Xing, Zheng Zhang, Ziyi Ni, Shitao Xiao, Yiming Ju, Siqi Fan, Yequan Wang, Jiajun Zhang, Guoqi Li
05 Jun 2024

Scalable MatMul-free Language Modeling
Rui-Jie Zhu, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, P. Zhou, Nhan Duy Truong
04 Jun 2024

FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi
28 May 2024

BOLD: Boolean Logic Deep Learning
Van Minh Nguyen, Cristian Ocampo, Aymen Askri, Louis Leconte, Ba-Hien Tran
25 May 2024

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini
12 Apr 2024

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu
19 Mar 2024

FBPT: A Fully Binary Point Transformer (ICRA, 2024)
Zhixing Hou, Yuzhang Shang, Yan Yan
15 Mar 2024

C^3: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding
Taixi Lu, Haoyu Wang, Huajie Shao, Jing Gao, Huaxiu Yao
25 Feb 2024

Head-wise Shareable Attention for Large Language Models
Zouying Cao, Yifei Yang, Hai Zhao
19 Feb 2024

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
15 Feb 2024

A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao
05 Feb 2024

A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park, Jaehyeon Choi, Sojin Lee, U. Kang
27 Jan 2024

BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge (ISCAS, 2024)
Yuhao Ji, Chao Fang, Zhongfeng Wang
22 Jan 2024