2402.17764
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
27 February 2024
Shuming Ma
Hongyu Wang
Lingxiao Ma
Lei Wang
Wenhui Wang
Shaohan Huang
Li Dong
Ruiping Wang
Jilong Xue
Furu Wei
MQ
Papers citing
"The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"
50 / 137 papers shown
Title
Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance
Jiayi Chen
Chen Wu
S. Zhang
Nan Li
L. Zhang
Qi Zhang
69
0
0
23 Nov 2024
Bi-Mamba: Towards Accurate 1-Bit State Space Models
Shengkun Tang
Liqun Ma
H. Li
Mingjie Sun
Zhiqiang Shen
Mamba
73
3
0
18 Nov 2024
An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks
Mohsen Dehghankar
Mahdi Erfanian
Abolfazl Asudeh
35
0
0
10 Nov 2024
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
C. Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin
MoMe
46
12
0
07 Nov 2024
Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy
Razvan-Gabriel Dumitru
Paul-Ioan Clotan
Vikas Yadav
Darius Peteleaza
Mihai Surdeanu
22
4
0
05 Nov 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin
Yanzhao Wu
34
4
0
05 Nov 2024
Shrinking the Giant : Quasi-Weightless Transformers for Low Energy Inference
Shashank Nag
Alan T. L. Bacellar
Zachary Susskind
Anshul Jha
Logan Liberty
...
Krishnan Kailas
P. Lima
Neeraja J. Yadwadkar
F. M. G. França
L. John
33
0
0
04 Nov 2024
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
Yuhang Li
Priyadarshini Panda
MQ
26
1
0
24 Oct 2024
Pyramid Vector Quantization for LLMs
Tycho F. A. van der Ouderaa
Maximilian L. Croci
Agrin Hilmkil
James Hensman
MQ
29
0
0
22 Oct 2024
Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs
Zifei Xu
Sayeh Sharify
W. Yazar
T. Webb
Xin Eric Wang
MQ
33
0
0
18 Oct 2024
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Hao Chen
Fuwen Tan
Alexandros Kouris
Royson Lee
Hongxiang Fan
Stylianos I. Venieris
MQ
21
1
0
17 Oct 2024
big.LITTLE Vision Transformer for Efficient Visual Recognition
He Guo
Yulong Wang
Zixuan Ye
Jifeng Dai
Yuwen Xiong
ViT
50
0
0
14 Oct 2024
Gradient-Free Neural Network Training on the Edge
Dotan Di Castro
O. Joglekar
Shir Kozlovsky
Vladimir Tchuiev
Michal Moshkovitz
MQ
14
0
0
13 Oct 2024
On-Chip Learning via Transformer In-Context Learning
Jan Finkbeiner
Emre Neftci
21
0
0
11 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
44
4
0
09 Oct 2024
Accelerating Error Correction Code Transformers
Matan Levy
Yoni Choukroun
Lior Wolf
MQ
21
0
0
08 Oct 2024
ESPACE: Dimensionality Reduction of Activations for Model Compression
Charbel Sakr
Brucek Khailany
20
3
0
07 Oct 2024
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang
Binhang Yuan
Xu Han
Chaojun Xiao
Zhiyuan Liu
RALM
73
1
0
02 Oct 2024
Getting Free Bits Back from Rotational Symmetries in LLMs
Jiajun He
Gergely Flamich
José Miguel Hernández-Lobato
MQ
13
0
0
02 Oct 2024
Dynamic neurons: A statistical physics approach for analyzing deep neural networks
Donghee Lee
Hye-Sung Lee
Jaeok Yi
16
1
0
01 Oct 2024
Scrambled text: training Language Models to correct OCR errors using synthetic data
Jonathan Bourne
SyDa
34
2
0
29 Sep 2024
Accumulator-Aware Post-Training Quantization
Ian Colbert
Fabian Grob
Giuseppe Franco
Jinjie Zhang
Rayan Saab
MQ
22
3
0
25 Sep 2024
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
Yifei Liu
Jicheng Wen
Yang Wang
Shengyu Ye
Li Lyna Zhang
Ting Cao
Cheng Li
Mao Yang
MQ
47
9
0
25 Sep 2024
On-Device Language Models: A Comprehensive Review
Jiajun Xu
Zhiyuan Li
Wei Chen
Qun Wang
Xin Gao
Qi Cai
Ziyuan Ling
32
27
0
26 Aug 2024
LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!
Jainaveen Sundaram
Ravi Iyer
MLLM
26
2
0
23 Aug 2024
Matmul or No Matmal in the Era of 1-bit LLMs
Jinendra Malekar
Mohammed E. Elbtity
Ramtin Zand
MQ
24
2
0
21 Aug 2024
ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models
Chao Zeng
Songwei Liu
Yusheng Xie
Hong Liu
Xiaojian Wang
Miao Wei
Shu Yang
Fangmin Chen
Xing Mei
MQ
37
5
0
16 Aug 2024
LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference
Zhiwen Mo
Lei Wang
Jianyu Wei
Zhichen Zeng
Shijie Cao
...
Naifeng Jing
Ting Cao
Jilong Xue
Fan Yang
Mao Yang
54
4
0
12 Aug 2024
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Jaehong Cho
Minsu Kim
Hyunmin Choi
Guseul Heo
Jongse Park
38
9
0
10 Aug 2024
Logistic Regression makes small LLMs strong and explainable "tens-of-shot" classifiers
Marcus Buckmann
Edward Hill
29
2
0
06 Aug 2024
Fine-tuning multilingual language models in Twitter/X sentiment analysis: a study on Eastern-European V4 languages
Tomás Filip
Martin Pavlícek
Petr Sosík
17
2
0
04 Aug 2024
A General-Purpose Device for Interaction with LLMs
Jiajun Xu
Qun Wang
Yuhang Cao
Baitao Zeng
Sicheng Liu
16
4
0
02 Aug 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Peng Sun
69
8
0
29 Jul 2024
u-μP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
51
9
0
24 Jul 2024
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
Hongyu Wang
Shuming Ma
Ruiping Wang
Furu Wei
MoE
33
11
0
15 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
V. Cevher
Yida Wang
George Karypis
37
3
0
12 Jul 2024
On Exact Bit-level Reversible Transformers Without Changing Architectures
Guoqiang Zhang
J. P. Lewis
W. Kleijn
MQ
AI4CE
25
0
0
12 Jul 2024
Optimization of DNN-based speaker verification model through efficient quantization technique
Yeona Hong
Woo-Jin Chung
Hong-Goo Kang
MQ
26
1
0
12 Jul 2024
Learning Program Behavioral Models from Synthesized Input-Output Pairs
Tural Mammadov
Dietrich Klakow
Alexander Koller
Andreas Zeller
34
3
0
11 Jul 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Mengzhao Chen
Wenqi Shao
Peng Xu
Jiahao Wang
Peng Gao
Kaipeng Zhang
Yu Qiao
Ping Luo
MQ
36
22
0
10 Jul 2024
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
Liqun Ma
Mingjie Sun
Zhiqiang Shen
29
6
0
09 Jul 2024
Exploring Advanced Large Language Models with LLMsuite
Giorgio Roffo
LLMAG
17
0
0
01 Jul 2024
FoldGPT: Simple and Effective Large Language Model Compression Scheme
Songwei Liu
Chao Zeng
Lianqiang Li
Chenqian Yan
Lean Fu
Xing Mei
Fangmin Chen
40
4
0
01 Jul 2024
ViT-1.58b: Mobile Vision Transformers in the 1-bit Era
Zhengqing Yuan
Rong-Er Zhou
Hongyi Wang
Lifang He
Yanfang Ye
Lichao Sun
MQ
20
8
0
26 Jun 2024
BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks
Jacob Nielsen
Peter Schneider-Kamp
MQ
35
4
0
24 Jun 2024
MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication
Shubhabrata Mukherjee
Cory Beard
Sejun Song
25
0
0
22 Jun 2024
A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges
Yuqi Nie
Yaxuan Kong
Xiaowen Dong
John M. Mulvey
H. Vincent Poor
Qingsong Wen
Stefan Zohren
AIFin
40
41
0
15 Jun 2024
PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos
Steven Abreu
Tiffany D. Do
Karan Ahuja
Eric J. Gonzalez
Lee Payne
Daniel J. McDuff
Mar González-Franco
32
2
0
14 Jun 2024
Q-S5: Towards Quantized State Space Models
Steven Abreu
Jens Egholm Pedersen
Kade Heckel
Alessandro Pierro
MQ
24
7
0
13 Jun 2024
Dynamical Mean-Field Theory of Self-Attention Neural Networks
Ángel Poc-López
Miguel Aguilera
AI4CE
30
0
0
11 Jun 2024