The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
arXiv:2402.17764 · 27 February 2024
Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Lifeng Dong, Ruiping Wang, Jilong Xue, Furu Wei
MQ
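Note: the "1.58 bits" in the title refers to the information content of a ternary weight. In the BitNet b1.58 scheme the paper describes, each weight is constrained to one of the three values {-1, 0, +1}, so a single weight carries log2(3) ≈ 1.585 bits.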
Papers citing "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" (37 of 137 papers shown)
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren
MQ · 06 Jun 2024

Training of Physical Neural Networks
Ali Momeni, Babak Rahmani, B. Scellier, Logan G. Wright, Peter L. McMahon, ..., Julie Grollier, Andrea J. Liu, D. Psaltis, Andrea Alù, Romain Fleury
PINN, AI4CE · 05 Jun 2024

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
Xingrun Xing, Zheng Zhang, Ziyi Ni, Shitao Xiao, Yiming Ju, Siqi Fan, Yequan Wang, Jiajun Zhang, Guoqi Li
05 Jun 2024

Llumnix: Dynamic Scheduling for Large Language Model Serving
Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin
05 Jun 2024

USM RNN-T model weights binarization
Oleg Rybakov, Dmitriy Serdyuk, Chengjian Zheng
MQ · 05 Jun 2024

Scalable MatMul-free Language Modeling
Rui-Jie Zhu, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, P. Zhou, Jason Eshraghian
04 Jun 2024

Recurrent neural networks: vanishing and exploding gradients are not the end of the story
Nicolas Zucchet, Antonio Orvieto
ODL, AAML · 31 May 2024

Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, ..., Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh
MQ · 31 May 2024

xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems
Georg Rutishauser, Joan Mihali, Moritz Scherer, Luca Benini
29 May 2024

Compressing Large Language Models using Low Rank and Low Precision Decomposition
R. Saha, Naomi Sagan, Varun Srivastava, Andrea J. Goldsmith, Mert Pilanci
MQ · 29 May 2024

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
Ethan Shen, Alan Fan, Sarah M Pratt, Jae Sung Park, Matthew Wallingford, Sham Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati
28 May 2024

Mechanistic Interpretability of Binary and Ternary Transformers
Jason Li
MQ · 27 May 2024

LoQT: Low Rank Adapters for Quantized Training
Sebastian Loeschcke, M. Toftrup, M. Kastoryano, Serge J. Belongie, Vésteinn Snæbjarnarson
MQ · 26 May 2024

PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtárik
MQ · 23 May 2024

Worldwide Federated Training of Language Models
Alexandru Iacob, Lorenzo Sani, Bill Marino, Preslav Aleksandrov, William F. Shen, Nicholas D. Lane
FedML · 23 May 2024

TerDiT: Ternary Diffusion Models with Transformers
Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Yafei Wen, Shuai Ren, Peng Gao, Junchi Yan
MQ · 23 May 2024

Thermodynamic Natural Gradient Descent
Kaelan Donatella, Samuel Duffield, Maxwell Aifer, Denis Melanson, Gavin Crooks, Patrick J. Coles
22 May 2024

Requirements are All You Need: The Final Frontier for End-User Software Engineering
Diana Robinson, Christian Cabrera, Andrew D. Gordon, Neil D. Lawrence, Lars Mennen
22 May 2024

Interactive Simulations of Backdoors in Neural Networks
Peter Bajcsy, Maxime Bros
21 May 2024

The Future of Large Language Model Pre-training is Federated
Lorenzo Sani, Alexandru Iacob, Zeyu Cao, Bill Marino, Yan Gao, ..., Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane
AI4CE · 17 May 2024

HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models
R. Sukthanker, Arber Zela, B. Staffler, Aaron Klein, Lennart Purucker, Jorg K. H. Franke, Frank Hutter
ELM · 16 May 2024

OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning
Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wen Xie, ..., Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang
MQ · 09 May 2024

Exploring Extreme Quantization in Spiking Language Models
Malyaban Bal, Yi Jiang, Abhronil Sengupta
MQ · 04 May 2024

Layer Ensemble Averaging for Improving Memristor-Based Artificial Neural Network Performance
Osama Yousuf, Brian D. Hoskins, Karthick Ramu, Mitchell Fream, W. A. Borders, ..., M. Daniels, A. Dienstfrey, Jabez J. McClelland, Martin Lueker-Boden, Gina Adam
24 Apr 2024

From a Lossless (~1.5:1) Compression Algorithm for Llama2 7B Weights to Variable Precision, Variable Range, Compressed Numeric Data Types for CNNs and LLMs
Vincenzo Liguori
MQ · 16 Apr 2024

SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks
Sreyes P. Venkatesh, Razvan Marinescu, Jason Eshraghian
MQ · 15 Apr 2024

ChatGPT and general-purpose AI count fruits in pictures surprisingly well
Konlavach Mengsuwan, Juan Camilo Rivera Palacio, Masahiro Ryo
VLM · 12 Apr 2024

Blessing or curse? A survey on the Impact of Generative AI on Fake News
Alexander Loth, Martin Kappes, Marc-Oliver Pahl
03 Apr 2024

VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments
Bufang Yang, Lixing He, Kaiwei Liu, Zhenyu Yan
03 Apr 2024

Accurate Block Quantization in LLMs with Outliers
Nikita Trukhanov, I. Soloveychik
MQ · 29 Mar 2024

The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini, Jacopo Staiano, Bruno Lepri, Raffaele Marino
MoE · 13 Mar 2024

Optimizing sDTW for AMD GPUs
Daniel Latta-Lin, Sofia Isadora Padilla Munoz
11 Mar 2024

Quantum linear algebra is all you need for Transformer architectures
Naixu Guo, Zhan Yu, Matthew Choi, Aman Agrawal, Kouhei Nakaji, Alán Aspuru-Guzik, P. Rebentrost
AI4CE · 26 Feb 2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit
James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai
15 Feb 2024

QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa
MQ · 06 Feb 2024

Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly
Herbert Woisetschläger, Alexander Erben, Shiqiang Wang, R. Mayer, Hans-Arno Jacobsen
FedML · 04 Oct 2023

A Comprehensive Survey on Enterprise Financial Risk Analysis from Big Data Perspective
Yu Zhao, Huaming Du, Qing Li, Fuzhen Zhuang, Ji Liu, Gang Kou
28 Nov 2022