
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

18 February 2025


Papers citing "GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning"

50 / 64 papers shown

CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking

349

19 Nov 2025

OTARo: Once Tuning for All Precisions toward Robust On-Device LLMs

198

17 Nov 2025

Diffusion-Based Image Editing: An Unforeseen Adversary to Robust Invisible Watermarks

516

05 Nov 2025

Robust RGB-T Tracking via Learnable Visual Fourier Prompt Fine-tuning and Modality Fusion Prompt Generation

259

24 Sep 2025

Mano Technical Report

...

318

22 Sep 2025

Coarse-to-Fine Personalized LLM Impressions for Streamlined Radiology Reports

273

19 Aug 2025

FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models

Yan Gao

Massimo Roberto Scamarcia

Javier Fernandez-Marques

...

513

03 Jun 2025

MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance

409

02 May 2025

SpinQuant: LLM quantization with learned rotations

International Conference on Learning Representations (ICLR), 2024

Raghuraman Krishnamoorthi

Vikas Chandra

Yuandong Tian

Tijmen Blankevoort

706

309

21 Feb 2025

OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting

International Conference on Learning Representations (ICLR), 2025

350

23 Jan 2025

A GEN AI Framework for Medical Note Generation

390

27 Sep 2024

Efficient Fine-Tuning of Large Language Models for Automated Medical Documentation

477

14 Sep 2024

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Saleh Ashkboos

Amirkeivan Mohtashami

Dan Alistarh

615

417

30 Mar 2024

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

964

861

21 Mar 2024

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

938

1,452

20 Mar 2024

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization

Haocheng Xi

Yuxiang Chen

Kang Zhao

Kaijun Zheng

Jianfei Chen

Jun Zhu

266

19 Mar 2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Yuandong Tian

556

416

06 Mar 2024

LoRA+: Efficient Low Rank Adaptation of Large Models

557

373

19 Feb 2024

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

268

19 Feb 2024

DoRA: Weight-Decomposed Low-Rank Adaptation

906

777

14 Feb 2024

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

International Conference on Learning Representations (ICLR), 2023

623

213

12 Oct 2023

Improved Baselines with Visual Instruction Tuning

Computer Vision and Pattern Recognition (CVPR), 2023

746

4,820

05 Oct 2023

Instruction Tuning for Large Language Models: A Survey

...

Jiwei Li

1.2K

824

21 Aug 2023

Training Transformers with 4-bit Integers

Neural Information Processing Systems (NeurIPS), 2023

Haocheng Xi

Changhao Li

Jianfei Chen

Jun Zhu

408

21 Jun 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Neural Information Processing Systems (NeurIPS), 2023

...

3.4K

7,883

09 Jun 2023

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Conference on Machine Learning and Systems (MLSys), 2023

Chuang Gan

Song Han

EDL MQ

1.0K

1,217

01 Jun 2023

QLoRA: Efficient Finetuning of Quantized LLMs

Neural Information Processing Systems (NeurIPS), 2023

Tim Dettmers

Artidoro Pagnoni

Ari Holtzman

Luke Zettlemoyer

ALM

784

4,250

23 May 2023

FP8 versus INT8 for efficient deep learning inference

...

318

31 Mar 2023

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

International Conference on Learning Representations (ICLR), 2023

Huan Sun

296

159

06 Mar 2023

LLaMA: Open and Efficient Foundation Language Models

...

20.1K

19,316

27 Feb 2023

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Elias Frantar

Saleh Ashkboos

Torsten Hoefler

Dan Alistarh

681

1,837

31 Oct 2022

SQuAT: Sharpness- and Quantization-Aware Training for BERT

279

13 Oct 2022

MKQ-BERT: Quantized BERT with 4-bits Weights and Activations

186

25 Mar 2022

FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

International Symposium on High-Performance Computer Architecture (HPCA), 2021

Shanghang Zhang

Bradley McDanel

H. T. Kung

211

28 Oct 2021

8-bit Optimizers via Block-wise Quantization

Tim Dettmers

M. Lewis

Sam Shleifer

Luke Zettlemoyer

563

440

06 Oct 2021

LoRA: Low-Rank Adaptation of Large Language Models

International Conference on Learning Representations (ICLR), 2021

OffRL AI4TS AI4CE ALM AIMat

1.8K

17,979

17 Jun 2021

Learning Transferable Visual Models From Natural Language Supervision

International Conference on Machine Learning (ICML), 2021

...

2.2K

46,392

26 Feb 2021

I-BERT: Integer-only BERT Quantization

International Conference on Machine Learning (ICML), 2021

Sehoon Kim

536

370

05 Jan 2021

BinaryBERT: Pushing the Limit of BERT Quantization

Annual Meeting of the Association for Computational Linguistics (ACL), 2020

Lifeng Shang

Xin Jiang

Qun Liu

Michael Lyu

Irwin King

655

261

31 Dec 2020

Sharpness-Aware Minimization for Efficiently Improving Generalization

International Conference on Learning Representations (ICLR), 2020

982

1,815

03 Oct 2020

TernaryBERT: Distillation-aware Ultra-low Bit BERT

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

Lifeng Shang

Xin Jiang

Qun Liu

415

230

27 Sep 2020

Towards Unified INT8 Training for Convolutional Neural Network

Computer Vision and Pattern Recognition (CVPR), 2019

Xianglong Liu

327

176

29 Dec 2019

PIQA: Reasoning about Physical Commonsense in Natural Language

AAAI Conference on Artificial Intelligence (AAAI), 2019

Yejin Choi

3.2K

2,818

26 Nov 2019

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

Neural Information Processing Systems (NeurIPS), 2019

Zhen Dong

303

361

10 Nov 2019

Q8BERT: Quantized 8Bit BERT

577

567

14 Oct 2019

Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers

Neural Networks (NN), 2019

Lei Deng

349

125

05 Sep 2019

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

IEEE International Conference on Computer Vision (ICCV), 2019

Xianglong Liu

341

536

14 Aug 2019

Cheetah: Mixed Low-Precision Hardware & Software Co-Design Framework for DNNs on the Edge

H. F. Langroudi

Zachariah Carmichael

David Pastuch

Dhireesha Kudithipudi

259

06 Aug 2019

Deep Learning Training on the Edge with Low-Precision Posits

H. F. Langroudi

Zachariah Carmichael

Dhireesha Kudithipudi

210

30 Jul 2019

BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

North American Chapter of the Association for Computational Linguistics (NAACL), 2019

845

2,269

24 May 2019