Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models

30 April 2025
Lucas Maisonnave
Cyril Moineau
Olivier Bichler
Fabrice Rastello
MQ
ArXiv · PDF · HTML

Papers citing "Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models"

1 / 1 papers shown
Gradual Binary Search and Dimension Expansion: A general method for activation quantization in LLMs
Lucas Maisonnave
Cyril Moineau
Olivier Bichler
Fabrice Rastello
MQ
18 Apr 2025