Dion: A Communication-Efficient Optimizer for Large Models

7 April 2025

Abstract

Training large AI models efficiently requires distributing computation across multiple accelerators, but this often incurs significant communication overhead, especially during gradient synchronization. We introduce Dion, a communication-efficient optimizer that retains the synchronous semantics of standard distributed training (e.g., DDP, FSDP) while substantially reducing I/O costs. Unlike conventional optimizers that synchronize full gradient matrices, Dion leverages orthonormalized updates with device-local momentum buffers, eliminating the need for full gradient exchange. It further supports an efficient sharding strategy that avoids reconstructing large matrices during training.
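
To make the term "orthonormalized update" concrete, here is a minimal single-device sketch in Python. It is not Dion's algorithm: the abstract does not say how the orthonormalization is computed or which quantities are exchanged between devices. The function names, the heavy-ball momentum rule, the hyperparameter values, and the use of a full SVD are all assumptions chosen for readability.

```python
import numpy as np


def orthonormalize(matrix: np.ndarray) -> np.ndarray:
    """Keep the singular vectors of `matrix` and set every singular value to 1.

    Assumption: a thin SVD is used purely for clarity; the abstract does not
    specify how the orthonormalization is actually carried out.
    """
    u, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return u @ vt


def momentum_step(weights, momentum, grad, lr=0.01, beta=0.9):
    """One illustrative step: accumulate momentum, then apply its orthonormalized form.

    Hypothetical update rule (conventional heavy-ball momentum), not taken
    from the paper.
    """
    momentum = beta * momentum + grad
    update = orthonormalize(momentum)
    return weights - lr * update, momentum


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((6, 4))
    momentum = np.zeros_like(weights)
    grad = rng.standard_normal((6, 4))

    weights, momentum = momentum_step(weights, momentum, grad)
    update = orthonormalize(momentum)
    # The columns of the update are orthonormal, so update.T @ update is close to I.
    print(np.round(update.T @ update, 6))
```

In this sketch the momentum buffer stays on the device that owns it, and only the direction of the accumulated momentum (its singular vectors) shapes the step, not its scale.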

@article{ahn2025_2504.05295,
  title={Dion: A Communication-Efficient Optimizer for Large Models},
  author={Kwangjun Ahn and Byron Xu},
  journal={arXiv preprint arXiv:2504.05295},
  year={2025}
}
