
A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition

Main: 3 pages, 3 figures, 3 tables
Bibliography: 3 pages
Appendix: 1 page
Abstract

System-generated logs are typically converted into categorical log templates through parsing. These templates are crucial for generating actionable insights in various downstream tasks. However, existing parsers often fail to capture fine-grained template details, which lowers accuracy and reduces utility in downstream tasks that require precise pattern identification. We propose a character-level log parser built on a novel neural architecture that aggregates character embeddings. Our approach estimates a sequence of binary-coded decimals to achieve highly granular log template extraction. Our low-resource character-level parser, tested on revised Loghub-2k and a manually annotated industrial dataset, matches LLM-based parsers in accuracy while outperforming semantic parsers in efficiency.
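
As background for the term used above, binary-coded decimal (BCD) stores each decimal digit as its own 4-bit group. The sketch below only illustrates that generic encoding; the abstract does not describe how the parser maps log words to BCD sequences, so the function names `to_bcd` and `from_bcd` are hypothetical and this should not be read as the authors' method.

```python
def to_bcd(n: int) -> list[str]:
    """Encode a non-negative integer as one 4-bit string per decimal digit."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Each decimal digit 0-9 becomes its own zero-padded 4-bit binary group.
    return [format(int(digit), "04b") for digit in str(n)]


def from_bcd(nibbles: list[str]) -> int:
    """Decode a list of 4-bit strings back into the integer they represent."""
    digits = []
    for bits in nibbles:
        value = int(bits, 2)
        if len(bits) != 4 or value > 9:
            raise ValueError(f"not a valid BCD digit: {bits!r}")
        digits.append(str(value))
    return int("".join(digits))


if __name__ == "__main__":
    print(to_bcd(2025))  # ['0010', '0000', '0010', '0101']
    print(from_bcd(["0010", "0000", "0010", "0101"]))  # 2025
```

Each digit costs exactly 4 bits regardless of the size of the number, which is the property the "4-bit" in the title alludes to.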

@article{srivastava2025_2506.01147,
  title={A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition},
  author={Prerak Srivastava and Giulio Corallo and Sergey Rybalko},
  journal={arXiv preprint arXiv:2506.01147},
  year={2025}
}