Syntactically Informed Text Compression with Recurrent Neural Networks

8 August 2016

Abstract

We present a self-contained system for constructing natural language models for use in text compression. Our system improves upon previous neural network based models by utilizing recent advances in syntactic parsing -- Google's SyntaxNet -- to augment character-level recurrent neural networks. RNNs have proven exceptional in modeling sequence data such as text, as their architecture allows for modeling of long-term contextual information. Modeling and coding are the backbone of modern compression schemes. While coding is considered a solved problem, generating effective, domain-specific models remains a critical step in the process of improving compression ratios.

View on arXiv

Comments on this paper