
Syntactic knowledge contributes substantially to neural machine translation (NMT). Early NMT models assumed that syntactic detail could be learned automatically from large amounts of text via attention networks. However, subsequent research showed that, limited by the unconstrained nature of attention computation, such models need external syntax to acquire deep syntactic awareness. Although recent syntax-aware NMT methods have achieved notable success in incorporating syntax, the additional machinery they introduce makes the models heavy and slow. In particular, these efforts rarely target Transformer-based NMT or modify its core self-attention network (SAN). To this end, we propose a parameter-free, dependency-scaled self-attention network (Deps-SAN) for syntax-aware Transformer-based NMT. It integrates a quantified matrix of syntactic dependencies that imposes explicit syntactic constraints on the SAN, helping the model learn syntactic detail and reducing the dispersion of its attention distributions. Two knowledge-sparsing techniques are further proposed to keep the model from overfitting to dependency noise. Extensive experiments and analyses on two benchmark NMT tasks verify the effectiveness of our approach.
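For concreteness, below is a minimal NumPy sketch of what a dependency-scaled attention head could look like: a pairwise dependency-tree distance matrix is turned into a Gaussian-shaped additive bias on the standard attention logits, and an optional k-nearest cutoff stands in for a knowledge-sparsing step. The function name `deps_scaled_attention`, the Gaussian form of the bias, and the top-k masking are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def deps_scaled_attention(Q, K, V, dep_dist, sigma=1.0, k_nearest=None):
    """Scaled dot-product attention with an additive dependency bias (sketch).

    Q, K, V:   (seq_len, d_k) query/key/value matrices for one head.
    dep_dist:  (seq_len, seq_len) pairwise distances in the dependency tree.
    sigma:     width of the Gaussian used to turn distances into a bias.
    k_nearest: if set, keep only the k syntactically closest tokens per query,
               a simple stand-in for a knowledge-sparsing step.
    """
    d_k = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d_k)                   # standard SAN logits

    # Quantify dependencies: tokens closer in the tree get a larger bias.
    dep_bias = -(dep_dist ** 2) / (2.0 * sigma ** 2)  # log of a Gaussian kernel

    if k_nearest is not None:
        # Mask out all but the k nearest dependency neighbours per query row.
        thresh = np.sort(dep_dist, axis=-1)[:, k_nearest - 1:k_nearest]
        dep_bias = np.where(dep_dist <= thresh, dep_bias, -1e9)

    logits = logits + dep_bias                        # impose syntactic constraint
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V

# Toy usage: 5 tokens, 8-dimensional head, random dependency distances.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8)); K = rng.normal(size=(5, 8)); V = rng.normal(size=(5, 8))
dist = rng.integers(0, 4, size=(5, 5)).astype(float)
out = deps_scaled_attention(Q, K, V, dist, sigma=1.0, k_nearest=3)
print(out.shape)  # (5, 8)
```

Because the bias is a fixed function of the parsed dependency distances, this kind of constraint adds no trainable parameters, which is consistent with the parameter-free claim in the abstract.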