ParaLaw Nets -- Cross-lingual Sentence-level Pretraining for Legal Text Processing
Nguyen Ha Thanh
Vu Tran
Phuong Minh Nguyen
Thi-Hai-Yen Vuong
Quan Minh Bui
Chau Nguyen
Binh Dang
Minh Le Nguyen
Kenji Satoh

Abstract
Ambiguity is a characteristic of natural language, which makes expression ideas flexible. However, in a domain that requires accurate statements, it becomes a barrier. Specifically, a single word can have many meanings and multiple words can have the same meaning. When translating a text into a foreign language, the translator needs to determine the exact meaning of each element in the original sentence to produce the correct translation sentence. From that observation, in this paper, we propose ParaLaw Nets, a pretrained model family using sentence-level cross-lingual information to reduce ambiguity and increase the performance in legal text processing. This approach achieved the best result in the Question Answering task of COLIEE-2021.
View on arXivComments on this paper