Extraction and Evaluation of Formulaic Expressions Used in Scholarly Papers

Abstract

Formulaic expressions, such as 'in this paper we propose', are helpful for authors of scholarly papers because they convey communicative functions; in the example above, the function is 'showing the aim of this paper'. Resources of formulaic expressions that can be looked up easily, such as a dictionary, would therefore be useful. However, the forms of formulaic expressions often vary to a great extent. For example, 'in this paper we propose', 'in this study we propose' and 'in this paper we propose a new method to' are all regarded as formulaic expressions. This diversity of spans and forms causes problems in both the extraction and the evaluation of formulaic expressions. In this paper, we propose a new approach that is robust to variation in the spans and forms of formulaic expressions. Our approach regards a sentence as consisting of a formulaic part and a non-formulaic part. Instead of trying to extract formulaic expressions from a whole corpus, we extract them from each sentence, so that different forms can be dealt with at once. Based on this formulation, and to avoid the diversity problem, we propose evaluating extraction methods by how well the extracted expressions convey specific communicative functions rather than by comparing them to an existing lexicon. We also propose a new extraction method that utilises named entities and dependency structures to remove the non-formulaic part from a sentence. Experimental results show that the proposed extraction method achieved the best performance compared with existing methods.
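
The sentence-level formulation can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract only says that named entities and dependency structures are used to remove the non-formulaic part. The `Token` class, the hand-written annotations, and the rule "drop named-entity tokens and the subtree of the direct object" are all assumptions made for illustration; in practice the annotations would come from a parser and a named-entity recogniser.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int                # index of the syntactic head; the root points to itself
    dep: str                 # dependency label (hand-annotated in this sketch)
    is_entity: bool = False  # True if the token is part of a named entity


def subtree(tokens, root_idx):
    """Return the indices of root_idx and all of its dependency descendants."""
    found = {root_idx}
    changed = True
    while changed:
        changed = False
        for i, tok in enumerate(tokens):
            if i not in found and tok.head in found:
                found.add(i)
                changed = True
    return found


def formulaic_part(tokens, content_deps=("dobj",)):
    """Hypothetical rule: remove named entities and content-bearing subtrees."""
    drop = {i for i, t in enumerate(tokens) if t.is_entity}
    for i, t in enumerate(tokens):
        if t.dep in content_deps:
            drop |= subtree(tokens, i)
    return " ".join(t.text for i, t in enumerate(tokens) if i not in drop)


# "In this paper we propose a new method", annotated by hand.
sentence = [
    Token("In", 4, "prep"),
    Token("this", 2, "det"),
    Token("paper", 0, "pobj"),
    Token("we", 4, "nsubj"),
    Token("propose", 4, "ROOT"),
    Token("a", 7, "det"),
    Token("new", 7, "amod"),
    Token("method", 4, "dobj"),
]

print(formulaic_part(sentence))  # In this paper we propose
```

Because the rule operates on one sentence at a time, variants such as 'in this study we propose' would be handled by the same procedure without enumerating them in advance.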
