TimeBERT: Extending Pre-Trained Language Representations with Temporal Information

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022
Abstract

Time is an important aspect of documents and is used in a range of NLP and IR tasks. In this work, we investigate methods for incorporating temporal information during pre-training to further improve performance on time-related tasks. Whereas BERT utilizes synchronic document collections (BooksCorpus and English Wikipedia) as training corpora, we use a long-span temporal collection of news articles to build word representations. We introduce TimeBERT, a novel language representation model trained on a temporal collection of news articles via two new pre-training tasks, which harness two distinct temporal signals to construct time-aware language representations. The experimental results show that TimeBERT consistently outperforms BERT and other existing pre-trained models, with substantial gains on downstream NLP tasks and applications for which time is of high importance.
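As a rough illustration of how two distinct temporal signals in a timestamped news corpus could be turned into pre-training targets, the sketch below masks in-text temporal expressions (here, just four-digit years) more aggressively than ordinary tokens and derives a coarse document-dating label from the publication date. This is a minimal, self-contained sketch under those assumptions; the function name, the year regex, the 1987 starting year, and the 5-year buckets are illustrative choices, not the paper's actual pre-training pipeline.

import random
import re
from dataclasses import dataclass

# Illustrative assumption: a temporal expression is any standalone four-digit year.
YEAR_PATTERN = re.compile(r"\b(19|20)\d{2}\b")

@dataclass
class PretrainingExample:
    tokens: list          # input tokens with some positions replaced by [MASK]
    mlm_labels: dict      # masked position -> original token (masked-LM targets)
    dating_label: int     # coarse bucket of the publication year (document dating)

def make_example(pub_year: int, text: str,
                 mask_prob: float = 0.15,
                 temporal_boost: float = 0.5,
                 min_year: int = 1987, bucket_size: int = 5) -> PretrainingExample:
    """Combine two temporal signals into one training example:
    (1) temporal expressions inside the text are masked with higher probability
        than ordinary tokens, and
    (2) the publication date supplies a document-dating classification label."""
    tokens = text.split()
    mlm_labels = {}
    masked = []
    for i, tok in enumerate(tokens):
        p = temporal_boost if YEAR_PATTERN.search(tok) else mask_prob
        if random.random() < p:
            mlm_labels[i] = tok
            masked.append("[MASK]")
        else:
            masked.append(tok)
    dating_label = (pub_year - min_year) // bucket_size  # e.g. 5-year buckets
    return PretrainingExample(masked, mlm_labels, dating_label)

if __name__ == "__main__":
    random.seed(0)
    ex = make_example(1995, "The treaty signed in 1994 reshaped trade policy across Europe .")
    print(ex.tokens)        # tokens with [MASK] substitutions
    print(ex.mlm_labels)    # positions to reconstruct
    print(ex.dating_label)  # bucketed publication year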
