Punctuation Prediction for Polish Texts using Transformers

Conference on Computer Science and Information Systems (FedCSIS), 2023

6 October 2024

Jakub Pokrywka

Main: 2 pages

Bibliography: 2 pages

Abstract

Speech recognition systems typically output text without punctuation, yet punctuation is crucial for comprehending written text. Punctuation prediction models are developed to address this problem. This paper describes a solution for PolEval 2022 Task 1: Punctuation Prediction for Polish Texts, which achieves a Weighted F1 score of 71.44. The method uses a single HerBERT model fine-tuned on the competition data and an external dataset.
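To make the reported metric concrete, the following is a minimal sketch of one common definition of weighted F1: per-class F1 over punctuation marks, averaged with weights proportional to each mark's frequency in the reference. Whether the official PolEval 2022 scorer uses exactly this definition is an assumption, and the function name `weighted_f1` and the per-word label encoding (`None` for "no punctuation after this word") are hypothetical choices made for illustration.

```python
from collections import Counter


def weighted_f1(gold, pred):
    """Support-weighted mean of per-class F1 over punctuation labels.

    gold, pred: equal-length sequences with one label per word slot,
    e.g. "," or ".", and None where no punctuation follows the word.
    Assumption: the "no punctuation" class (None) is not scored.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g is not None:
                tp[g] += 1
        else:
            if p is not None:
                fp[p] += 1  # predicted a mark that is not in the reference
            if g is not None:
                fn[g] += 1  # missed (or mislabeled) a reference mark

    # Support = number of reference occurrences of each mark.
    support = {label: tp[label] + fn[label] for label in set(tp) | set(fn)}
    total = sum(support.values())
    if total == 0:
        return 0.0

    score = 0.0
    for label, n in support.items():
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        score += f1 * n / total
    return score


# Toy example: two reference commas (one found, one missed),
# one spurious comma, and one correctly predicted full stop.
gold = [None, ",", None, ".", None, ","]
pred = [None, ",", ",", ".", None, None]
print(round(weighted_f1(gold, pred), 4))
```

In this toy case the comma class has F1 of 0.5 with weight 2/3 and the full stop has F1 of 1.0 with weight 1/3, so frequent marks dominate the final score.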
