Gram2Vec: An Interpretable Document Vectorizer

17 June 2024

Peter Zeng

Hannah Stortz

Eric Sclafani

Alina Shabaeva

Maria Elizabeth Garza

Daniel Greeson


Main:8 Pages

1 Figure

Bibliography:2 Pages

7 Tables

Abstract

We present Gram2Vec, a grammatical style embedding system that embeds documents into a higher-dimensional space by extracting the normalized relative frequencies of grammatical features present in the text. Compared to neural approaches, Gram2Vec is inherently interpretable because of how its feature vectors are generated. In this paper, we use two applications, authorship verification and AI detection, to show how Gram2Vec can be used. For authorship verification, we use Gram2Vec features to explain why a pair of documents is by the same author or by different authors. We also demonstrate that Gram2Vec features can be used to train a classifier for AI detection that outperforms machine learning models trained on a comparable set of Biber features.
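To make the idea of a relative-frequency feature vector concrete, the following is a minimal sketch and not the Gram2Vec implementation. It assumes the input is already tokenized and part-of-speech tagged, and it uses two small, hypothetical feature inventories (`POS_TAGS` and `PUNCT`) chosen only for illustration; the function names `relative_frequencies` and `vectorize` are likewise assumptions. Frequencies are taken relative to the total count within each feature group, which is one plausible reading of "normalized relative frequencies".

```python
from collections import Counter

# Hypothetical feature inventories, chosen for illustration only.
# The actual Gram2Vec feature set is not described in this abstract.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON"]
PUNCT = [".", ",", "!", "?", ";"]


def relative_frequencies(items, inventory):
    """Return each inventory feature's share of all inventory hits in items."""
    counts = Counter(item for item in items if item in inventory)
    total = sum(counts.values())
    # Features that never occur get 0.0, so every document yields
    # a vector with the same named dimensions.
    return {f: (counts[f] / total if total else 0.0) for f in inventory}


def vectorize(tagged_tokens):
    """Map a list of (token, pos) pairs to a dict of named feature values."""
    pos_freqs = relative_frequencies([pos for _, pos in tagged_tokens], POS_TAGS)
    punct_freqs = relative_frequencies([tok for tok, _ in tagged_tokens], PUNCT)
    vector = {f"pos:{name}": value for name, value in pos_freqs.items()}
    vector.update({f"punct:{name}": value for name, value in punct_freqs.items()})
    return vector


# A tiny pre-tagged document (tags assigned by hand for this example).
doc = [
    ("She", "PRON"), ("writes", "VERB"), ("clear", "ADJ"),
    ("prose", "NOUN"), (".", "PUNCT"), ("Really", "ADV"), ("!", "PUNCT"),
]
vec = vectorize(doc)
for name, value in vec.items():
    print(f"{name}: {value:.2f}")
```

Because every dimension carries a readable name such as `pos:NOUN` or `punct:!`, a difference between two document vectors can be traced back to specific grammatical features, which is the kind of interpretability the abstract contrasts with neural embeddings.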
