LLM Embeddings for Deep Learning on Tabular Data

Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods deal with this heterogeneous nature of tabular data by employing separate type-specific encoding approaches. This limits the potential for cross-table transfer and the exploitation of pre-trained knowledge. We propose a novel approach that first transforms tabular data into text, and then leverages pre-trained representations from LLMs to encode this data, resulting in a plug-and-play solution for improving deep-learning tabular methods. We demonstrate that our approach improves accuracy over competitive models, such as MLP, ResNet and FT-Transformer, validated on seven classification datasets.
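The core of the approach is serializing each tabular row into natural-language text before embedding it with a pre-trained LLM. A minimal sketch of that serialization step is shown below; the "column is value" template and the function name `serialize_row` are illustrative assumptions, not the paper's exact implementation.

```python
def serialize_row(row: dict) -> str:
    """Turn one tabular row (column -> value) into a text string
    that a pre-trained LLM encoder can embed.

    NOTE: the template used here is an assumption for illustration;
    the actual serialization scheme may differ.
    """
    return ", ".join(f"{col} is {val}" for col, val in row.items())


# Example: a single row from a hypothetical income-prediction table.
row = {"age": 42, "occupation": "engineer", "hours_per_week": 40}
text = serialize_row(row)
print(text)  # "age is 42, occupation is engineer, hours_per_week is 40"
```

The resulting string would then be passed through a frozen pre-trained LLM encoder to obtain a dense vector, which replaces the separate type-specific numerical and categorical embeddings used by existing tabular models.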
@article{koloski2025_2502.11596,
  title   = {LLM Embeddings for Deep Learning on Tabular Data},
  author  = {Boshko Koloski and Andrei Margeloiu and Xiangjian Jiang and Blaž Škrlj and Nikola Simidjievski and Mateja Jamnik},
  journal = {arXiv preprint arXiv:2502.11596},
  year    = {2025}
}