Versions: v1, v2 (latest)

Comparing Task-Agnostic Embedding Models for Tabular Data

18 November 2025


Main: 7 Pages

14 Figures

Bibliography: 1 Page

3 Tables

Abstract

Recent foundation models for tabular data achieve strong task-specific performance via in-context learning. However, they focus on direct prediction, encapsulating both representation learning and task-specific inference inside a single, resource-intensive network. This work focuses specifically on representation learning, i.e., on transferable, task-agnostic embeddings. We systematically evaluate task-agnostic representations extracted from tabular foundation models (TabPFN, TabICL, and TabSTAR) alongside classical feature engineering (TableVectorizer and a sphere model) across a variety of application tasks, such as outlier detection (ADBench) and supervised learning (TabArena Lite). We find that simple feature engineering methods achieve comparable or superior performance while requiring substantially fewer computational resources than tabular foundation models.
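The key idea of a task-agnostic embedding is that rows are embedded once, without labels, and the same vectors are then reused by several downstream tasks. The following is a minimal sketch of that protocol under stated assumptions. It is not the paper's code. `embed_table` is a hypothetical, simplified stand-in for classical feature engineering (z-scored numeric columns plus one-hot categoricals). It is not skrub's actual TableVectorizer, and it is not the sphere model. The downstream models are also swapped-in illustrations: a kNN-distance outlier score and a nearest-centroid classifier, rather than the ADBench or TabArena Lite protocols. All function names are hypothetical.

```python
import numpy as np


def embed_table(numeric, categorical=None):
    """Label-free table embedding (hypothetical stand-in for classical
    feature engineering): z-score numeric columns, one-hot encode
    categorical columns. No target is used, so the result is task-agnostic."""
    numeric = np.asarray(numeric, dtype=float)
    std = numeric.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    parts = [(numeric - numeric.mean(axis=0)) / std]
    if categorical is not None:
        categorical = np.asarray(categorical, dtype=object)
        for j in range(categorical.shape[1]):
            col = categorical[:, j]
            levels = sorted(set(col))
            parts.append(np.array([[v == lvl for lvl in levels] for v in col], dtype=float))
    return np.hstack(parts)


def knn_outlier_scores(emb, k=2):
    """Outlier score = mean distance to the k nearest other rows."""
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour
    return np.sort(dist, axis=1)[:, :k].mean(axis=1)


def nearest_centroid_predict(train_emb, train_y, test_emb):
    """Supervised task on frozen embeddings: assign each test row to the
    class whose training-set centroid is closest."""
    train_y = np.asarray(train_y)
    classes = np.unique(train_y)
    centroids = np.stack([train_emb[train_y == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(test_emb[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[dist.argmin(axis=1)]


# Toy table: row 4 is a planted outlier in both numeric columns.
numeric = [[1.0, 10.0], [1.1, 11.0], [0.9, 9.5], [1.0, 10.5], [8.0, 80.0]]
categorical = [["a"], ["a"], ["b"], ["b"], ["a"]]

emb = embed_table(numeric, categorical)  # computed once, reused below
print(knn_outlier_scores(emb, k=2).argmax())  # 4
print(nearest_centroid_predict(emb[[0, 2]], [0, 1], emb[[1, 3]]))  # [0 1]
```

Both tasks consume the same `emb` array, and neither task changes how it was produced. That separation is what "task-agnostic" means here. For simplicity, the sketch computes the standardization statistics over all rows, including the ones later used as test rows. A stricter evaluation would fit them on training rows only.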
