Comparing Task-Agnostic Embedding Models for Tabular Data
- LMTD
Recent foundation models for tabular data achieve strong task-specific performance via in-context learning. Nevertheless, they focus on direct prediction by encapsulating both representation learning and task-specific inference inside a single, resource-intensive network. This work specifically focuses on representation learning, i.e., on transferable, task-agnostic embeddings. We systematically evaluate task-agnostic representations extracted from tabular foundation models (TabPFN, TabICL and TabSTAR) alongside classical feature engineering (TableVectorizer and a sphere model) across a variety of application tasks as outlier detection (ADBench) and supervised learning (TabArena Lite). We find that simple feature engineering methods achieve comparable or superior performance while requiring significantly less computational resources than tabular foundation models.
View on arXiv