Tabular Embeddings for Tables with Bi-Dimensional Hierarchical Metadata and Nesting

Abstract

Embeddings serve as condensed vector representations of real-world entities and are used in Natural Language Processing (NLP), Computer Vision, and Data Management across diverse downstream tasks. Here, we introduce novel specialized embeddings explicitly tailored to encode the complex 2-D context in tables that feature horizontal and vertical hierarchical metadata and nesting. To accomplish this, we define bi-dimensional tabular coordinates, separate the horizontal metadata, vertical metadata, and data contexts by introducing a new visibility matrix, and encode units and nesting through embeddings specifically optimized to capture the intricacies of such complex structured data. In an evaluation on 5 large-scale structured datasets and 3 popular downstream tasks, our solution outperforms state-of-the-art models with a significant MAP delta of up to 0.28. GPT-4 LLM+RAG slightly outperforms our solution with an MRR delta of up to 0.1, while we outperform it with a MAP delta of up to 0.42.
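
For readers unfamiliar with the two ranking metrics quoted in the abstract, the following is a minimal sketch of how Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) are commonly computed. It is not taken from the paper: the function names, the boolean relevance-list input format, and the choice to normalize average precision by the number of relevant items that appear in the ranked list are assumptions made for illustration only.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    `ranked_relevance[i]` is True when the result at rank i+1 is relevant.
    Assumption: we normalize by the relevant items present in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(ranked_relevance: Sequence[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of per-query average precision (0.0 for an empty query set)."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


def mean_reciprocal_rank(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of per-query reciprocal rank (0.0 for an empty query set)."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


# Hypothetical relevance judgments for two queries (illustrative data only).
example_queries = [
    [True, False, True],   # relevant results at ranks 1 and 3
    [False, True, False],  # first relevant result at rank 2
]
map_score = mean_average_precision(example_queries)
mrr_score = mean_reciprocal_rank(example_queries)
```

A "delta" in either metric is then the difference between two systems' scores on the same query set. MRR rewards only the position of the first relevant result, whereas MAP accounts for every relevant result in the ranking, which is why one system can lead on one metric and trail on the other.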

@article{shrestha2025_2502.15819,
  title={Tabular Embeddings for Tables with Bi-Dimensional Hierarchical Metadata and Nesting},
  author={Gyanendra Shrestha and Chutain Jiang and Sai Akula and Vivek Yannam and Anna Pyayt and Michael Gubanov},
  journal={arXiv preprint arXiv:2502.15819},
  year={2025}
}