MATATA: a weak-supervised MAthematical Tool-Assisted reasoning for Tabular Applications

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2024

28 November 2024

Main: 14 pages; Bibliography: 4 pages; Appendix: 22 pages; 2 figures; 7 tables

Abstract

Mathematical reasoning capabilities are improving with tool-augmented language agents, but existing methods often rely on closed-source or large models, external data, or extensive prompt engineering. This work introduces MATATA, a novel cost-effective method to train LLM agents for tabular data problems through reasoning, planning, and tool use. Using a progressive self-improvement paradigm and iterative weak supervision, it empowers 3.8B/8B Small Language Models (SLMs), which are particularly suited for local hosting and sensitive business contexts where data privacy is crucial. By employing flexible and reusable tools across different datasets, it achieves robust performance with effective scalability across shared tasks. Experiments show that MATATA reaches state-of-the-art performance on FinQA and TAT-QA among reasoning frameworks based on open-source models. Moreover, MATATA models compete with GPT-4-based frameworks on TabMWP despite being SLMs.
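To make the idea of planning over reusable tools concrete, here is a minimal sketch of a tool-assisted pipeline on a small table. It is not MATATA's implementation: the tool names (`select_column`, `sum_values`, `percent_change`), the plan format, and the `run_plan` helper are all hypothetical, and the plan is hard-coded here, whereas in an agent of the kind the abstract describes it would presumably be produced by the language model.

```python
from typing import Any, Callable, Dict, List, Tuple

# A table is modeled as a list of rows (hypothetical representation).
Table = List[Dict[str, float]]

# --- Hypothetical reusable tools, shared across datasets ---
def select_column(table: Table, column: str) -> List[float]:
    """Extract one column from the table as a list of values."""
    return [row[column] for row in table]

def sum_values(values: List[float]) -> float:
    """Add up a list of numbers."""
    return sum(values)

def percent_change(values: List[float]) -> float:
    """Percentage change from the first value to the last."""
    first, last = values[0], values[-1]
    return (last - first) / first * 100

TOOLS: Dict[str, Callable[..., Any]] = {
    "select_column": select_column,
    "sum_values": sum_values,
    "percent_change": percent_change,
}

# A plan is an ordered list of (tool name, keyword arguments) steps.
Plan = List[Tuple[str, Dict[str, Any]]]

def run_plan(table: Table, plan: Plan) -> Any:
    """Execute the steps in order; each step's output feeds the next step."""
    state: Any = table
    for tool_name, kwargs in plan:
        state = TOOLS[tool_name](state, **kwargs)
    return state

table = [
    {"year": 2022, "revenue": 100.0},
    {"year": 2023, "revenue": 125.0},
]
# Question: "By what percentage did revenue change from 2022 to 2023?"
plan = [("select_column", {"column": "revenue"}), ("percent_change", {})]
print(run_plan(table, plan))  # 25.0
```

Keeping the tools independent of any one dataset is what would let the same registry be reused across benchmarks; only the plan changes from question to question.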
