
JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models

Ce Chi
Xing Wang
Zhendong Wang
Xiaofan Liu
Ce Li
Zhiyan Song
Chen Zhao
Kexin Yang
Boshen Shi
Jingjing Yang
Chao Deng
Junlan Feng
Main: 23 Pages
9 Figures
Bibliography: 4 Pages
17 Tables
Appendix: 1 Page
Abstract

In this work, we present JT-DA-8B (JiuTian Data Analyst 8B), a specialized large language model designed for complex table reasoning tasks across diverse real-world scenarios. To address the lack of high-quality supervision in tabular reasoning, we construct a comprehensive and diverse training corpus covering 34 well-defined table reasoning tasks by aggregating 29 public table QA datasets and 3 million tables. We propose an automatic pipeline that generates realistic multi-step analytical tasks involving reasoning patterns. The model is built on the open-source JT-Coder-8B, an 8B-parameter decoder-only foundation model trained from scratch. During training, we use LLM-based scoring and workflow-aligned filtering to distill high-quality, table-centric data, and we optimize the model with both supervised fine-tuning (SFT) and reinforcement learning (RL). We further propose a four-stage table reasoning workflow, comprising table preprocessing, table sensing, tool-integrated reasoning, and prompt engineering, to improve model interpretability and execution accuracy. Experimental results show that JT-DA-8B achieves strong performance on various table reasoning tasks, demonstrating the effectiveness of data-centric generation and workflow-driven optimization.
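The abstract names the four workflow stages but does not say how each one works. The Python sketch below is one way such a workflow could be wired together. It is an illustration under assumptions, not the authors' implementation. The function names (`preprocess`, `sense`, `build_prompt`, `run_tool`, `fake_model`), what each stage does, and the order in which the prompt is built and the tool is called are all hypothetical. The model call is replaced by a stub that returns fixed code.

```python
import csv
import io


def preprocess(raw_csv):
    """Table preprocessing (assumed behavior): parse CSV text, trim cells, drop blank rows."""
    reader = csv.DictReader(io.StringIO(raw_csv.strip()))
    rows = []
    for row in reader:
        cleaned = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        if any(cleaned.values()):
            rows.append(cleaned)
    return rows


def is_number(text):
    """Return True if the cell text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def sense(rows):
    """Table sensing (assumed behavior): summarize row count and a coarse type per column."""
    columns = list(rows[0].keys()) if rows else []
    types = {}
    for col in columns:
        values = [r[col] for r in rows if r[col] != ""]
        numeric = bool(values) and all(is_number(v) for v in values)
        types[col] = "number" if numeric else "text"
    return {"n_rows": len(rows), "columns": types}


def build_prompt(summary, question):
    """Prompt engineering (assumed behavior): put the table summary and question into one prompt."""
    cols = ", ".join(f"{name} ({kind})" for name, kind in summary["columns"].items())
    return (
        f"Table with {summary['n_rows']} rows; columns: {cols}.\n"
        f"Question: {question}\n"
        "Answer with Python code that assigns the answer to `result`."
    )


def fake_model(prompt):
    """Hypothetical stand-in for the language model; always returns the same code."""
    return "result = sum(float(r['sales']) for r in rows)"


def run_tool(code, rows):
    """Tool-integrated reasoning (assumed behavior): run generated code against the table."""
    scope = {"rows": rows}
    exec(code, scope)  # illustration only; a real system would sandbox this
    return scope["result"]


if __name__ == "__main__":
    raw = "region, sales\nnorth, 10\nsouth, 32.5\n,\n"
    table = preprocess(raw)
    prompt = build_prompt(sense(table), "What are the total sales?")
    print(run_tool(fake_model(prompt), table))
```

Keeping the stages as separate functions makes each intermediate artifact (cleaned rows, schema summary, prompt, generated code) inspectable, which is one plausible reading of how a staged workflow could help interpretability.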
