Agent models: Internalizing Chain-of-Action Generation into Reasoning models
Traditional agentic workflows rely on external prompts to manage interactions with tools and the environment, which limits the autonomy of reasoning models. We position \emph{Large Agent Models (LAMs)} that internalize the generation of \emph{Chain-of-Action (CoA)}, enabling the model to autonomously decide when and how to use external tools. Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the model to seamlessly switch between reasoning and action while efficiently managing environment interactions. Main components include step-level action triggering, trajectory-level CoA optimization, and an internal world model to reduce real-environment interaction costs. Evaluations on open-domain QA tasks demonstrate that AutoCoA-trained agent models significantly outperform ReAct-based workflows in task completion, especially in tasks that require long-term reasoning and multi-step actions. Code and dataset are available atthis https URL
View on arXiv@article{zhang2025_2503.06580, title={ Agent models: Internalizing Chain-of-Action Generation into Reasoning models }, author={ Yuxiang Zhang and Yuqi Yang and Jiangming Shu and Xinyan Wen and Jitao Sang }, journal={arXiv preprint arXiv:2503.06580}, year={ 2025 } }