
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Weixun Wang
XiaoXiao Xu
Wanhe An
Fangwen Dai
Wei Gao
Yancheng He
Ju Huang
Qiang Ji
Hanqi Jin
Xiaoyang Li
Yang Li
Zhongwen Li
Shirong Lin
Jiashun Liu
Zenan Liu
Tao Luo
Dilxat Muhtar
Yuanbin Qu
Jiaqiang Shi
Qinghui Sun
Yingshui Tan
Hao Tang
Runze Wang
Yi Wang
Zhaoguo Wang
Yanan Wu
Shaopan Xiong
Binchen Xu
Xander Xu
Yuchi Xu
Qipeng Zhang
Xixia Zhang
Haizhou Zhao
Jie Zhao
Shuaibing Zhao
Baihui Zheng
Jianhui Zheng
Suhang Zheng
Yanni Zhu
Mengze Cai
Kerui Cao
Xitong Chen
Yue Dai
Lifan Du
Tao Feng
Tao He
Jin Hu
Yijie Hu
Ziyu Jiang
Cheng Li
Xiang Li
Jing Liang
Xin Lin
Chonghuan Liu
ZhenDong Liu
Zhiqiang Lv
Haodong Mi
Yanhu Mo
Junjia Ni
Shixin Pei
Jingyu Shen
XiaoShuai Song
Cecilia Wang
Chaofan Wang
Kangyu Wang
Pei Wang
Tao Wang
Wei Wang
Ke Xiao
Mingyu Xu
Tiange Xu
Nan Ya
Siran Yang
Jianan Ye
Yaxing Zang
Duo Zhang
Junbo Zhang
Boren Zheng
Wanxi Deng
Ling Pan
Lin Qu
Wenbo Su
Jiamang Wang
Wei Wang
Hu Wei
Minggang Wu
Cheng Yu
Bing Zhao
Zhicheng Zheng
Bo Zheng
Main: 37 pages; Bibliography: 4 pages; 20 figures; 8 tables
Abstract

Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem to streamline agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-Perceptive Agentic Policy Optimization (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability. Empirically, we evaluate ROME in a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME achieves strong performance on benchmarks such as SWE-bench Verified and Terminal Bench, demonstrating the effectiveness of ALE.
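To make the chunk-level credit assignment idea concrete, the sketch below illustrates one plausible reading of it: aggregating the reward signal over each semantic interaction chunk (e.g., one tool call plus its observation) and broadcasting that value to all tokens in the chunk, instead of scoring every token independently. This is a minimal illustration under our own assumptions; the function name, chunk representation, and normalization are hypothetical and are not the paper's exact IPA formulation.

```python
import numpy as np

def chunk_level_advantages(token_rewards, chunk_ends):
    """Illustrative chunk-level credit assignment (hypothetical, not the paper's IPA).

    token_rewards: per-token reward signal for one trajectory.
    chunk_ends: exclusive end indices of each semantic interaction chunk.
    Returns one advantage value per token, shared within each chunk.
    """
    token_rewards = np.asarray(token_rewards, dtype=float)
    advantages = np.zeros_like(token_rewards)
    start = 0
    for end in chunk_ends:
        # Aggregate the chunk's reward and assign it to every token in the chunk.
        chunk_value = token_rewards[start:end].sum()
        advantages[start:end] = chunk_value
        start = end
    # Normalize to keep the policy-gradient scale stable over long horizons.
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    return advantages

# Toy trajectory: 10 tokens grouped into 3 interaction chunks.
adv = chunk_level_advantages(
    token_rewards=[0, 0, 1.0, 0, 0, 0, -0.5, 0, 0, 2.0],
    chunk_ends=[3, 7, 10],
)
print(adv)
```

In this reading, tokens within one interaction share a single credit value, which reduces the variance of per-token credit estimates over long multi-turn trajectories; the actual IPA objective in the paper may differ in how chunks are delimited and how advantages are computed.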
