Beyond IID: Optimizing Instruction Learning from the Perspective of Instruction Interaction and Dependency

11 September 2024

Papers citing "Beyond IID: Optimizing Instruction Learning from the Perspective of Instruction Interaction and Dependency"

3 / 3 papers shown

Title
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study Xiaoyu Tian Sitong Zhao Haotian Wang Shuaiting Chen Yiping Peng Yunjie Ji Han Zhao Xiangang Li OffRL LRM 27 0 0 04 May 2025
DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training Xiaoyu Tian Sitong Zhao Haotian Wang Shuaiting Chen Yiping Peng Yunjie Ji Han Zhao Xiangang Li LRM 57 1 0 24 Apr 2025
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Jiale Cheng Xiao-Chang Liu C. Wang Xiaotao Gu Y. Lu Dan Zhang Yuxiao Dong J. Tang Hongning Wang Minlie Huang LRM 123 3 0 16 Dec 2024