Title |
---|
![]() Do LLMs "know" internally when they follow instructions? Juyeon Heo Christina Heinze-Deml Oussama Elachqar Shirley Ren Udhay Nallasamy Andy Miller Kwan Ho Ryan Chan Jaya Narain |
![]() QPO: Query-dependent Prompt Optimization via Multi-Loop Offline
Reinforcement Learning Yilun Kong Hangyu Mao Qi Zhao Bin Zhang Jingqing Ruan Li Shen Yongzhe Chang Xueqian Wang Rui Zhao Dacheng Tao |