RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward

15 May 2025

Papers citing "RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward"

Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
Zihao Yi, Qingxuan Jiang, Ruotian Ma, Xingyu Chen, Qu Yang, ..., Fanghua Ye, Ying Shen, Zhaopeng Tu, Xiaolong Li, Linus
106 1 0
07 Nov 2025

REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Zafir Stojanovski, Oliver Stanley, Joe Sharratt, Richard Jones, Abdulhakeem Adefioye, Jean Kaddour, Andreas Kopf
OffRL, LRM
294 36 0
30 May 2025