2507.08794
Cited By
One Token to Fool LLM-as-a-Judge
11 July 2025
Yulai Zhao
Haolin Liu
Dian Yu
Sunyuan Kung
Meijia Chen
Haitao Mi
Dong Yu
OffRL
LRM
Papers citing "One Token to Fool LLM-as-a-Judge"
10 / 10 papers shown
Title
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu
Hao Xu
Xuhong Chen
Wei Chen
Yee Whye Teh
Ning Miao
ReLM
LRM
AI4CE
74
0
0
02 Oct 2025
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
Xin-Qiang Cai
Wei Wang
Feng Liu
Tongliang Liu
Gang Niu
Masashi Sugiyama
OffRL
AAML
4
0
0
01 Oct 2025
Who's Your Judge? On the Detectability of LLM-Generated Judgments
Dawei Li
Zhen Tan
Chengshuai Zhao
Bohan Jiang
Baixiang Huang
Pingchuan Ma
Abdullah Alnaibari
Kai Shu
Huan Liu
4
0
0
29 Sep 2025
SCI-Verifier: Scientific Verifier with Thinking
Shenghe Zheng
Chenyu Huang
F. Yu
Junchi Yao
Jingqi Ye
...
Yun Luo
Ning Ding
Lei Bai
Ganqu Cui
Peng Ye
LRM
12
0
0
29 Sep 2025
Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards
Aaron Tu
Weihao Xuan
Heli Qi
X. Y. Huang
Qingcheng Zeng
...
Amin Saberi
Naoto Yokoya
Jure Leskovec
Yejin Choi
Fang Wu
OffRL
8
0
0
26 Sep 2025
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Yujun Zhou
Zhenwen Liang
Haolin Liu
Wenhao Yu
Kishan Panaganti
Linfeng Song
Dian Yu
Xiangliang Zhang
Haitao Mi
Dong Yu
12
6
0
18 Sep 2025
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
Runpeng Dai
Linfeng Song
Haolin Liu
Zhenwen Liang
Dian Yu
...
Zhaopeng Tu
R. Liu
Tong Zheng
Hongtu Zhu
Dong Yu
LRM
24
3
0
11 Sep 2025
Better Language Model-Based Judging Reward Modeling through Scaling Comprehension Boundaries
Meiling Ning
Zhongbao Zhang
Junda Ye
Jiabao Guo
Qingyuan Guan
LRM
40
0
0
25 Aug 2025
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Chengsong Huang
Wenhao Yu
Xiaoyang Wang
H. Zhang
Zongxia Li
Ruosen Li
J. Huang
Haitao Mi
Dong Yu
ReLM
SyDa
LRM
70
14
0
07 Aug 2025
PersonaEval: Are LLM Evaluators Human Enough to Judge Role-Play?
Lingfeng Zhou
Jialing Zhang
Jin Gao
Mohan Jiang
Dequan Wang
ELM
46
1
0
06 Aug 2025