RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?
Rohan Gupta, Erik Jenner
arXiv:2506.14261 (v3), 17 June 2025
Papers citing "RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?" (2 papers)
Red-teaming Activation Probes using Prompted LLMs
Phil Blandfort, Robert Graham
AAML, LLMSV
01 Nov 2025
Probe-based Fine-tuning for Reducing Toxicity
Jan Wehner, Mario Fritz
AAML
24 Oct 2025