RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?

17 June 2025
Rohan Gupta, Erik Jenner
ArXiv (abs) · PDF · HTML · GitHub

Papers citing "RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?"

2 papers
Red-teaming Activation Probes using Prompted LLMs
Phil Blandfort, Robert Graham
AAML, LLMSV
01 Nov 2025
Probe-based Fine-tuning for Reducing Toxicity
Jan Wehner, Mario Fritz
AAML
24 Oct 2025