Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing

Annual Meeting of the Association for Computational Linguistics (ACL), 2024
25 September 2024
Wenhao Liu
Siyu An
Junru Lu
Muling Wu
Tianlong Li
Xiaohua Wang
Changze Lv
Xiaoqing Zheng
Di Yin
Xing Sun
Xuanjing Huang

Papers citing "Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing"

2 / 2 papers shown
From Defender to Devil? Unintended Risk Interactions Induced by LLM Defenses
Xiangtao Meng
Tianshuo Cong
Li Wang
Wenyu Chen
Zheng Li
Shanqing Guo
Xiaoyun Wang
AAML
09 Oct 2025
RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data
Wenhao Liu
Mingchen Xie
Jingwen Xu
Zisu Huang
...
Changze Lv
He-Da Wang
Qi Zhang
Xiaoqing Zheng
Xuanjing Huang
25 May 2025