DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

7 February 2025
Yihe Deng
Yu Yang
Junkai Zhang
Wei Wang
B. Li
    OffRL

Papers citing "DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails"

13 / 13 papers shown
Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks
Richard J. Young
ELM
137
0
0
27 Nov 2025
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
Hongliang Lu
Yuhang Wen
Pengyu Cheng
Ruijin Ding
Haotian Xu
Jiaqi Guo
Chutian Wang
Haonan Chen
Xiaoxi Jiang
Guanjun Jiang
LRM
127
3
0
21 Oct 2025
Qwen3Guard Technical Report
H. Vicky Zhao
C. Yuan
Fei Huang
X. S. Hu
Yichang Zhang
...
Y. Li
Yi Zhang
Yong Jiang
Yu Wan
Y. Zhou
155
19
0
16 Oct 2025
CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications
Raviraj Joshi
Rakesh Paul
Kanishk Singla
Anusha Kamath
Michael Evans
...
Shaona Ghosh
Utkarsh Vaidya
E. Long
Sanjay Singh Chauhan
Niranjan Wartikar
311
0
0
03 Aug 2025
The Problem with Safety Classification is not just the Models
Sowmya Vajjala
96
0
0
29 Jul 2025
LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators
Leanne Tan
Gabriel Chua
Ziyu Ge
Roy Ka-Wei Lee
232
3
0
21 Jul 2025
JavelinGuard: Low-Cost Transformer Architectures for LLM Security
Yash Datta
Sharath Rajasekar
188
1
0
09 Jun 2025
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Mickel Liu
L. Jiang
Yancheng Liang
S. Du
Yejin Choi
Tim Althoff
Natasha Jaques
AAML
LRM
311
13
0
09 Jun 2025
OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Languages and Modalities
Sahil Verma
Keegan E. Hines
J. Bilmes
Charlotte Siska
Luke Zettlemoyer
Hila Gonen
Chandan Singh
AAML
513
5
0
29 May 2025
Revisiting Backdoor Attacks on LLMs: A Stealthy and Practical Poisoning Framework via Harmless Inputs
Jiawei Kong
Hao Fang
Xiaochen Yang
Kuofeng Gao
Bin Chen
Shu-Tao Xia
Yaowei Wang
Min Zhang
AAML
367
2
0
23 May 2025
MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
Yahan Yang
Soham Dan
Shuo Li
Dan Roth
Insup Lee
LRM
401
0
0
21 Apr 2025
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Priyanshu Kumar
Devansh Jain
Akhila Yerukola
Liwei Jiang
Himanshu Beniwal
Thomas Hartvigsen
Maarten Sap
357
12
0
06 Apr 2025
Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries
David Noever
Grant Rosario
671
0
0
20 Feb 2025