arXiv: 2502.05163

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

7 February 2025

Papers citing "DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails"

13 / 13 papers shown

Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks

Richard J. Young

137

0

0

27 Nov 2025

Search Self-play: Pushing the Frontier of Agent Capability without Supervision

127

3

0

21 Oct 2025

Qwen3Guard Technical Report

...

155

19

0

16 Oct 2025

CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications

...

Sanjay Singh Chauhan

Niranjan Wartikar

311

0

0

03 Aug 2025

The Problem with Safety Classification is not just the Models

96

0

0

29 Jul 2025

LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators

232

3

0

21 Jul 2025

JavelinGuard: Low-Cost Transformer Architectures for LLM Security

Sharath Rajasekar

188

1

0

09 Jun 2025

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

311

13

0

09 Jun 2025

OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Languages and Modalities

Keegan E. Hines

Charlotte Siska

Luke Zettlemoyer

513

5

0

29 May 2025

Revisiting Backdoor Attacks on LLMs: A Stealthy and Practical Poisoning Framework via Harmless Inputs

367

2

0

23 May 2025

MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety

401

0

0

21 Apr 2025

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Priyanshu Kumar

Akhila Yerukola

Himanshu Beniwal

Thomas Hartvigsen

357

12

0

06 Apr 2025

Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries

671

0

0

20 Feb 2025