NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails

16 October 2023
Traian Rebedea, R. Dinu, Makesh Narsimhan Sreedhar, Christopher Parisien, Jonathan Cohen
[KELM]
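
The toolkit is open source; as a minimal sketch of the usage pattern the title describes (assuming the `nemoguardrails` Python package is installed and that ./config holds a rails configuration, i.e. a config.yml plus Colang flow files — the path and prompt here are illustrative):

```python
from nemoguardrails import LLMRails, RailsConfig

# Load the programmable rails definitions (config.yml + Colang flows).
config = RailsConfig.from_path("./config")

# Wrap the configured LLM with the input/dialog/output rails.
rails = LLMRails(config)

# Messages pass through the rails before and after the LLM call.
response = rails.generate(messages=[
    {"role": "user", "content": "Hello! What can you do for me?"}
])
print(response["content"])
```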

Papers citing "NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails"

50 / 96 papers shown

Safeguard-by-Development: A Privacy-Enhanced Development Paradigm for Multi-Agent Collaboration Systems
Jian Cui, Zichuan Li, Luyi Xing, Xiaojing Liao
07 May 2025

Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
Narek Maloyan, Dmitry Namiot
[SILM, AAML, ELM]
25 Apr 2025

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey, David E. Evans
[LLMSV]
23 Apr 2025

DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization
Xinzhe Huang, Kedong Xiu, T. Zheng, Churui Zeng, Wangze Ni, Zhan Qin, K. Ren, C. L. P. Chen
[AAML]
21 Apr 2025

MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning
Yahan Yang, Soham Dan, Shuo Li, Dan Roth, Insup Lee
[LRM]
21 Apr 2025

The Structural Safety Generalization Problem
Julius Broomfield, Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Tia Nasir, Jason Zhang, Reihaneh Iranmanesh, Sara Pieri, Reihaneh Rabbany, Kellin Pelrine
[AAML]
13 Apr 2025

X-Guard: Multilingual Guard Agent for Content Moderation
Bibek Upadhayay, Vahid Behzadan
11 Apr 2025

Large Language Models are Unreliable for Cyber Threat Intelligence
Emanuele Mezzi, Fabio Massacci, Katja Tuma
29 Mar 2025

Don't Forget It! Conditional Sparse Autoencoder Clamping Works for Unlearning
Matthew Khoriaty, Andrii Shportko, Gustavo Mercier, Zach Wood-Doughty
[MU]
14 Mar 2025

Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, P. Sattigeri, Kush R. Varshney
[AAML]
24 Feb 2025

Prompt Inject Detection with Generative Explanation as an Investigative Tool
Jonathan Pan, Swee Liang Wong, Yidi Yuan, Xin Wei Chia
[SILM]
16 Feb 2025

FLAME: Flexible LLM-Assisted Moderation Engine
Ivan Bakulin, Ilia Kopanichuk, Iaroslav Bespalov, Nikita Radchenko, V. Shaposhnikov, Dmitry V. Dylov, Ivan Oseledets
13 Feb 2025

Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences
Shanshan Han, Salman Avestimehr, Chaoyang He
12 Feb 2025

GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun-Xiong Xia, Tianyi Wu, Zhiwei Xue, Y. Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi
[AI4TS, LRM]
30 Jan 2025

From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
Gopi Krishnan Rajbahadur, G. Oliva, Dayi Lin, Ahmed E. Hassan
28 Jan 2025

Beyond Benchmarks: On The False Promise of AI Regulation
Gabriel Stanovsky, Renana Keydar, Gadi Perl, Eliya Habba
28 Jan 2025

Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment
Melissa Kazemi Rad, Huy Nghiem, Andy Luo, Sahil Wadhwa, Mohammad Sorower, Stephen Rawls
[AAML]
22 Jan 2025

Position: A taxonomy for reporting and describing AI security incidents
L. Bieringer, Kevin Paeth, Andreas Wespi, Kathrin Grosse, Alexandre Alahi
19 Dec 2024

Improved Large Language Model Jailbreak Detection via Pretrained Embeddings
Erick Galinkin, Martin Sablotny
02 Dec 2024

Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings
Aaron Zheng, Mansi Rana, Andreas Stolcke
21 Nov 2024

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
Gabriel Chua, Shing Yee Chan, Shaun Khoo
20 Nov 2024

AI Ethics by Design: Implementing Customizable Guardrails for Responsible AI Development
Kristina Šekrst, Jeremy McHugh, Jonathan Rodriguez Cefalu
05 Nov 2024

Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System
Julian Collado, Kevin Stangl
[AAML]
30 Oct 2024

Benchmarking LLM Guardrails in Handling Multilingual Toxicity
Yahan Yang, Soham Dan, Dan Roth, Insup Lee
29 Oct 2024

Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring
Honglin Mu, Han He, Yuxin Zhou, Yunlong Feng, Yang Xu, ..., Zeming Liu, Xudong Han, Qi Shi, Qingfu Zhu, Wanxiang Che
[AAML]
28 Oct 2024

Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis
Jonathan Brokman, Omer Hofman, Oren Rachmil, Inderjeet Singh, Vikas Pahuja, Rathina Sabapathy Aishvariya Priya, Amit Giloni, Roman Vainshtein, Hisashi Kojima
21 Oct 2024

Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products
Nadia Nahar, Christian Kästner, Jenna L. Butler, Chris Parnin, Thomas Zimmermann, Christian Bird
15 Oct 2024

On Calibration of LLM-based Guard Models for Reliable Content Moderation
Hongfu Liu, Hengguan Huang, Hao Wang, Xiangming Gu, Ye Wang
14 Oct 2024

Survival of the Safest: Towards Secure Prompt Optimization through Interleaved Multi-Objective Evolution
Ankita Sinha, Wendi Cui, Kamalika Das, Jiaxin Zhang
[AAML]
12 Oct 2024

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
09 Oct 2024

TaeBench: Improving Quality of Toxic Adversarial Examples
Xuan Zhu, Dmitriy Bespalov, Liwen You, Ninad Kulkarni, Yanjun Qi
[AAML]
08 Oct 2024

Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Pratiksha Thaker, Shengyuan Hu, Neil Kale, Yash Maurya, Zhiwei Steven Wu, Virginia Smith
[MU]
03 Oct 2024

SciSafeEval: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks
Tianhao Li, Jingyu Lu, Chuangxin Chu, Tianyu Zeng, Yujia Zheng, ..., Xuejing Yuan, Xingkai Wang, Keyan Ding, Huajun Chen, Qiang Zhang
[ELM]
02 Oct 2024

AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure
Xi Chen, Zhiyang Zhang, Fangkai Yang, Xiaoting Qin, Chao Du, ..., Hangxin Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
26 Sep 2024

Enhancing Guardrails for Safe and Secure Healthcare AI
Ananya Gangavarapu
25 Sep 2024

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, ..., Elizabeth M. Daly, Mark Purcell, P. Sattigeri, Pin-Yu Chen, Kush R. Varshney
[AAML]
23 Sep 2024

LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents
Amine B. Hassouna, Hana Chaari, Ines Belhaj
[LLMAG]
17 Sep 2024

No Size Fits All: The Perils and Pitfalls of Leveraging LLMs Vary with Company Size
Ashok Urlana, Charaka Vinayak Kumar, B. Garlapati, Ajeet Kumar Singh, Rahul Mishra
21 Jul 2024

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
20 Jul 2024

LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation
David Schlangen
18 Jul 2024

Social and Ethical Risks Posed by General-Purpose LLMs for Settling Newcomers in Canada
I. Nejadgholi, Maryam Molamohammadi, Samir Bakhtawar
15 Jul 2024

Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)
K. Kenthapadi, M. Sameki, Ankur Taly
[HILM, ELM, AILaw]
10 Jul 2024

ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
Victoria R. Li, Yida Chen, Naomi Saphra
09 Jul 2024

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
Manish Nagireddy, Inkit Padhi, Soumya Ghosh, P. Sattigeri
08 Jul 2024

$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
Mintong Kang, Bo-wen Li
[LRM]
08 Jul 2024

AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua, Yun Yvonna Li, Shiyi Yang, Chen Wang, Lina Yao
[LM&MA]
06 Jul 2024

Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
Hannah Brown, Leon Lin, Kenji Kawaguchi, Michael Shieh
[AAML]
03 Jul 2024

LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models
Hayder Elesedy, Pedro M. Esperança, Silviu Vlad Oprea, Mete Ozay
[KELM]
03 Jul 2024

Building Understandable Messaging for Policy and Evidence Review (BUMPER) with AI
Katherine A. Rosenfeld, Maike Sonnewald, Sonia J. Jindal, Kevin A. McCarthy, Joshua L. Proctor
27 Jun 2024

SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
Caishuang Huang, Wanxu Zhao, Rui Zheng, Huijie Lv, Shihan Dou, ..., Junjie Ye, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang
[LLMSV, AAML]
26 Jun 2024