LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content
Moderation of Large Language Models

LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models

3 July 2024

Pedro M. Esperança

Silviu Vlad Oprea

Papers citing "LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models"

6 / 6 papers shown

Title
No Free Lunch with Guardrails Divyanshu Kumar Nitin Aravind Birur Tanay Baswa Sahil Agarwal P. Harshangi 54 1 0 01 Apr 2025
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond Shanshan Han 84 1 0 09 Oct 2024
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey Zhichen Dong Zhanhui Zhou Chao Yang Jing Shao Yu Qiao ELM 52 55 0 14 Feb 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts Jiahao Yu Xingwei Lin Zheng Yu Xinyu Xing SILM 117 301 0 19 Sep 2023
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 313 11,953 0 04 Mar 2022
The Power of Scale for Parameter-Efficient Prompt Tuning Brian Lester Rami Al-Rfou Noah Constant VPVLM 280 3,848 0 18 Apr 2021