ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2402.01822
Building Guardrails for Large Language Models

2 February 2024
Yizhen Dong, Ronghui Mu, Gao Jin, Yi Qi, Jinwei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, Xiaowei Huang
Tags: OffRL

Papers citing "Building Guardrails for Large Language Models"

14 of 14 papers shown
SUKHSANDESH: An Avatar Therapeutic Question Answering Platform for Sexual Education in Rural India
Salam Michael Singh, Shubhmoy Kumar Garg, Amitesh Misra, Aaditeshwar Seth, Tanmoy Chakraborty
03 May 2024
Sociodemographic Prompting is Not Yet an Effective Approach for Simulating Subjective Judgments with LLMs
Huaman Sun, Jiaxin Pei, Minje Choi, David Jurgens
16 Nov 2023
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Shashank Gupta, Vaishnavi Shrivastava, A. Deshpande, A. Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot
08 Nov 2023
Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
Jinhwa Kim, Ali Derakhshan, Ian G. Harris
Tags: AAML
31 Oct 2023
Large Language Models Can Be Good Privacy Protection Learners
Yijia Xiao, Yiqiao Jin, Yushi Bai, Yue Wu, Xianjun Yang, ..., Xujiang Zhao, Yanchi Liu, Haifeng Chen, Wei Wang, Wei Cheng
Tags: PILM
03 Oct 2023
SCOTT: Self-Consistent Chain-of-Thought Distillation
Jamie Yap, Zhengyang Wang, Zheng Li, K. Lynch, Bing Yin, Xiang Ren
Tags: LRM
03 May 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul, Adian Liusie, Mark J. F. Gales
Tags: HILM, LRM
15 Mar 2023
Rethinking with Retrieval: Faithful Large Language Model Inference
Hangfeng He, Hongming Zhang, Dan Roth
Tags: KELM, LRM
31 Dec 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022
Improving deep neural network generalization and robustness to background bias via layer-wise relevance propagation optimization
P. R. Bassi, Sergio S J Dertkigil, Andrea Cavalli
Tags: AI4CE
01 Feb 2022
Differentially Private Fine-tuning of Language Models
Da Yu, Saurabh Naik, A. Backurs, Sivakanth Gopi, Huseyin A. Inan, ..., Y. Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang
13 Oct 2021
Challenges in Detoxifying Language Models
Johannes Welbl, Amelia Glaese, J. Uesato, Sumanth Dathathri, John F. J. Mellor, Lisa Anne Hendricks, Kirsty Anderson, Pushmeet Kohli, Ben Coppin, Po-Sen Huang
Tags: LM&MA
15 Sep 2021
A Safety Framework for Critical Systems Utilising Deep Neural Networks
Xingyu Zhao, Alec Banks, James Sharp, Valentin Robu, David Flynn, Michael Fisher, Xiaowei Huang
Tags: AAML
07 Mar 2020
Safety Verification of Deep Neural Networks
Xiaowei Huang, M. Kwiatkowska, Sen Wang, Min Wu
Tags: AAML
21 Oct 2016