I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models

6 June 2023

Papers citing "I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models"

Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Alexander von Recum, Christoph Schnabl, Gabor Hollbeck, Silas Alberti, Philip Blinde, Marvin von Hagen
90 2 0
22 Dec 2024

DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech
Dominika Woszczyk, Soteris Demetriou
20 0 0
05 Oct 2024

Programming Refusal with Conditional Activation Steering
Bruce W. Lee, Inkit Padhi, K. Ramamurthy, Erik Miehling, Pierre L. Dognin, Manish Nagireddy, Amit Dhurandhar
LLMSV
91 13 0
06 Sep 2024

SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David E. Evans, Shruti Tople, Robert West
KELM, LLMAG
16 20 0
24 Oct 2023