2306.03423
I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models
6 June 2023
Max Reuter, William B. Schulze
Papers citing
"I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models"
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Alexander von Recum, Christoph Schnabl, Gabor Hollbeck, Silas Alberti, Philip Blinde, Marvin von Hagen
90
2
0
22 Dec 2024
DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech
Dominika Woszczyk, Soteris Demetriou
20
0
0
05 Oct 2024
Programming Refusal with Conditional Activation Steering
Bruce W. Lee, Inkit Padhi, K. Ramamurthy, Erik Miehling, Pierre L. Dognin, Manish Nagireddy, Amit Dhurandhar
LLMSV
91
13
0
06 Sep 2024
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David E. Evans, Shruti Tople, Robert West
KELM
LLMAG
16
20
0
24 Oct 2023