ResearchTrend.AI
I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models

6 June 2023
Max Reuter
William B. Schulze

Papers citing "I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models"

4 / 4 papers shown
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Alexander von Recum
Christoph Schnabl
Gabor Hollbeck
Silas Alberti
Philip Blinde
Marvin von Hagen
90
2
0
22 Dec 2024
DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech
Dominika Woszczyk
Soteris Demetriou
20
0
0
05 Oct 2024
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
K. Ramamurthy
Erik Miehling
Pierre L. Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
91
13
0
06 Sep 2024
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann
Anshuman Suri
Vincent Bindschaedler
David E. Evans
Shruti Tople
Robert West
KELM
LLMAG
16
20
0
24 Oct 2023