Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs

22 December 2024

Papers citing "Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs"

What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices
Sander Noels, Guillaume Bied, Maarten Buyl, Alexander Rogiers, Yousra Fettach, Jefrey Lijffijt, Tijl De Bie
04 Apr 2025