Identifying Untrustworthy Predictions in Neural Networks by Geometric Gradient Analysis

24 February 2021

Papers citing "Identifying Untrustworthy Predictions in Neural Networks by Geometric Gradient Analysis"

Title
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space | Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Gunnemann | AAML | 47 | 36 | 0 | 14 Feb 2024
Robust Out-of-distribution Detection for Neural Networks | Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, S. Jha | OODD | 150 | 84 | 0 | 21 Mar 2020