Red Teaming Deep Neural Networks with Feature Synthesis Tools

Red Teaming Deep Neural Networks with Feature Synthesis Tools

8 February 2023

Dylan Hadfield-Menell

Papers citing "Red Teaming Deep Neural Networks with Feature Synthesis Tools"

11 / 11 papers shown

Title
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 70 18 0 02 Jul 2024
Black-Box Access is Insufficient for Rigorous AI Audits Stephen Casper Carson Ezell Charlotte Siegmann Noam Kolt Taylor Lynn Curtis ... Michael Gerovitch David Bau Max Tegmark David M. Krueger Dylan Hadfield-Menell AAML 13 75 0 25 Jan 2024
Labeling Neural Representations with Inverse Recognition Kirill Bykov Laura Kopf Shinichi Nakajima Marius Kloft Marina M.-C. Höhne BDL 19 15 0 22 Nov 2023
Red-Teaming the Stable Diffusion Safety Filter Javier Rando Daniel Paleka David Lindner Lennard Heim Florian Tramèr DiffM 122 179 0 03 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Junnan Li Dongxu Li Caiming Xiong S. Hoi MLLM BDL VLM CLIP 388 4,010 0 28 Jan 2022
Natural Language Descriptions of Deep Visual Features Evan Hernandez Sarah Schwettmann David Bau Teona Bagashvili Antonio Torralba Jacob Andreas MILM 194 116 0 26 Jan 2022
Robust Feature-Level Adversaries are Interpretability Tools Stephen Casper Max Nadeau Dylan Hadfield-Menell Gabriel Kreiman AAML 40 27 0 07 Oct 2021
RobustBench: a standardized adversarial robustness benchmark Francesco Croce Maksym Andriushchenko Vikash Sehwag Edoardo Debenedetti Nicolas Flammarion M. Chiang Prateek Mittal Matthias Hein VLM 217 668 0 19 Oct 2020
On Interpretability of Deep Learning based Skin Lesion Classifiers using Concept Activation Vectors Adriano Lucieri Muhammad Naseer Bajwa S. Braun M. I. Malik Andreas Dengel Sheraz Ahmed MedIm 154 64 0 05 May 2020
Towards A Rigorous Science of Interpretable Machine Learning Finale Doshi-Velez Been Kim XAI FaML 225 3,658 0 28 Feb 2017
ImageNet Large Scale Visual Recognition Challenge Olga Russakovsky Jia Deng Hao Su J. Krause S. Satheesh ... A. Karpathy A. Khosla Michael S. Bernstein Alexander C. Berg Li Fei-Fei VLM ObjD 279 39,083 0 01 Sep 2014