ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2302.10894
  4. Cited By
Red Teaming Deep Neural Networks with Feature Synthesis Tools

Red Teaming Deep Neural Networks with Feature Synthesis Tools

8 February 2023
Stephen Casper
Yuxiao Li
Jiawei Li
Tong Bu
Ke Zhang
K. Hariharan
Dylan Hadfield-Menell
    AAML
ArXivPDFHTML

Papers citing "Red Teaming Deep Neural Networks with Feature Synthesis Tools"

11 / 11 papers shown
Title
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
70
18
0
02 Jul 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
13
75
0
25 Jan 2024
Labeling Neural Representations with Inverse Recognition
Labeling Neural Representations with Inverse Recognition
Kirill Bykov
Laura Kopf
Shinichi Nakajima
Marius Kloft
Marina M.-C. Höhne
BDL
19
15
0
22 Nov 2023
Red-Teaming the Stable Diffusion Safety Filter
Red-Teaming the Stable Diffusion Safety Filter
Javier Rando
Daniel Paleka
David Lindner
Lennard Heim
Florian Tramèr
DiffM
122
179
0
03 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,010
0
28 Jan 2022
Natural Language Descriptions of Deep Visual Features
Natural Language Descriptions of Deep Visual Features
Evan Hernandez
Sarah Schwettmann
David Bau
Teona Bagashvili
Antonio Torralba
Jacob Andreas
MILM
194
116
0
26 Jan 2022
Robust Feature-Level Adversaries are Interpretability Tools
Robust Feature-Level Adversaries are Interpretability Tools
Stephen Casper
Max Nadeau
Dylan Hadfield-Menell
Gabriel Kreiman
AAML
40
27
0
07 Oct 2021
RobustBench: a standardized adversarial robustness benchmark
RobustBench: a standardized adversarial robustness benchmark
Francesco Croce
Maksym Andriushchenko
Vikash Sehwag
Edoardo Debenedetti
Nicolas Flammarion
M. Chiang
Prateek Mittal
Matthias Hein
VLM
217
668
0
19 Oct 2020
On Interpretability of Deep Learning based Skin Lesion Classifiers using
  Concept Activation Vectors
On Interpretability of Deep Learning based Skin Lesion Classifiers using Concept Activation Vectors
Adriano Lucieri
Muhammad Naseer Bajwa
S. Braun
M. I. Malik
Andreas Dengel
Sheraz Ahmed
MedIm
154
64
0
05 May 2020
Towards A Rigorous Science of Interpretable Machine Learning
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
225
3,658
0
28 Feb 2017
ImageNet Large Scale Visual Recognition Challenge
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
279
39,083
0
01 Sep 2014
1