UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

6 May 2024

Michael Backes

Savvas Zannettou

Papers citing "UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images"

ShieldGemma 2: Robust and Tractable Image Content Moderation. Wenjun Zeng, D. Kurniawan, Ryan Mullins, Yuchi Liu, Tamoghna Saha, ..., Mani Malek, Hamid Palangi, Joon Baek, Rick Pereira, Karthik Narasimhan. 01 Apr 2025.
MLLM-as-a-Judge for Image Safety without Human Labeling. Zhenting Wang, Shuming Hu, Shiyu Zhao, Xiaowen Lin, F. Xu, ..., Nan Jiang, Lingjuan Lyu, Shiqing Ma, Dimitris N. Metaxas, Ankit Jain. 31 Dec 2024.
HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes. Xuanyu Su, Yansong Li, Diana Inkpen, Nathalie Japkowicz. 11 Aug 2024.
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models. Yibo Miao, Yifan Zhu, Yinpeng Dong, Lijia Yu, Jun Zhu, Xiao-Shan Gao. 08 Jul 2024.
Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models. Keyan Guo, Ayush Utkarsh, Wenbo Ding, Isabelle Ondracek, Ziming Zhao, Guo Freeman, Nishant Vishwamitra, Hongxin Hu. 27 Mar 2024.
Exploring the Limits of Zero Shot Vision Language Models for Hate Meme Detection: The Vulnerabilities and their Interpretations. Naquee Rizwan, Paramananda Bhaskar, Mithun Das, Swadhin Satyaprakash Majhi, Punyajoy Saha, Animesh Mukherjee. 19 Feb 2024.
Red-Teaming the Stable Diffusion Safety Filter. Javier Rando, Daniel Paleka, David Lindner, Lennard Heim, Florian Tramèr. 03 Oct 2022.
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate. Hannah Rose Kirk, B. Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale. 12 Aug 2021.