2405.03486
Cited By
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
6 May 2024
Y. Qu
Xinyue Shen
Yixin Wu
Michael Backes
Savvas Zannettou
Yang Zhang
EGVM
Papers citing "UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images"
8 / 8 papers shown
Title
ShieldGemma 2: Robust and Tractable Image Content Moderation
Wenjun Zeng
D. Kurniawan
Ryan Mullins
Yuchi Liu
Tamoghna Saha
...
Mani Malek
Hamid Palangi
Joon Baek
Rick Pereira
Karthik Narasimhan
AI4MH
31
0
0
01 Apr 2025
MLLM-as-a-Judge for Image Safety without Human Labeling
Zhenting Wang
Shuming Hu
Shiyu Zhao
Xiaowen Lin
F. Xu
...
Nan Jiang
Lingjuan Lyu
Shiqing Ma
Dimitris N. Metaxas
Ankit Jain
67
1
0
31 Dec 2024
HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes
Xuanyu Su
Yansong Li
Diana Inkpen
Nathalie Japkowicz
VLM
81
2
0
11 Aug 2024
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
Yibo Miao
Yifan Zhu
Yinpeng Dong
Lijia Yu
Jun Zhu
Xiao-Shan Gao
EGVM
31
12
0
08 Jul 2024
Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models
Keyan Guo
Ayush Utkarsh
Wenbo Ding
Isabelle Ondracek
Ziming Zhao
Guo Freeman
Nishant Vishwamitra
Hongxin Hu
37
5
0
27 Mar 2024
Exploring the Limits of Zero Shot Vision Language Models for Hate Meme Detection: The Vulnerabilities and their Interpretations
Naquee Rizwan
Paramananda Bhaskar
Mithun Das
Swadhin Satyaprakash Majhi
Punyajoy Saha
Animesh Mukherjee
VLM
24
3
0
19 Feb 2024
Red-Teaming the Stable Diffusion Safety Filter
Javier Rando
Daniel Paleka
David Lindner
Lennard Heim
Florian Tramèr
DiffM
122
179
0
03 Oct 2022
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate
Hannah Rose Kirk
B. Vidgen
Paul Röttger
Tristan Thrush
Scott A. Hale
63
57
0
12 Aug 2021