ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.03486
  4. Cited By
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and
  AI-Generated Images

UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

6 May 2024
Y. Qu
Xinyue Shen
Yixin Wu
Michael Backes
Savvas Zannettou
Yang Zhang
    EGVM
ArXivPDFHTML

Papers citing "UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images"

8 / 8 papers shown
Title
ShieldGemma 2: Robust and Tractable Image Content Moderation
ShieldGemma 2: Robust and Tractable Image Content Moderation
Wenjun Zeng
D. Kurniawan
Ryan Mullins
Yuchi Liu
Tamoghna Saha
...
Mani Malek
Hamid Palangi
Joon Baek
Rick Pereira
Karthik Narasimhan
AI4MH
31
0
0
01 Apr 2025
MLLM-as-a-Judge for Image Safety without Human Labeling
MLLM-as-a-Judge for Image Safety without Human Labeling
Zhenting Wang
Shuming Hu
Shiyu Zhao
Xiaowen Lin
F. Xu
...
Nan Jiang
Lingjuan Lyu
Shiqing Ma
Dimitris N. Metaxas
Ankit Jain
67
1
0
31 Dec 2024
HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes
HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes
Xuanyu Su
Yansong Li
Diana Inkpen
Nathalie Japkowicz
VLM
81
2
0
11 Aug 2024
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
Yibo Miao
Yifan Zhu
Yinpeng Dong
Lijia Yu
Jun Zhu
Xiao-Shan Gao
EGVM
31
12
0
08 Jul 2024
Moderating Illicit Online Image Promotion for Unsafe User-Generated
  Content Games Using Large Vision-Language Models
Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models
Keyan Guo
Ayush Utkarsh
Wenbo Ding
Isabelle Ondracek
Ziming Zhao
Guo Freeman
Nishant Vishwamitra
Hongxin Hu
37
5
0
27 Mar 2024
Exploring the Limits of Zero Shot Vision Language Models for Hate Meme Detection: The Vulnerabilities and their Interpretations
Exploring the Limits of Zero Shot Vision Language Models for Hate Meme Detection: The Vulnerabilities and their Interpretations
Naquee Rizwan
Paramananda Bhaskar
Mithun Das
Swadhin Satyaprakash Majhi
Punyajoy Saha
Animesh Mukherjee
VLM
24
3
0
19 Feb 2024
Red-Teaming the Stable Diffusion Safety Filter
Red-Teaming the Stable Diffusion Safety Filter
Javier Rando
Daniel Paleka
David Lindner
Lennard Heim
Florian Tramèr
DiffM
122
179
0
03 Oct 2022
Hatemoji: A Test Suite and Adversarially-Generated Dataset for
  Benchmarking and Detecting Emoji-based Hate
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate
Hannah Rose Kirk
B. Vidgen
Paul Röttger
Tristan Thrush
Scott A. Hale
63
57
0
12 Aug 2021
1