arXiv: 2506.14866

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

17 June 2025

Francesco Croce

Nicolas Flammarion

Maksym Andriushchenko

Papers citing "OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents"

10 / 10 papers shown

Are Your Agents Upward Deceivers?

...

146

0

0

04 Dec 2025

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

...

Zhuosheng Zhang

181

2

0

28 Oct 2025

OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents

173

4

0

28 Oct 2025

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

Shahriar Kabir Nahin

Anshuman Chhabra

299

4

0

27 Oct 2025

GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?

196

0

0

23 Oct 2025

Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Erfan Shayegani

Nael B. Abu-Ghazaleh

Spencer Whitehead

Vidhisha Balachandran

148

0

0

02 Oct 2025

SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control

209

0

0

27 Aug 2025

Reliable Weak-to-Strong Monitoring of LLM Agents

Chen Bo Calvin Zhang

Paula Rodriguez

Christina Q. Knight

184

2

0

26 Aug 2025

RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

Eric Fosler-Lussier

438

10

0

28 May 2025

A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?

441

11

0

16 May 2025