Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

8 October 2020

Papers citing "Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements"

50 / 78 papers shown

Grounding Computer Use Agents on Human Demonstrations

...

172

10 Nov 2025

UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

...

136

23 Oct 2025

UIPro: Unleashing Superior Interaction Capability For GUI Agents

235

22 Sep 2025

See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles

128

17 Sep 2025

Towards Understanding Visual Grounding in Visual Language Models

Georgios Pantazopoulos

Eda B. Özyiğit

ObjD

320

12 Sep 2025

UItron: Foundational GUI Agent with Advanced Perception and Planning

193

29 Aug 2025

AccessGuru: Leveraging LLMs to Detect and Correct Web Accessibility Violations in HTML Code

144

24 Jul 2025

MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning

...

426

19 Jul 2025

VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation

313

09 Jul 2025

GTA1: GUI Test-time Scaling Agent

...

402

08 Jul 2025

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

244

09 Jun 2025

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

...

401

19 May 2025

GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning

437

18 May 2025

A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?

441

16 May 2025

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

LLMAG LM&Ro LRM AI4CE

382

19 Apr 2025

UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

565

15 Apr 2025

Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up Questions

393

31 Mar 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

...

1.2K

19 Mar 2025

MP-GUI: Modality Perception with MLLMs for GUI Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

338

18 Mar 2025

Unified Autoregressive Visual Generation and Understanding with Continuous Tokens

...

316

17 Mar 2025

DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents

356

14 Mar 2025

SpiritSight Agent: Advanced GUI Agent with One Look

Computer Vision and Pattern Recognition (CVPR), 2025

423

05 Mar 2025

MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions

Knowledge Discovery and Data Mining (KDD), 2025

440

24 Feb 2025

AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

995

04 Feb 2025

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

318

08 Jan 2025

Towards Human-AI Synergy in UI Design: Enhancing Multi-Agent Based UI Generation with Intent Clarification and Alignment

244

28 Dec 2024

Aria-UI: Visual Grounding for GUI Instructions

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

502

20 Dec 2024

Falcon-UI: Understanding GUI Before Following User Instructions

379

12 Dec 2024

Foundations and Recent Trends in Multimodal Mobile Agents: A Survey

LM&Ro LLMAG OffRL AI4TS

426

04 Nov 2024

EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data

466

25 Oct 2024

Harnessing Webpage UIs for Text-Rich Visual Understanding

International Conference on Learning Representations (ICLR), 2024

Chenyan Xiong

Graham Neubig

377

17 Oct 2024

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

International Conference on Learning Representations (ICLR), 2024

636

236

07 Oct 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Haotian Zhang

Mingfei Gao

...

Zirui Wang

Yinfei Yang

303

30 Sep 2024

Inferring Alt-text For UI Icons With Large Language Models During App Development

Sabrina Haque

Christoph Csallner

VLM

263

26 Sep 2024

MobileViews: A Large-Scale Mobile GUI Dataset

Longxi Gao

Li Zhang

Shihe Wang

Shangguang Wang

Yuanchun Li

Mengwei Xu

208

22 Sep 2024

WebQuest: A Benchmark for Multimodal QA on Web Page Sequences

Jindong Chen

306

06 Sep 2024

OmniParser for Pure Vision Based GUI Agent

Yadong Lu

Jianwei Yang

Yelong Shen

Ahmed Hassan Awadallah

MLLM

334

121

01 Aug 2024

Flowy: Supporting UX Design Decisions Through AI-Driven Pattern Annotation in Multi-Screen User Flows

Toby Jia-Jun Li

309

23 Jun 2024

Tell Me What's Next: Textual Foresight for Generic UI Representations

Andrea Burns

Kate Saenko

Bryan A. Plummer

LM&Ro AI4TS

282

12 Jun 2024

MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models

...

Yujiu Yang

Yingchun Wang

277

11 Jun 2024

MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling

266

11 May 2024

GUing: A Mobile GUI Search Engine using a Vision-Language Model

197

30 Apr 2024

Benchmarking Mobile Device Control Agents across Diverse Configurations

360

25 Apr 2024

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

Graham Neubig

289

09 Apr 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

...

Chengqi Deng

457

647

08 Mar 2024

Enhancing Vision-Language Pre-training with Rich Supervisions

412

05 Mar 2024

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

493

105

27 Feb 2024

ScreenAgent: A Vision Language Model-driven Computer Control Agent

317

09 Feb 2024

AI Assistance for UX: A Literature Review Through Human-Centered AI

284

08 Feb 2024

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

853

07 Feb 2024