ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces

AAAI Conference on Artificial Intelligence (AAAI), 2020
22 December 2020
Zecheng He, Srinivas Sunkara, Xiaoxue Zang, Ying Xu, Lijuan Liu, Nevan Wichers, Gabriel Schubiner, Ruby B. Lee, Jindong Chen, Blaise Agüera y Arcas

Papers citing "ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces"

45 citing papers shown.

1. CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
   Yuqi Zhou, Shuai Wang, Sunhao Dai, Qinglin Jia, Zhaocheng Du, Zhenhua Dong, Jun Xu
   05 Mar 2025

2. OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
   Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, ..., Kanzhi Cheng, Zichen Ding, Lixing Chen, Paul Pu Liang, Yu Qiao
   30 Oct 2024

3. EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data
   Xuetian Chen, Hangcheng Li, Jiaqing Liang, Sihang Jiang, Deqing Yang
   25 Oct 2024

4. UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity
   User Modeling, Adaptation, and Personalization (UMAP), 2024
   Yicheng Fu, R. Anantha, Prabal Vashisht, Jianpeng Cheng, Etai Littwin
   06 Sep 2024

5. OmniParser for Pure Vision Based GUI Agent
   Yadong Lu, Jianwei Yang, Yelong Shen, Ahmed Hassan Awadallah
   01 Aug 2024

6. VideoGUI: A Benchmark for GUI Automation from Instructional Videos
   Neural Information Processing Systems (NeurIPS), 2024
   Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou
   14 Jun 2024

7. Tell Me What's Next: Textual Foresight for Generic UI Representations
   Andrea Burns, Kate Saenko, Bryan A. Plummer
   12 Jun 2024

8. Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning
   Lucas-Andrei Thil, Mirela Popa, Gerasimos Spanakis
   01 May 2024

9. Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM
   International Conference on Human Factors in Computing Systems (CHI), 2024
   Zhe Liu, Chunyang Chen, Peng Li, Mengzhuo Chen, Boyu Wu, Yuekai Huang, Jun Hu, Qing Wang
   03 Apr 2024

10. Computer User Interface Understanding. A New Dataset and a Learning Framework
    Andrés Munoz, Daniel Borrajo
    15 Mar 2024

11. OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
    Raghav Kapoor, Y. Butala, M. Russak, Jing Yu Koh, Kiran Kamble, Waseem Alshikh, Ruslan Salakhutdinov
    27 Feb 2024

12. AI Assistance for UX: A Literature Review Through Human-Centered AI
    Yuwen Lu, Yuewen Yang, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li
    08 Feb 2024

13. ScreenAI: A Vision-Language Model for UI and Infographics Understanding
    Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Carbune, Jason Lin, Jindong Chen, Abhanshu Sharma
    07 Feb 2024

14. SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
    Annual Meeting of the Association for Computational Linguistics (ACL), 2024
    Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu
    17 Jan 2024

15. WebVLN: Vision-and-Language Navigation on Websites
    Qi Chen, D. Pitawela, Chongyang Zhao, Gengze Zhou, Hsiang-Ting Chen, Qi Wu
    25 Dec 2023

16. ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
    Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, ..., Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou
    20 Dec 2023

17. UINav: A Practical Approach to Train On-Device Automation Agents
    North American Chapter of the Association for Computational Linguistics (NAACL), 2023
    Wei Li, Fu-Lin Hsu, Will Bishop, Folawiyo Campbell-Ajala, Max Lin, Oriana Riva
    15 Dec 2023

18. Automatic Macro Mining from Interaction Traces at Scale
    International Conference on Human Factors in Computing Systems (CHI), 2023
    Forrest Huang, Gang Li, Tao Li, Yang Li
    10 Oct 2023

19. Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API
    Zhizheng Zhang, Wenxuan Xie, Xiaoyi Zhang, Yan Lu
    07 Oct 2023

20. EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal Learning
    International Conference on Software Engineering (ICSE), 2023
    Liuqing Chen, Yunnong Chen, Shuhong Xiao, Yaxuan Song, Lingyun Sun, Yankun Zhen, Tingting Zhou, Yan-fang Chang
    18 Sep 2023

21. A Survey for Graphic Design Intelligence
    Danqing Huang, Jiaqi Guo, Shizhao Sun, Hanling Tian, Jieru Lin, Zheng Hu, Chin-Yew Lin, Jian-Guang Lou, Dongmei Zhang
    04 Sep 2023

22. Never-ending Learning of User Interfaces
    ACM Symposium on User Interface Software and Technology (UIST), 2023
    Jason Wu, Rebecca Krosnick, E. Schoop, Amanda Swearngin, Jeffrey P. Bigham, Jeffrey Nichols
    17 Aug 2023

23. Video2Action: Reducing Human Interactions in Action Annotation of App Tutorial Videos
    ACM Symposium on User Interface Software and Technology (UIST), 2023
    Sidong Feng, Chunyang Chen, Zhenchang Xing
    07 Aug 2023

24. Android in the Wild: A Large-Scale Dataset for Android Device Control
    Neural Information Processing Systems (NeurIPS), 2023
    Christopher Rawles, Alice Li, Daniel Rodriguez, Oriana Riva, Timothy Lillicrap
    19 Jul 2023

25. Multimodal Web Navigation with Instruction-Finetuned Foundation Models
    International Conference on Learning Representations (ICLR), 2023
    Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum, Yutaka Matsuo, Aleksandra Faust, S. Gu, Izzeddin Gur
    19 May 2023

26. Towards Flexible Multi-modal Document Models
    Computer Vision and Pattern Recognition (CVPR), 2023
    Naoto Inoue, Kotaro Kikuchi, E. Simo-Serra, Mayu Otani, Kota Yamaguchi
    31 Mar 2023

27. WebUI: A Dataset for Enhancing Visual UI Understanding with Web Semantics
    International Conference on Human Factors in Computing Systems (CHI), 2023
    Jason Wu, Siyan Wang, Siman Shen, Yi-Hao Peng, Jeffrey Nichols, Jeffrey P. Bigham
    30 Jan 2023

28. Lexi: Self-Supervised Learning of the UI Language
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
    Pratyay Banerjee, Shweti Mahajan, Kushal Arora, Chitta Baral, Oriana Riva
    23 Jan 2023

29. Screen Correspondence: Mapping Interchangeable Elements between UIs
    Jason Wu, Amanda Swearngin, Xiaoyi Zhang, Jeffrey Nichols, Jeffrey P. Bigham
    20 Jan 2023

30. UGIF: UI Grounded Instruction Following
    S. Venkatesh, Partha P. Talukdar, S. Narayanan
    14 Nov 2022

31. Understanding HTML with Large Language Models
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
    Izzeddin Gur, Ofir Nachum, Yingjie Miao, Mustafa Safdari, Austin Huang, Aakanksha Chowdhery, Sharan Narang, Noah Fiedel, Aleksandra Faust
    08 Oct 2022

32. Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
    International Conference on Machine Learning (ICML), 2022
    Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Martin Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova
    07 Oct 2022

33. Towards Better Semantic Understanding of Mobile Interfaces
    International Conference on Computational Linguistics (COLING), 2022
    Srinivas Sunkara, Maria Wang, Lijuan Liu, Gilles Baechler, Yu-Chung Hsiao, Jindong Chen, Abhanshu Sharma, James Stout
    06 Oct 2022

34. MUG: Interactive Multimodal Grounding on User Interfaces
    Findings, 2022
    Tao Li, Gang Li, Jingjie Zheng, Purple Wang, Yang Li
    29 Sep 2022

35. Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
    International Conference on Learning Representations (ICLR), 2022
    Gang Li, Yang Li
    29 Sep 2022

36. Extracting Replayable Interactions from Videos of Mobile App Usage
    Jieshan Chen, Amanda Swearngin, Jason Wu, Titus Barik, Jeffrey Nichols, Xiaoyi Zhang
    09 Jul 2022

37. DETR++: Taming Your Multi-Scale Detection Transformer
    Chi Zhang, Lijuan Liu, Xiaoxue Zang, Frederick Liu, Hao Zhang, Xi-gang Song, Jin-Duan Chen
    07 Jun 2022

38. META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
    Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu
    23 May 2022

39. Multimodal Conversational AI: A Survey of Datasets and Approaches
    Anirudh S. Sundar, Larry Heck
    13 May 2022

40. GreaseVision: Rewriting the Rules of the Interface
    Siddhartha Datta, Konrad Kollnig, N. Shadbolt
    07 Apr 2022

41. Predicting and Explaining Mobile UI Tappability with Vision Modeling and Saliency Analysis
    International Conference on Human Factors in Computing Systems (CHI), 2022
    E. Schoop, Xin Zhou, Gang Li, Zhourong Chen, Björn Hartmann, Yang Li
    05 Apr 2022

42. Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks with Unified Vision-and-Language BERTs
    Taichi Iki, Akiko Aizawa
    15 Mar 2022

43. VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling
    Yang Li, Gang Li, Xin Zhou, Mostafa Dehghani, A. Gritsenko
    10 Dec 2021

44. UIBert: Learning Generic Multimodal Representations for UI Understanding
    International Joint Conference on Artificial Intelligence (IJCAI), 2021
    Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Agüera y Arcas
    29 Jul 2021

45. Understanding Mobile GUI: from Pixel-Words to Screen-Sentences
    Jingwen Fu, Xiaoyi Zhang, Yuwang Wang, Wenjun Zeng, Sam Yang, Grayson Hilliard
    25 May 2021