ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces

AAAI Conference on Artificial Intelligence (AAAI), 2020
22 December 2020
Zecheng He, Srinivas Sunkara, Xiaoxue Zang, Ying Xu, Lijuan Liu, Nevan Wichers, Gabriel Schubiner, Ruby B. Lee, Jindong Chen, Blaise Agüera y Arcas

Papers citing "ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces"

45 citing papers shown.

1. CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
   Yuqi Zhou, Shuai Wang, Sunhao Dai, Qinglin Jia, Zhaocheng Du, Zhenhua Dong, Jun Xu
   05 Mar 2025

2. OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
   Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, ..., Kanzhi Cheng, Zichen Ding, Lixing Chen, Paul Pu Liang, Yu Qiao
   30 Oct 2024

3. EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data
   Xuetian Chen, Hangcheng Li, Jiaqing Liang, Sihang Jiang, Deqing Yang
   25 Oct 2024

4. UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity
   User Modeling, Adaptation, and Personalization (UMAP), 2024
   Yicheng Fu, R. Anantha, Prabal Vashisht, Jianpeng Cheng, Etai Littwin
   06 Sep 2024

5. OmniParser for Pure Vision Based GUI Agent
   Yadong Lu, Jianwei Yang, Yelong Shen, Ahmed Hassan Awadallah
   01 Aug 2024

6. VideoGUI: A Benchmark for GUI Automation from Instructional Videos
   Neural Information Processing Systems (NeurIPS), 2024
   Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou
   14 Jun 2024

7. Tell Me What's Next: Textual Foresight for Generic UI Representations
   Andrea Burns, Kate Saenko, Bryan A. Plummer
   12 Jun 2024

8. Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning
   Lucas-Andrei Thil, Mirela Popa, Gerasimos Spanakis
   01 May 2024

9. Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM
   International Conference on Human Factors in Computing Systems (CHI), 2024
   Zhe Liu, Chunyang Chen, Peng Li, Mengzhuo Chen, Boyu Wu, Yuekai Huang, Jun Hu, Qing Wang
   03 Apr 2024

10. Computer User Interface Understanding. A New Dataset and a Learning Framework
    Andrés Munoz, Daniel Borrajo
    15 Mar 2024

11. OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
    Raghav Kapoor, Y. Butala, M. Russak, Jing Yu Koh, Kiran Kamble, Waseem Alshikh, Ruslan Salakhutdinov
    27 Feb 2024

12. AI Assistance for UX: A Literature Review Through Human-Centered AI
    Yuwen Lu, Yuewen Yang, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li
    08 Feb 2024

13. ScreenAI: A Vision-Language Model for UI and Infographics Understanding
    Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Carbune, Jason Lin, Jindong Chen, Abhanshu Sharma
    07 Feb 2024

14. SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
    Annual Meeting of the Association for Computational Linguistics (ACL), 2024
    Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu
    17 Jan 2024

15. WebVLN: Vision-and-Language Navigation on Websites
    Qi Chen, D. Pitawela, Chongyang Zhao, Gengze Zhou, Hsiang-Ting Chen, Qi Wu
    25 Dec 2023

16. ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
    Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, ..., Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou
    20 Dec 2023

17. UINav: A Practical Approach to Train On-Device Automation Agents
    North American Chapter of the Association for Computational Linguistics (NAACL), 2023
    Wei Li, Fu-Lin Hsu, Will Bishop, Folawiyo Campbell-Ajala, Max Lin, Oriana Riva
    15 Dec 2023

18. Automatic Macro Mining from Interaction Traces at Scale
    International Conference on Human Factors in Computing Systems (CHI), 2023
    Forrest Huang, Gang Li, Tao Li, Yang Li
    10 Oct 2023

19. Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API
    Zhizheng Zhang, Wenxuan Xie, Xiaoyi Zhang, Yan Lu
    07 Oct 2023

20. EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal Learning
    International Conference on Software Engineering (ICSE), 2023
    Liuqing Chen, Yunnong Chen, Shuhong Xiao, Yaxuan Song, Lingyun Sun, Yankun Zhen, Tingting Zhou, Yan-fang Chang
    18 Sep 2023

21. A Survey for Graphic Design Intelligence
    Danqing Huang, Jiaqi Guo, Shizhao Sun, Hanling Tian, Jieru Lin, Zheng Hu, Chin-Yew Lin, Jian-Guang Lou, Dongmei Zhang
    04 Sep 2023

22. Never-ending Learning of User Interfaces
    ACM Symposium on User Interface Software and Technology (UIST), 2023
    Jason Wu, Rebecca Krosnick, E. Schoop, Amanda Swearngin, Jeffrey P. Bigham, Jeffrey Nichols
    17 Aug 2023

23. Video2Action: Reducing Human Interactions in Action Annotation of App Tutorial Videos
    ACM Symposium on User Interface Software and Technology (UIST), 2023
    Sidong Feng, Chunyang Chen, Zhenchang Xing
    07 Aug 2023

24. Android in the Wild: A Large-Scale Dataset for Android Device Control
    Neural Information Processing Systems (NeurIPS), 2023
    Christopher Rawles, Alice Li, Daniel Rodriguez, Oriana Riva, Timothy Lillicrap
    19 Jul 2023

25. Multimodal Web Navigation with Instruction-Finetuned Foundation Models
    International Conference on Learning Representations (ICLR), 2023
    Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum, Yutaka Matsuo, Aleksandra Faust, S. Gu, Izzeddin Gur
    19 May 2023

26. Towards Flexible Multi-modal Document Models
    Computer Vision and Pattern Recognition (CVPR), 2023
    Naoto Inoue, Kotaro Kikuchi, E. Simo-Serra, Mayu Otani, Kota Yamaguchi
    31 Mar 2023

27. WebUI: A Dataset for Enhancing Visual UI Understanding with Web Semantics
    International Conference on Human Factors in Computing Systems (CHI), 2023
    Jason Wu, Siyan Wang, Siman Shen, Yi-Hao Peng, Jeffrey Nichols, Jeffrey P. Bigham
    30 Jan 2023

28. Lexi: Self-Supervised Learning of the UI Language
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
    Pratyay Banerjee, Shweti Mahajan, Kushal Arora, Chitta Baral, Oriana Riva
    23 Jan 2023

29. Screen Correspondence: Mapping Interchangeable Elements between UIs
    Jason Wu, Amanda Swearngin, Xiaoyi Zhang, Jeffrey Nichols, Jeffrey P. Bigham
    20 Jan 2023

30. UGIF: UI Grounded Instruction Following
    S. Venkatesh, Partha P. Talukdar, S. Narayanan
    14 Nov 2022

31. Understanding HTML with Large Language Models
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
    Izzeddin Gur, Ofir Nachum, Yingjie Miao, Mustafa Safdari, Austin Huang, Aakanksha Chowdhery, Sharan Narang, Noah Fiedel, Aleksandra Faust
    08 Oct 2022

32. Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
    International Conference on Machine Learning (ICML), 2022
    Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Martin Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova
    07 Oct 2022

33. Towards Better Semantic Understanding of Mobile Interfaces
    International Conference on Computational Linguistics (COLING), 2022
    Srinivas Sunkara, Maria Wang, Lijuan Liu, Gilles Baechler, Yu-Chung Hsiao, Jindong Chen, Abhanshu Sharma, James Stout
    06 Oct 2022

34. MUG: Interactive Multimodal Grounding on User Interfaces
    Findings, 2022
    Tao Li, Gang Li, Jingjie Zheng, Purple Wang, Yang Li
    29 Sep 2022

35. Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
    International Conference on Learning Representations (ICLR), 2022
    Gang Li, Yang Li
    29 Sep 2022

36. Extracting Replayable Interactions from Videos of Mobile App Usage
    Jieshan Chen, Amanda Swearngin, Jason Wu, Titus Barik, Jeffrey Nichols, Xiaoyi Zhang
    09 Jul 2022

37. DETR++: Taming Your Multi-Scale Detection Transformer
    Chi Zhang, Lijuan Liu, Xiaoxue Zang, Frederick Liu, Hao Zhang, Xi-gang Song, Jin-Duan Chen
    07 Jun 2022

38. META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
    Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu
    23 May 2022

39. Multimodal Conversational AI: A Survey of Datasets and Approaches
    Anirudh S. Sundar, Larry Heck
    13 May 2022

40. GreaseVision: Rewriting the Rules of the Interface
    Siddhartha Datta, Konrad Kollnig, N. Shadbolt
    07 Apr 2022

41. Predicting and Explaining Mobile UI Tappability with Vision Modeling and Saliency Analysis
    International Conference on Human Factors in Computing Systems (CHI), 2022
    E. Schoop, Xin Zhou, Gang Li, Zhourong Chen, Björn Hartmann, Yang Li
    05 Apr 2022

42. Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks with Unified Vision-and-Language BERTs
    Taichi Iki, Akiko Aizawa
    15 Mar 2022

43. VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling
    Yang Li, Gang Li, Xin Zhou, Mostafa Dehghani, A. Gritsenko
    10 Dec 2021

44. UIBert: Learning Generic Multimodal Representations for UI Understanding
    International Joint Conference on Artificial Intelligence (IJCAI), 2021
    Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Agüera y Arcas
    29 Jul 2021

45. Understanding Mobile GUI: from Pixel-Words to Screen-Sentences
    Jingwen Fu, Xiaoyi Zhang, Yuwang Wang, Wenjun Zeng, Sam Yang, Grayson Hilliard
    25 May 2021