Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
8 October 2020
Yang Li
Gang Li
Luheng He
Jingjie Zheng
Hong Li
Zhiwei Guan

Papers citing "Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements"

28 / 78 papers shown
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Kanzhi Cheng
Qiushi Sun
Yougang Chu
Fangzhi Xu
Yantao Li
Jianbing Zhang
Zhiyong Wu
LLMAG
703
356
0
17 Jan 2024
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Xi Chen
Xiao Wang
Lucas Beyer
Alexander Kolesnikov
Jialin Wu
...
Keran Rong
Tianli Yu
Daniel Keysers
Xiao-Qi Zhai
Radu Soricut
MLLM VLM
312
139
0
13 Oct 2023
Automatic Macro Mining from Interaction Traces at Scale
International Conference on Human Factors in Computing Systems (CHI), 2023
Forrest Huang
Gang Li
Tao Li
Yang Li
255
16
0
10 Oct 2023
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API
Zhizheng Zhang
Wenxuan Xie
Xiaoyi Zhang
Yan Lu
211
17
0
07 Oct 2023
AutoDroid: LLM-powered Task Automation in Android
ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), 2023
Hao Wen
Yuanchun Li
Guohong Liu
Shanhui Zhao
Tao Yu
Toby Jia-Jun Li
Shiqi Jiang
Yunhao Liu
Yaqin Zhang
Yunxin Liu
431
183
0
29 Aug 2023
Never-ending Learning of User Interfaces
ACM Symposium on User Interface Software and Technology (UIST), 2023
Jason Wu
Rebecca Krosnick
E. Schoop
Amanda Swearngin
Jeffrey P. Bigham
Jeffrey Nichols
VLM HAI
159
23
0
17 Aug 2023
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces
Neural Information Processing Systems (NeurIPS), 2023
Peter Shaw
Mandar Joshi
James Cohan
Jonathan Berant
Panupong Pasupat
Hexiang Hu
Urvashi Khandelwal
Kenton Lee
Kristina Toutanova
LLMAG LM&Ro
279
76
0
31 May 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
350
254
0
29 May 2023
Alt-Text with Context: Improving Accessibility for Images on Twitter
International Conference on Learning Representations (ICLR), 2023
Nikita Srivatsan
Sofia Samaniego
Omar U. Florez
Taylor Berg-Kirkpatrick
200
7
0
24 May 2023
DUBLIN -- Document Understanding By Language-Image Network
Kriti Aggarwal
Aditi Khandelwal
Kumar Tanmay
Owais Mohammed Khan
Qiang Liu
Monojit Choudhury
Hardik Hansrajbhai Chauhan
Subhojit Som
Vishrav Chaudhary
Saurabh Tiwary
ObjD VLM
332
0
0
23 May 2023
MenuCraft: Interactive Menu System Design with Large Language Models
Amir Hossein Kargaran
Nafiseh Nikeghbal
Abbas Heydarnoori
Hinrich Schütze
LLMAG
300
6
0
08 Mar 2023
Lexi: Self-Supervised Learning of the UI Language
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Pratyay Banerjee
Shweti Mahajan
Kushal Arora
Chitta Baral
Oriana Riva
147
18
0
23 Jan 2023
Screen Correspondence: Mapping Interchangeable Elements between UIs
Jason Wu
Amanda Swearngin
Xiaoyi Zhang
Jeffrey Nichols
Jeffrey P. Bigham
243
9
0
20 Jan 2023
UGIF: UI Grounded Instruction Following
S. Venkatesh
Partha P. Talukdar
S. Narayanan
311
20
0
14 Nov 2022
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
International Conference on Machine Learning (ICML), 2022
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP VLM
826
374
0
07 Oct 2022
Towards Better Semantic Understanding of Mobile Interfaces
International Conference on Computational Linguistics (COLING), 2022
Srinivas Sunkara
Maria Wang
Lijuan Liu
Gilles Baechler
Yu-Chung Hsiao
Jindong Chen
Abhanshu Sharma
James Stout
246
33
0
06 Oct 2022
MUG: Interactive Multimodal Grounding on User Interfaces
Findings (Findings), 2022
Tao Li
Gang Li
Jingjie Zheng
Purple Wang
Yang Li
LLMAG
186
11
0
29 Sep 2022
Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
International Conference on Learning Representations (ICLR), 2022
Gang Li
Yang Li
361
82
0
29 Sep 2022
Enabling Conversational Interaction with Mobile UI using Large Language Models
International Conference on Human Factors in Computing Systems (CHI), 2022
Bryan Wang
Gang Li
Yang Li
430
175
0
18 Sep 2022
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Yu-Chung Hsiao
Fedir Zubach
Maria Wang
Jindong Chen
Victor Carbune
Jason Lin
Yun Zhu
RALM
987
48
0
16 Sep 2022
PreSTU: Pre-Training for Scene-Text Understanding
IEEE International Conference on Computer Vision (ICCV), 2022
Jihyung Kil
Soravit Changpinyo
Xi Chen
Hexiang Hu
Sebastian Goodman
Wei-Lun Chao
Radu Soricut
VLM
350
38
0
12 Sep 2022
Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at Scale
International Conference on Human Factors in Computing Systems (CHI), 2022
Gang Li
Gilles Baechler
Manuel Tragut
Yang Li
291
56
0
11 Jan 2022
VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling
Yang Li
Gang Li
Xin Zhou
Mostafa Dehghani
A. Gritsenko
MLLM
182
40
0
10 Dec 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
ACM Symposium on User Interface Software and Technology (UIST), 2021
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
850
198
0
07 Aug 2021
UIBert: Learning Generic Multimodal Representations for UI Understanding
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Chongyang Bai
Xiaoxue Zang
Ying Xu
Srinivas Sunkara
Abhinav Rastogi
Jindong Chen
Blaise Agüera y Arcas
274
113
0
29 Jul 2021
Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding
Neural Information Processing Systems (NeurIPS), 2021
Yang Li
Si Si
Gang Li
Cho-Jui Hsieh
Samy Bengio
405
128
0
05 Jun 2021
AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning
Maayan Shvo
Zhiming Hu
Rodrigo Toro Icarte
Iqbal Mohomed
A. Jepson
Sheila A. McIlraith
200
16
0
31 May 2021
Screen2Vec: Semantic Embedding of GUI Screens and GUI Components
International Conference on Human Factors in Computing Systems (CHI), 2021
Toby Jia-Jun Li
Lindsay Popowski
Tom Michael Mitchell
Brad A. Myers
256
118
0
11 Jan 2021