Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

8 October 2020

Papers citing "Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements"

28 / 78 papers shown

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Zhiyong Wu

703

356

17 Jan 2024

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Jialin Wu

...

312

139

13 Oct 2023

Automatic Macro Mining from Interaction Traces at Scale

International Conference on Human Factors in Computing Systems (CHI), 2023

255

10 Oct 2023

Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API

211

07 Oct 2023

AutoDroid: LLM-powered Task Automation in Android

ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), 2023

431

183

29 Aug 2023

Never-ending Learning of User Interfaces

ACM Symposium on User Interface Software and Technology (UIST), 2023

159

17 Aug 2023

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

Neural Information Processing Systems (NeurIPS), 2023

279

31 May 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model

...

Mojtaba Seyedhosseini

350

254

29 May 2023

Alt-Text with Context: Improving Accessibility for Images on Twitter

International Conference on Learning Representations (ICLR), 2023

Nikita Srivatsan

Sofia Samaniego

Omar U. Florez

Taylor Berg-Kirkpatrick

200

24 May 2023

DUBLIN -- Document Understanding By Language-Image Network

Hardik Hansrajbhai Chauhan

332

23 May 2023

MenuCraft: Interactive Menu System Design with Large Language Models

Amir Hossein Kargaran

300

08 Mar 2023

Lexi: Self-Supervised Learning of the UI Language

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

147

23 Jan 2023

Screen Correspondence: Mapping Interchangeable Elements between UIs

243

20 Jan 2023

UGIF: UI Grounded Instruction Following

S. Venkatesh

Partha P. Talukdar

S. Narayanan

311

14 Nov 2022

Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

International Conference on Machine Learning (ICML), 2022

Julian Martin Eisenschlos

826

374

07 Oct 2022

Towards Better Semantic Understanding of Mobile Interfaces

International Conference on Computational Linguistics (COLING), 2022

246

06 Oct 2022

MUG: Interactive Multimodal Grounding on User Interfaces

Findings (Findings), 2022

186

29 Sep 2022

Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus

International Conference on Learning Representations (ICLR), 2022

Gang Li

Yang Li

361

29 Sep 2022

Enabling Conversational Interaction with Mobile UI using Large Language Models

International Conference on Human Factors in Computing Systems (CHI), 2022

Bryan Wang

Gang Li

Yang Li

430

175

18 Sep 2022

ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Victor Carbune

Jason Lin

Maria Wang

Yun Zhu

Jindong Chen

RALM

987

16 Sep 2022

PreSTU: Pre-Training for Scene-Text Understanding

IEEE International Conference on Computer Vision (ICCV), 2022

Wei-Lun Chao

350

12 Sep 2022

Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at Scale

International Conference on Human Factors in Computing Systems (CHI), 2022

291

11 Jan 2022

VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling

182

10 Dec 2021

Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning

ACM Symposium on User Interface Software and Technology (UIST), 2021

850

198

07 Aug 2021

UIBert: Learning Generic Multimodal Representations for UI Understanding

International Joint Conference on Artificial Intelligence (IJCAI), 2021

Blaise Agüera y Arcas

274

113

29 Jul 2021

Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

Neural Information Processing Systems (NeurIPS), 2021

405

128

05 Jun 2021

AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning

Zhiming Hu

200

31 May 2021

Screen2Vec: Semantic Embedding of GUI Screens and GUI Components

International Conference on Human Factors in Computing Systems (CHI), 2021

256

118

11 Jan 2021