SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
Automatic Macro Mining from Interaction Traces at Scale | International Conference on Human Factors in Computing Systems (CHI), 2023 |
AutoDroid: LLM-powered Task Automation in Android | ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), 2023 |
Never-ending Learning of User Interfaces | ACM Symposium on User Interface Software and Technology (UIST), 2023 |
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces | Neural Information Processing Systems (NeurIPS), 2023 |
Alt-Text with Context: Improving Accessibility for Images on Twitter | International Conference on Learning Representations (ICLR), 2023 |
Lexi: Self-Supervised Learning of the UI Language | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 |
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | International Conference on Machine Learning (ICML), 2022 |
Towards Better Semantic Understanding of Mobile Interfaces | International Conference on Computational Linguistics (COLING), 2022 |
MUG: Interactive Multimodal Grounding on User Interfaces | Findings (Findings), 2022 |
Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus | International Conference on Learning Representations (ICLR), 2022 |
Enabling Conversational Interaction with Mobile UI using Large Language Models | International Conference on Human Factors in Computing Systems (CHI), 2022 |
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots | North American Chapter of the Association for Computational Linguistics (NAACL), 2022 | Yu-Chung Hsiao, Fedir Zubach, Maria Wang, Jindong Chen, Victor Carbune, Jason Lin, Yun Zhu |
PreSTU: Pre-Training for Scene-Text Understanding | IEEE International Conference on Computer Vision (ICCV), 2022 |
Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at Scale | International Conference on Human Factors in Computing Systems (CHI), 2022 |
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning | ACM Symposium on User Interface Software and Technology (UIST), 2021 |
UIBert: Learning Generic Multimodal Representations for UI Understanding | International Joint Conference on Artificial Intelligence (IJCAI), 2021 |
Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding | Neural Information Processing Systems (NeurIPS), 2021 |
Screen2Vec: Semantic Embedding of GUI Screens and GUI Components | International Conference on Human Factors in Computing Systems (CHI), 2021 |