ResearchTrend.AI

UIBert: Learning Generic Multimodal Representations for UI Understanding

29 July 2021
Chongyang Bai
Xiaoxue Zang
Ying Xu
Srinivas Sunkara
Abhinav Rastogi
Jindong Chen
Blaise Agüera y Arcas
arXiv: 2107.13731

Papers citing "UIBert: Learning Generic Multimodal Representations for UI Understanding"

17 papers
MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions
Yuxuan Liu
Hongda Sun
Wei Liu
Jian Luan
Bo Du
Rui Yan
53
2
0
24 Feb 2025
GUI Agents with Foundation Models: A Comprehensive Survey
Shuai Wang
W. Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
...
Bin Wang
Chuhan Wu
Yasheng Wang
Ruiming Tang
Jianye Hao
LLMAG
68
13
0
07 Nov 2024
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Boyu Gou
Ruohan Wang
Boyuan Zheng
Yanan Xie
Cheng Chang
Yiheng Shu
Huan Sun
Yu Su
LM&Ro
LLMAG
76
48
0
07 Oct 2024
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Kevin Qinghong Lin
Linjie Li
Difei Gao
Qinchen Wu
Mingyi Yan
Zhengyuan Yang
Lijuan Wang
Mike Zheng Shou
41
10
0
14 Jun 2024
Tur[k]ingBench: A Challenge Benchmark for Web Agents
Kevin Xu
Yeganeh Kordi
Kate Sanders
Yizhong Wang
Adam Byerly
Jingyu Zhang
Benjamin Van Durme
Daniel Khashabi
LLMAG
67
6
0
18 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
62
12
0
05 Mar 2024
AI Assistance for UX: A Literature Review Through Human-Centered AI
Yuwen Lu
Yuewen Yang
Qinyi Zhao
Chengzhi Zhang
Toby Jia-Jun Li
11
16
0
08 Feb 2024
EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal Learning
Liuqing Chen
Yunnong Chen
Shuhong Xiao
Yaxuan Song
Lingyun Sun
Yankun Zhen
Tingting Zhou
Yan-fang Chang
41
4
0
18 Sep 2023
Android in the Wild: A Large-Scale Dataset for Android Device Control
Christopher Rawles
Alice Li
Daniel Rodriguez
Oriana Riva
Timothy Lillicrap
LM&Ro
26
137
0
19 Jul 2023
UGIF: UI Grounded Instruction Following
S. Venkatesh
Partha P. Talukdar
S. Narayanan
16
10
0
14 Nov 2022
MUG: Interactive Multimodal Grounding on User Interfaces
Tao Li
Gang Li
Jingjie Zheng
Purple Wang
Yang Li
LLMAG
33
8
0
29 Sep 2022
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
Yu-Chung Hsiao
Fedir Zubach
Maria Wang
Jindong Chen
Victor Carbune
Jason Lin
Yun Zhu
RALM
152
25
0
16 Sep 2022
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
Liangtai Sun
Xingyu Chen
Lu Chen
Tianle Dai
Zichen Zhu
Kai Yu
LLMAG
18
50
0
23 May 2022
Predicting and Explaining Mobile UI Tappability with Vision Modeling and Saliency Analysis
E. Schoop
Xin Zhou
Gang Li
Zhourong Chen
Björn Hartmann
Yang Li
HAI
FAtt
29
32
0
05 Apr 2022
Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at Scale
Gang Li
Gilles Baechler
Manuel Tragut
Yang Li
11
49
0
11 Jan 2022
VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling
Yang Li
Gang Li
Xin Zhou
Mostafa Dehghani
A. Gritsenko
MLLM
25
34
0
10 Dec 2021
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,743
0
26 Sep 2016