ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.03347
  4. Cited By
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language
  Understanding

Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

7 October 2022
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
    CLIP
    VLM
ArXivPDFHTML

Papers citing "Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding"

8 / 8 papers shown
Title
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Shravan Nayak
Xiangru Jian
Kevin Qinghong Lin
Juan A. Rodriguez
Montek Kalsi
...
David Vazquez
Christopher Pal
Perouz Taslakian
Spandana Gella
Sai Rajeswar
57
0
0
19 Mar 2025
WebFormer: The Web-page Transformer for Structure Information Extraction
WebFormer: The Web-page Transformer for Structure Information Extraction
Qifan Wang
Yi Fang
Anirudh Ravula
Fuli Feng
Xiaojun Quan
Dongfang Liu
ViT
115
53
0
01 Feb 2022
Pix2seq: A Language Modeling Framework for Object Detection
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
223
280
0
22 Sep 2021
Screen Parsing: Towards Reverse Engineering of UI Models from
  Screenshots
Screen Parsing: Towards Reverse Engineering of UI Models from Screenshots
Jason Wu
Xiaoyi Zhang
Jeffrey Nichols
Jeffrey P. Bigham
3DV
140
56
0
17 Sep 2021
Train Short, Test Long: Attention with Linear Biases Enables Input
  Length Extrapolation
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press
Noah A. Smith
M. Lewis
224
476
0
27 Aug 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
145
97
0
07 Aug 2021
Screen Recognition: Creating Accessibility Metadata for Mobile
  Applications from Pixels
Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels
Xiaoyi Zhang
Lilian de Greef
Amanda Swearngin
Samuel White
Kyle I. Murray
...
Jeffrey Nichols
Jason Wu
Chris Fleizach
Aaron Everitt
Jeffrey P. Bigham
136
130
0
13 Jan 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document
  Understanding
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
120
393
0
29 Dec 2020
1