2503.02304
Cited By
v1
v2 (latest)
A Token-level Text Image Foundation Model for Document Understanding
4 March 2025
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei Shen
Kai Zhou
Tiezhu Yue
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Yunbo Wang
VLM
Papers citing "A Token-level Text Image Foundation Model for Document Understanding"
45 / 95 papers shown
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
International Conference on Learning Representations (ICLR), 2023
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
424
2,657
0
20 Apr 2023
Segment Anything
IEEE International Conference on Computer Vision (ICCV), 2023
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
858
10,796
0
05 Apr 2023
Sigmoid Loss for Language Image Pre-Training
IEEE International Conference on Computer Vision (ICCV), 2023
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
1.4K
2,137
0
27 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
European Conference on Computer Vision (ECCV), 2024
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
738
3,176
0
09 Mar 2023
Turning a CLIP Model into a Scene Text Detector
Computer Vision and Pattern Recognition (CVPR), 2023
Wenwen Yu
Yuliang Liu
Wei Hua
Deqiang Jiang
Bo Ren
Xiang Bai
VLM
CLIP
MLLM
238
80
0
28 Feb 2023
Self-supervised Character-to-Character Distillation for Text Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Tongkun Guan
Wei Shen
Xuehang Yang
Qi Feng
Zekun Jiang
Xiaokang Yang
384
31
0
01 Nov 2022
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
International Conference on Learning Representations (ICLR), 2022
Pan Lu
Liang Qiu
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Tanmay Rajpurohit
Peter Clark
Ashwin Kalyan
ReLM
LRM
478
380
0
29 Sep 2022
A Survey on Label-efficient Deep Image Segmentation: Bridging the Gap between Weak Supervision and Dense Prediction
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Wei Shen
Zelin Peng
Xuehui Wang
Huayu Wang
Jiazhong Cen
Dongsheng Jiang
Lingxi Xie
Yunbo Wang
Qi Tian
VLM
327
113
0
04 Jul 2022
Towards End-to-End Unified Scene Text Detection and Layout Analysis
Computer Vision and Pattern Recognition (CVPR), 2022
Shangbang Long
Siyang Qin
Dmitry Panteleev
Alessandro Bissacco
Yasuhisa Fujii
Michalis Raptis
256
112
0
28 Mar 2022
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
Findings of the Association for Computational Linguistics (ACL Findings), 2022
Ahmed Masry
Do Xuan Long
J. Tan
Shafiq Joty
Enamul Hoque
AIMat
387
1,086
0
19 Mar 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Neural Information Processing Systems (NeurIPS), 2022
Jiaxi Gu
Xiaojun Meng
Guansong Lu
Lu Hou
Minzhe Niu
...
Runhu Huang
Wei Zhang
Xingda Jiang
Chunjing Xu
Hang Xu
VLM
354
130
0
14 Feb 2022
OCR-free Document Understanding Transformer
Geewook Kim
Teakgyu Hong
Moonbin Yim
Jeongyeon Nam
Jinyoung Park
Jinyeong Yim
Wonseok Hwang
Sangdoo Yun
Dongyoon Han
Seunghyun Park
ViT
518
274
0
30 Nov 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
746
1,692
0
03 Nov 2021
Industrial Scene Text Detection with Refined Feature-attentive Network
Tongkun Guan
Chaochen Gu
Changsheng Lu
Jingzheng Tu
Qi Feng
Kaijie Wu
Xinping Guan
183
39
0
25 Oct 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
ACM Symposium on User Interface Software and Technology (UIST), 2021
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
697
191
0
07 Aug 2021
Open Images V5 Text Annotation and Yet Another Mask Text Spotter
Ilya Krylov
S. Nosov
V. Sovrasov
VLM
178
63
0
23 Jun 2021
Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2021
Tomasz Stanislawek
Filip Graliński
Anna Wróblewska
Dawid Lipiński
Agnieszka Kaliska
Paulina Rosalska
Bartosz Topolski
P. Biecek
189
111
0
12 May 2021
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
Computer Vision and Pattern Recognition (CVPR), 2021
Amanpreet Singh
Guan Pang
Mandy Toh
Jing Huang
Wojciech Galuba
Tal Hassner
235
212
0
12 May 2021
Emerging Properties in Self-Supervised Vision Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Mathilde Caron
Hugo Touvron
Ishan Misra
Edouard Grave
Julien Mairal
Piotr Bojanowski
Armand Joulin
1.9K
7,754
0
29 Apr 2021
InfographicVQA
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
325
359
0
26 Apr 2021
Scene Text Retrieval via Joint Text Detection and Similarity Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Hao Wang
X. Bai
Mingkun Yang
Shenggao Zhu
Jing Wang
Wenyu Liu
3DV
104
41
0
04 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
2.0K
40,340
0
26 Feb 2021
VisualMRC: Machine Reading Comprehension on Document Images
AAAI Conference on Artificial Intelligence (AAAI), 2021
Ryota Tanaka
Kyosuke Nishida
Sen Yoshida
254
185
0
27 Jan 2021
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
634
1,072
0
01 Jul 2020
TableQA: a Large-Scale Chinese Text-to-SQL Dataset for Table-Aware SQL Generation
Ningyuan Sun
Xuefeng Yang
Yunfeng Liu
LMTD
172
39
0
10 Jun 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension
European Conference on Computer Vision (ECCV), 2020
Oleksii Sidorov
Ronghang Hu
Marcus Rohrbach
Amanpreet Singh
347
503
0
24 Mar 2020
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Computer Vision and Pattern Recognition (CVPR), 2020
Yuliang Liu
Hao Chen
Chunhua Shen
Tong He
Lianwen Jin
Liangwei Wang
307
380
0
24 Feb 2020
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Knowledge Discovery and Data Mining (KDD), 2019
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
365
860
0
31 Dec 2019
ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2019
Xi Liu
Rui Zhang
Yongsheng Zhou
Qianyi Jiang
Qi Song
...
X. Bai
Baoguang Shi
Dimosthenis Karatzas
Shijian Lu
C. V. Jawahar
3DV
178
184
0
20 Dec 2019
Image-based table recognition: data, model, and evaluation
European Conference on Computer Vision (ECCV), 2020
Xu Zhong
Elaheh Shafieibavani
Antonio Jimeno Yepes
LMTD
317
276
0
25 Nov 2019
ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling -- RRC-LSVT
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2019
Yipeng Sun
Zihan Ni
Chee-Kheng Chng
Yuliang Liu
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
246
182
0
17 Sep 2019
ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2019
Chee-Kheng Chng
Yuliang Liu
Yipeng Sun
Chun Chet Ng
Canjie Luo
...
Errui Ding
Jingtuo Liu
Dimosthenis Karatzas
Chee Seng Chan
Lianwen Jin
3DV
210
247
0
16 Sep 2019
TabFact: A Large-scale Dataset for Table-based Fact Verification
International Conference on Learning Representations (ICLR), 2019
Wenhu Chen
Hongmin Wang
Jianshu Chen
Yunkai Zhang
Hong Wang
Shiyang Li
Xiyou Zhou
William Yang Wang
LMTD
468
609
0
05 Sep 2019
PlotQA: Reasoning over Scientific Plots
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2019
Nitesh Methani
Pritha Ganguly
Mitesh M. Khapra
Pratyush Kumar
207
328
0
03 Sep 2019
PubLayNet: largest dataset ever for document layout analysis
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2019
Xu Zhong
Jianbin Tang
Antonio Jimeno Yepes
159
539
0
16 Aug 2019
Scene Text Visual Question Answering
IEEE International Conference on Computer Vision (ICCV), 2019
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
404
439
0
31 May 2019
DVQA: Understanding Data Visualizations via Question Answering
Kushal Kafle
Brian L. Price
Scott D. Cohen
Christopher Kanan
AIMat
314
469
0
24 Jan 2018
Detecting Curve Text in the Wild: New Dataset and New Solution
Yuliang Liu
Lianwen Jin
Shuaitao Zhang
Sheng Zhang
181
286
0
06 Dec 2017
Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2017
Chee-Kheng Chng
Chee Seng Chan
272
511
0
28 Oct 2017
FigureQA: An Annotated Figure Dataset for Visual Reasoning
Samira Ebrahimi Kahou
Vincent Michalski
Adam Atkinson
Ákos Kádár
Adam Trischler
Yoshua Bengio
ReLM
AIMat
223
398
0
19 Oct 2017
Focusing Attention: Towards Accurate Text Recognition in Natural Images
Zhanzhan Cheng
Fan Bai
Yunlu Xu
Gang Zheng
Shiliang Pu
Shuigeng Zhou
238
474
0
07 Sep 2017
ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)
Baoguang Shi
Cong Yao
Minghui Liao
Mingkun Yang
Pei Xu
Linyan Cui
Serge J. Belongie
Shijian Lu
X. Bai
286
238
0
31 Aug 2017
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukáš Neumann
Jiří Matas
Serge J. Belongie
609
578
0
26 Jan 2016
Compositional Semantic Parsing on Semi-Structured Tables
Annual Meeting of the Association for Computational Linguistics (ACL), 2015
Panupong Pasupat
Abigail Z. Jacobs
CoGe
LMTD
293
905
0
03 Aug 2015
Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
Adam W. Harley
Alex Ufkes
Konstantinos G. Derpanis
245
438
0
25 Feb 2015