2006.00923
Multimodal grid features and cell pointers for Scene Text Visual Question Answering
1 June 2020
Lluís Gómez
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Marçal Rusiñol
Ernest Valveny
Dimosthenis Karatzas
Papers citing "Multimodal grid features and cell pointers for Scene Text Visual Question Answering"
5 / 5 papers shown
Title

Hierarchical multimodal transformers for Multi-Page DocVQA
Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny
94 | 61 | 0 | 07 Dec 2022

MUST-VQA: MUltilingual Scene-text VQA
Emanuele Vivoli, Ali Furkan Biten, Andrés Mafla, Dimosthenis Karatzas, Lluís Gómez
113 | 6 | 0 | 14 Sep 2022

OCR-IDL: OCR Annotations for Industry Document Library Dataset
Ali Furkan Biten, Rubèn Pérez Tito, Lluís Gómez, Ernest Valveny, Dimosthenis Karatzas
77 | 30 | 0 | 25 Feb 2022

LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha
ViT | 127 | 102 | 0 | 23 Dec 2021

DocVQA: A Dataset for VQA on Document Images
Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
172 | 748 | 0 | 01 Jul 2020